Malgi Malgi - 4 months ago 10
Linux Question

Subtract a constant number from a column

I have two large files (~10GB) as follows:

file1.csv

name,id,dob,year,age,score
Mike,1,2014-01-01,2016,2,20
Ellen,2, 2012-01-01,2016,4,35
.
.


file2.csv

id,course_name,course_id
1,math,101
1,physics,102
1,chemistry,103
2,math,101
2,physics,102
2,chemistry,103
.
.


I want to subtract 1 from the "id" columns of these files:

file1_updated.csv

name,id,dob,year,age,score
Mike,0,2014-01-01,2016,2,20
Ellen,0, 2012-01-01,2016,4,35


file2_updated.csv

id,course_name,course_id
0,math,101
0,physics,102
0,chemistry,103
1,math,101
1,physics,102
1,chemistry,103


I have tried
awk '{print ($1 - 1) "," $0}' file2.csv
, but did not get the correct result:

-1,id,course_name,course_id
0,1,math,101
0,1,physics,102
0,1,chemistry,103
1,2,math,101
1,2,physics,102
1,2,chemistry,103

Answer

You've added an extra column in your attempt. Instead set your first field $1 to $1-1:

 awk -F"," 'BEGIN{OFS=","} {$1=$1-1;print $0}' file2.csv

That semicolon separates the commands. We set the delimiter to comma (-F",") and the Output Field Seperator to comma BEGIN{OFS=","}. The first command to subtract 1 from the first field executes first, then the print command executes second, so the entire record, $0, will now contain the new $1 value when it's printed.

It might be helpful to only subtract 1 from records that are not your header. So you can add a condition to the first command:

awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1-1} {print $0}' file2.csv

Now we only subtract when the record number (NR) is greater than 1. Then we just print the entire record.