Malgi Malgi - 4 months ago 10
Linux Question

Add a variable to a column in a CSV file

I have a large file (~10GB) and I want to duplicate that file 10 times but each time add a variable to the first column:

for i in (1, 10):
var = (i-1) * 1000
# add var to the first column of the file and save the file as file(i).csv


So far I have tried:

#!/bin/bash
for i in {1..10}
do
t=1
j=$(( $i - t ))
s=1000
person_id=$(( j * add ))
awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv
done


but no change in column value.

Answer

Awk variables are different from shell variables.

Replace:

awk -F"," 'BEGIN{OFS=","} NR>1{$1=$1+$person_id} {print $0}' file.csv > file$i.csv

With:

awk -F"," -v id="$person_id" 'BEGIN{OFS=","} NR>1{$1=$1+id} {print $0}' file.csv > "file$i.csv"

This uses the -v option to define an awk variable id whose value is the value of the shell variable person_id.

Because , is not a shell-active character, the code can be simplified. Also, changing the location of the definition of OFS can further shorten the code:

awk -F, -v id="$person_id" 'NR>1{$1+=id} 1' OFS=, file.csv > "file$i.csv"

Lastly, we replaced {print $0} with the cryptic shorthand 1. (This works because awk interprets 1 as a logical condition which it evaluates to true and, since no action was supplied, awk will perform the default action which is to print the line.)

Comments