HattrickNZ HattrickNZ - 4 months ago 6
Bash Question

awk + doing a group summation using awk with for loop for reading in and outputting

this is my file:

$ cat -v head_datafile_pipe_deleimiter_more_columns.csv
"Rec_Open_Date"|"MSISDN"|"IMEI"|"Data_Volume_Bytes"|"Device_Manufacturer"|"Device_Model"|"Product_Description"|"Data_Volume_MB"|">20MB/30"|">200MB/30"|">2048MB/30"|">5120MB/30"|">10240MB/30"
"2016-07-25"|"537"|"190"|"48195367"|"Samsung Korea"|"Samsung SM-G928I"|"$49.95 Carryover Plan"|"45.9627"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"370"|"780"|"92875513"|"Samsung Korea"|"Samsung SM-J500Y"|"$39.95 Plan"|"88.573"|"1"|"1"|"1"|"0"|"0"
"2016-07-25"|"024"|"640"|"35174867"|"Samsung Korea"|"Samsung GT-I9505"|"$29.95 Plan"|"33.5454"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"091"|"660"|"0"|"LG Electronics Inc."|"LG LG-D802"|"PREPAY PLUS - TRIAL - #16"|"0"|"0"|"0"|"0"|"0"|"0"
"2016-07-25"|"706"|"300"|"33352130"|"Apple Inc"|"Apple iPhone 5S A1530"|"PREPAY PLUS - $0 -"|"31.8071"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"917"|"350"|"85820068"|"Apple Inc"|"Apple iPhone 6S Plus A1687"|"$39.95 Plan"|"81.8444"|"1"|"1"|"1"|"0"|"0"
"2016-07-25"|"027"|"070"|"9718436"|"Samsung Korea"|"Samsung SM-G900F"|"$29.95 Plan"|"9.26822"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"173"|"780"|"19152160"|"Apple Inc"|"Apple iPhone 6S A1688"|"$49.95 Plan"|"18.2649"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"685"|"140"|"17443425"|"Apple Inc"|"Apple iPhone 6S Plus A1687"|"$89.95 Plan"|"16.6353"|"1"|"1"|"0"|"0"|"0"


this is the array I create

$ mapfile -t u_vals <<<"$(awk -F'|' 'NR>1{print $7}' head_datafile_pipe_deleimiter_more_columns.csv | sort | uniq)"
$ declare -p u_vals
declare -a u_vals='([0]="\"\$29.95 Plan\"" [1]="\"\$39.95 Plan\"" [2]="\"\$49.95 Carryover Plan\"" [3]="\"\$49.95 Plan\"" [4]="\"\$89.95 Plan\"" [5]="\"PREPAY PLUS - TRIAL - #16\"" [6]="\"PREPAY PLUS - \$0 -\"")'


THis is an example of the output I want, but this only sums columns 8-9, I want to do columns 8-NF. And this does not use a for loop.

$ for el in "${u_vals[@]}"; do awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {gsub(/"/,"",$8); sum+=$8; gsub(/"/,"",$9); sum2+=$9} }; NR>1{count++} END { print "output" FS j FS q count q FS sum FS sum2}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
1 output|"$29.95 Plan"|"9"|42.8136|2
2 output|"$39.95 Plan"|"9"|170.417|2
3 output|"$49.95 Carryover Plan"|"9"|45.9627|1
4 output|"$49.95 Plan"|"9"|18.2649|1
5 output|"$89.95 Plan"|"9"|16.6353|1
6 output|"PREPAY PLUS - TRIAL - #16"|"9"|0|0
7 output|"PREPAY PLUS - $0 -"|"9"|31.8071|1


But I want to create a for loop for cols 8-NF, which I am trying to do here for col 8-9
{for(i=8;i<=9;i++) gsub(/"/,"",$i); sum[i]+=$i}
, but I am not sure if it can be done

$ for el in "${u_vals[@]}"; do awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=9;i++) gsub(/"/,"",$i); sum[i]+=$i} }; NR>1{count++} END { for(i=8;i<=9;i++) print "output" FS j FS q count q FS sum[i] FS sum[i]}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n


This is my desired output, this just has cols 8 and 9 grouping summed here:

........................|col8 |col9|...

output|"$29.95 Plan"|"9"|42.8136|2|<other column sums here>
output|"$39.95 Plan"|"9"|170.417|2 ...
output|"$49.95 Carryover Plan"|"9"|45.9627|1
output|"$49.95 Plan"|"9"|18.2649|1
output|"$89.95 Plan"|"9"|16.6353|1
output|"PREPAY PLUS - TRIAL - #16"|"9"|0|0
output|"PREPAY PLUS - $0 -"|"9"|31.8071|1|<other column sums here>





EDIT1



trying to understand the difference between
print
printf
this is me playing which helps.

the only difference between the 2 below commands:

print ("output" FS j FS q count q FS sum[8]);


printf ("output" FS j FS q count q FS sum[8]);


$ for el in "${u_vals[@]}"; do awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=NF;i++){gsub(/"/,"",$i); sum[i]+=$i}} }; NR>1{count++} END { print ("output" FS j FS q count q FS sum[8]); for(i=8;i<=NF; i++){ printf("%s",FS sum[i])} printf("\n")}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
1 output|"$29.95 Plan"|"9"|42.8136
2 |42.8136|2|2|0|0|0
3 output|"$39.95 Plan"|"9"|170.417
4 |170.417|2|2|2|0|0
5 output|"$49.95 Carryover Plan"|"9"|45.9627
6 |45.9627|1|1|0|0|0
7 output|"$49.95 Plan"|"9"|18.2649
8 |18.2649|1|1|0|0|0
9 output|"$89.95 Plan"|"9"|16.6353
10 |16.6353|1|1|0|0|0
11 output|"PREPAY PLUS - TRIAL - #16"|"9"|0
12 |0|0|0|0|0|0
13 output|"PREPAY PLUS - $0 -"|"9"|31.8071
14 |31.8071|1|1|0|0|0
1415137039 [main] -sh 4624 sig_send: error sending signal -66, pipe handle 0x1210, nb 132, packsize 0, Win32 error 0

$ for el in "${u_vals[@]}"; do awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=NF;i++){gsub(/"/,"",$i); sum[i]+=$i}} }; NR>1{count++} END { printf ("output" FS j FS q count q FS sum[8]); for(i=8;i<=NF; i++){ printf("%s",FS sum[i])} printf("\n")}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
1 output|"$29.95 Plan"|"9"|42.8136|42.8136|2|2|0|0|0
2 output|"$39.95 Plan"|"9"|170.417|170.417|2|2|2|0|0
3 output|"$49.95 Carryover Plan"|"9"|45.9627|45.9627|1|1|0|0|0
4 output|"$49.95 Plan"|"9"|18.2649|18.2649|1|1|0|0|0
5 output|"$89.95 Plan"|"9"|16.6353|16.6353|1|1|0|0|0
6 output|"PREPAY PLUS - TRIAL - #16"|"9"|0|0|0|0|0|0|0
7 output|"PREPAY PLUS - $0 -"|"9"|31.8071|31.8071|1|1|0|0|0

Answer

You can simply iterate over 8 to NF. You also must replace the final print call in the END block with one initial printf to print the fields that you want to output prior to the 8:NF fields, then use a printf call inside the for-loop to print the 8:NF fields, and finally print a newline after the for-loop. We must use printf instead of print for the former two prints to prevent the newline from being printed prematurely.

for el in "${u_vals[@]}"; do

    awk -F\| -v q=\" -v j="$el" '

        NR > 1 {
            if ($7 == j) {
                for (i = 8; i <= NF; ++i) {
                    gsub(/"/,"",$i);
                    sum[i] += $i;
                }
            }
            ++count;
        };

        END {
            printf("%s","output" FS j FS q count q);
            for (i = 8; i <= NF; ++i)
                printf("%s",FS sum[i]);
            printf("\n");
        };

    ' head_datafile_pipe_deleimiter_more_columns.csv;

done| cat -n;
##      1  output|"$29.95 Plan"|"9"|42.8136|2|2|0|0|0
##      2  output|"$39.95 Plan"|"9"|170.417|2|2|2|0|0
##      3  output|"$49.95 Carryover Plan"|"9"|45.9627|1|1|0|0|0
##      4  output|"$49.95 Plan"|"9"|18.2649|1|1|0|0|0
##      5  output|"$89.95 Plan"|"9"|16.6353|1|1|0|0|0
##      6  output|"PREPAY  PLUS - TRIAL - #16"|"9"|0|0|0|0|0|0
##      7  output|"PREPAY PLUS - $0 -"|"9"|31.8071|1|1|0|0|0
Comments