Girish Sonar - 22 days ago
Bash Question

Rollup and de-normalize data using AWK

I have the scenario below, with 4 records. Using awk I get the expected output, but I don't understand how awk is working here or what each a[] means. Can someone briefly explain the awk command below, specifically the "if" part?

$ cat temp.dat
abc|v1
abc|v2
def|v1
def|v3

$ awk -F"|" '{if(a[$1]){a[$1]=a[$1]","$2} else { a[$1]=$2}} END {for (i in a) {print i"|"a[i]}}' temp.dat
def|v1,v3
abc|v1,v2

Answer

These types of questions tend to be closed as off-topic because they fall outside the scope of this site, but since you are relatively new here, let me help you understand it.

A literal breakdown of the command:

  • -F"|" sets the input field separator, i.e. it tells awk to split each line into fields on the | delimiter. awk then runs the command(s) on each line of the input file.
  • a[] is an awk associative array keyed by $1. The if-condition checks whether the array already holds a non-empty value for subscript $1, i.e. when parsing the first line it checks a[abc]. Nothing has been stored there yet, so the else-clause stores the value of $2 (v1) in the array, i.e. a[abc]=v1
  • On parsing the next line (abc|v2), a[abc] now has a value, so the if-clause is executed. a[$1]=a[$1]","$2 literally means: overwrite the value in a[abc] with the already existing value (v1), a comma (,) and the current value of $2, i.e. a[abc] now holds v1,v2
  • The same two steps are repeated for the next pair of lines, which leaves a[def]=v1,v3
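To see the steps above happen line by line, here is a small sketch that keeps the same if/else as your command and adds a print after every input line. The temporary file and the variable names tmp, trace and branch are only there for illustration and are not part of your original command.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'abc|v1' 'abc|v2' 'def|v1' 'def|v3' > "$tmp"

# Same if/else as the original command, plus a print after every line
# showing which branch ran and what a[$1] holds afterwards.
trace=$(awk -F'|' '
{
    if (a[$1]) {                # key already has a value: append to it
        a[$1] = a[$1] "," $2
        branch = "if"
    } else {                    # first time this key is seen: store $2
        a[$1] = $2
        branch = "else"
    }
    print NR ": " branch " -> a[" $1 "]=" a[$1]
}' "$tmp")

printf '%s\n' "$trace"
rm -f "$tmp"
```

For the four sample records this prints one line per record: the else branch runs on records 1 and 3 (the first abc and the first def), and the if branch runs on records 2 and 4.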

Now that the array is filled, the END clause of awk is reached. The statements within this clause are executed once, after awk has read and processed every line of the input.

In your case, the END{} block simply prints the contents of the array.

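  • for (i in a) means: for each subscript in the array, i.e. i takes the values abc and def. The order in which awk visits the subscripts is unspecified, which is why def is printed before abc in your output.
  • print i"|"a[i] prints the subscript i (abc or def), then |, then the value stored under that subscript, i.e. a[abc] or a[def].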
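Because for (i in a) does not promise any particular order, the output can come out in a different order than the input, as it did in your run. If you need the keys in the order they first appear, one possible approach is to record that order in a second array. This is a sketch, not the only way to do it: the names order, n, k, tmp and rolled are my own, and I use ($1 in a) instead of if(a[$1]) so that the test checks for the key itself rather than for a non-empty value.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'abc|v1' 'abc|v2' 'def|v1' 'def|v3' > "$tmp"

rolled=$(awk -F'|' '
{
    if ($1 in a) {              # key seen before: append the new value
        a[$1] = a[$1] "," $2
    } else {                    # new key: store value and remember its position
        a[$1] = $2
        order[++n] = $1
    }
}
END {
    # Walk the keys in first-seen order instead of using for (i in a).
    for (k = 1; k <= n; k++)
        print order[k] "|" a[order[k]]
}' "$tmp")

printf '%s\n' "$rolled"
rm -f "$tmp"
```

With the sample file this prints abc|v1,v2 followed by def|v1,v3. If sorted output is enough, piping the original command through sort is a simpler alternative.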

Read more about awk in this tutorial.