user2328149 user2328149 - 7 months ago 10
Bash Question

Comparing csv files using field values

I have some csv files with the following format in the same folder:

Name - Value - Number - Key


I want to compare these files pairwise and give them a score in this way. If all the names in the Name column of the first file aren't in the corresponding column of the second file the score will be 0.
Instead, if they are, the score will be given as shown in this example:

File1.csv

NameA, ValueA, NumberA, KeyA

Jack, 10, 9, 12
Alex, 30, 2, 16
Mark, 15, 3, 18


File2.csv

NameB, ValueB, NumberB, KeyB

Jack, 13, 4, 11
Alex, 22, 5, 18
Bill, 67, 6, 20
Mark 18, 8, 26


Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/3

So it will be given by the summation of the abs(KeyB - ValueA)/NumberA scores, where abs is the absolute value of the subtraction.

How can I do this?

Answer

First of all, based on your formula, abs(KeyB - ValueA)/NumberA, you should have

    Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/3 = 9.7778

instead of

    Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/18

the awk command can be

     awk -F, 'function abs(x){return ((x < 0.0) ? -x : x)} 
     BEGIN {while (getline < "file1.csv" ) { f[$1] = $2 ; g[$1] = $3; h[$1] = $4 } }
     { if (g[$1] != 0 ) score+= abs($4 - f[$1])/g[$1]  }  END { print score } ' file2.csv