user2328149 - 1 year ago 42

Bash Question

I have some csv files with the following format in the same folder:

`Name - Value - Number - Key`

I want to compare these files pairwise and give them a score in this way. If all the names in the Name column of the first file aren't in the corresponding column of the second file the score will be 0.

Instead, if they are, the score will be given as shown in this example:

File1.csv

`NameA, ValueA, NumberA, KeyA`

Jack, 10, 9, 12

Alex, 30, 2, 16

Mark, 15, 3, 18

File2.csv

`NameB, ValueB, NumberB, KeyB`

Jack, 13, 4, 11

Alex, 22, 5, 18

Bill, 67, 6, 20

Mark 18, 8, 26

Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/3

So it will be given by the summation of the abs(KeyB - ValueA)/NumberA scores, where abs is the absolute value of the subtraction.

How can I do this?

Answer

First of all, based on your formula, `abs(KeyB - ValueA)/NumberA`

, you should have

```
Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/3 = 9.7778
```

instead of

```
Score = abs(11 - 10)/9 + abs(18 - 30)/2 + abs(26 - 15)/18
```

the awk command can be

```
awk -F, 'function abs(x){return ((x < 0.0) ? -x : x)}
BEGIN {while (getline < "file1.csv" ) { f[$1] = $2 ; g[$1] = $3; h[$1] = $4 } }
{ if (g[$1] != 0 ) score+= abs($4 - f[$1])/g[$1] } END { print score } ' file2.csv
```

Source (Stackoverflow)