corvax corvax - 4 months ago 13
Linux Question

Create diff between two files based on specific column

I have the following problem.

Say I have 2 files:

A.txt

1 A1
2 A2


B.txt

1 B1
2 B2
3 B3


I want to make a diff based only on the values in the first column, so the result should be

3 B3


How can this problem be solved with bash on Linux?

Answer

awk is your friend:

awk 'NR==FNR{f[$1];next}{if($1 in f){next}else{print}}' A.txt B.txt

or more simply

awk 'NR==FNR{f[$1];next}!($1 in f){print}' A.txt B.txt
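To try the command against the sample data from the question, here is a minimal sketch. The scratch directory and the variable names workdir and result are assumptions made for illustration; they are not part of the original answer.

```shell
# Recreate the sample files from the question in a scratch directory
# (workdir is a hypothetical name chosen for this sketch).
workdir=$(mktemp -d)
printf '1 A1\n2 A2\n' > "$workdir/A.txt"
printf '1 B1\n2 B2\n3 B3\n' > "$workdir/B.txt"

# Print the lines of B.txt whose first column does not appear in A.txt.
result=$(awk 'NR==FNR{f[$1];next}!($1 in f){print}' "$workdir/A.txt" "$workdir/B.txt")
echo "$result"   # prints: 3 B3
```

The order of the file arguments matters: the file whose keys should be excluded (A.txt) comes first, and the file to be filtered (B.txt) comes second.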

A bit of explanation will certainly help

  1. NR and FNR are awk built-in variables. NR is the total number of records read so far across all input files, including the current one. FNR is the number of records read so far in the current file, including the current one. The two are equal only while the first file is being processed (assuming that file is not empty).

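  2. f[$1] references the element of array f with key $1. awk creates the array on first use and adds the key if it does not exist yet. Because no value is assigned, the element holds awk's uninitialized value (an empty string that counts as zero in numeric context), but the value is not used in your case; only the existence of the key matters.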

  3. next skips to the next record without processing the rest of the awk script.

  4. Note that the {if($1 in f){next}else{print}} part is processed only for the second file (and any subsequent files).
  5. $1 in f checks whether the key $1 exists in the array f.
  6. The if-else-print part is self-explanatory: records whose first column is already in f are skipped, and all others are printed.
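The behaviour of NR and FNR described in point 1 can be seen by printing both for every record. This is a rough sketch, assuming the sample files from the question; the names workdir and trace are hypothetical and used only for illustration.

```shell
# Recreate the sample files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
printf '1 A1\n2 A2\n' > "$workdir/A.txt"
printf '1 B1\n2 B2\n3 B3\n' > "$workdir/B.txt"

# For each record, print the current file name, NR and FNR.
# The cd runs inside the command substitution's subshell, so FILENAME
# shows the bare file names and the caller's directory is unchanged.
trace=$(cd "$workdir" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' A.txt B.txt)
echo "$trace"
# A.txt NR=1 FNR=1
# A.txt NR=2 FNR=2
# B.txt NR=3 FNR=1
# B.txt NR=4 FNR=2
# B.txt NR=5 FNR=3
```

NR keeps counting across files while FNR restarts at 1 for B.txt, which is why the NR==FNR block only collects keys from A.txt.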
