Bhavya Arora - 4 months ago
Bash Question

Bash join command leaving out a row of numbers

I have two files and I want to extract the rows that share a value in the third column, but join is leaving out a row that should be matched.

File1

b b b
4 5 3
c c c


File2

1 2 3 4
a b c d
e f g h
i j k l
l m n o


The output is:

c c c a b d


The command used is:

join -1 3 -2 3 --nocheck-order File1.txt File2.txt


It leaves out the row with 3 as the common field, even with --nocheck-order.
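A minimal sketch of the sort-then-join route, assuming GNU coreutils sort and join and the C locale (set so both tools agree on collation order). In File1.txt the third column reads b, 3, c, which is not in sorted order, so join's merge steps past the 3 row before File2.txt reaches it. The sketch recreates the sample files in a temporary directory; the intermediate names f1.sorted and f2.sorted are hypothetical.

```shell
#!/bin/sh
# Sketch: sort both inputs on field 3, then join on field 3.
set -eu

# Work in a throwaway directory with the sample data from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '%s\n' 'b b b' '4 5 3' 'c c c' > File1.txt
printf '%s\n' '1 2 3 4' 'a b c d' 'e f g h' 'i j k l' 'l m n o' > File2.txt

# Same collation for sort and join, so "sorted" means the same thing to both.
LC_ALL=C
export LC_ALL

# Hypothetical intermediate file names; any temporary names would do.
sort -k3,3 File1.txt > f1.sorted
sort -k3,3 File2.txt > f2.sorted

# Join field first, then the other File1 fields, then the other File2 fields.
out=$(join -1 3 -2 3 f1.sorted f2.sorted)
printf '%s\n' "$out"
```

With this data it prints 3 4 5 1 2 4 and c c c a b d, i.e. the expected lines, ordered by the sorted key.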

Edit:

Expected output:

c c c a b d
3 4 5 1 2 4

Answer

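join only works on input that is sorted on the join field; --nocheck-order just suppresses the warning about unsorted input, it does not make join cope with it. As an alternative to two sort commands (which can be expensive for big files) followed by a join, you can use this single awk command to get your output: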

awk 'FNR == NR{a[$3]=$0; next} $3 in a{print $3, a[$3], $1, $2, $4}' File1.txt File2.txt

Output:

3 4 5 3 1 2 4
c c c c a b d

Explanation:

NR == FNR {                    # true only while reading the first file (if it is non-empty)
  a[$3] = $0                   # store the whole line in array a, keyed by $3
  next                         # skip the remaining rule for this line
}

$3 in a {                      # second file: $3 was also seen in the first file
  print $3, a[$3], $1, $2, $4  # the key, the stored first-file line,
                               # then the other fields of the second file
}
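The command above prints the key twice (once as $3 of the second file, once inside the stored first-file line), so its lines differ from the expected output. A minimal variant of the same NR == FNR idea, assuming the key is always the third field in both files, stores only the non-key fields and prints the key once. The helper name nonkey is hypothetical. Like the original, it keeps only the last first-file line for each key and prints rows in the second file's order.

```shell
#!/bin/sh
# Sketch: NR == FNR lookup that prints the key once, like join does.
set -eu

# Work in a throwaway directory with the sample data from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
printf '%s\n' 'b b b' '4 5 3' 'c c c' > File1.txt
printf '%s\n' '1 2 3 4' 'a b c d' 'e f g h' 'i j k l' 'l m n o' > File2.txt

out=$(awk '
# Hypothetical helper: every field except the key ($3), joined with OFS.
function nonkey(    i, s, sep) {
  s = ""
  sep = ""
  for (i = 1; i <= NF; i++) {
    if (i != 3) {
      s = s sep $i
      sep = OFS
    }
  }
  return s
}
NR == FNR { a[$3] = nonkey(); next }    # first file: remember its non-key fields
$3 in a   { print $3, a[$3], nonkey() } # second file: key, first-file fields, own fields
' File1.txt File2.txt)
printf '%s\n' "$out"
```

With the sample data it prints 3 4 5 1 2 4 followed by c c c a b d, matching the expected lines without any sorting.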