Chudar Chudar - 21 days ago 6
Linux Question

Fetching rows from file_1 based on another file_2

I have two tab-delimited text files as follows:

file_1:

A1 13f Jos +
B1 zh4 Kia -
C2 nh2 Met -
D3 5gh Lox +
F4 w4t Nit -


file_2:

N3 6jg Jut -
J8 76d Met +
A1 99g Kia -
M6 45k Qox +
V2 87h Nit -


On Linux, I would like to extract the rows of file_1 whose 3rd-column value also appears in the 3rd column of file_2, like this:

B1 zh4 Kia -
C2 nh2 Met -
F4 w4t Nit -


Will comm -12 file_1.txt file_2.txt help? Kindly guide me.

Answer

awk is probably simplest here (this preserves file_1 input order):

$ awk 'NR==FNR { seen[$3]++; next } seen[$3]' file_2 file_1

B1  zh4  Kia  -
C2  nh2  Met  -
F4  w4t  Nit  -
  • Pattern NR==FNR only matches lines from the first input file (file_2), and its action { seen[$3]++; next } builds up an associative array of all 3rd-column values.

    • seen[$3]++ is a common idiom for constructing an associative array that holds the set of unique field values: accessing key $3 (the value of the 3rd field) in array seen implicitly creates an entry for that key on first access, and the post-increment ++ gives the entry a nonzero value, which evaluates to true in a Boolean context (the pattern discussed below takes advantage of this).
  • Because of the next in the previous action, pattern seen[$3] is only evaluated for the second input file (file_1), and it is true only if that line's 3rd-column value was also present in the first file (file_2). A pattern that evaluates to true, with no action attached, implicitly prints the current line.
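
To try the two-pass logic described above end to end, here is a minimal sketch that recreates the sample data and runs a slight variation of the command. The temporary directory, the explicit -F'\t' field separator, and the use of the in operator instead of the ++ counter are assumptions added for illustration, not part of the original command. With $3 in seen, membership is tested without relying on a nonzero stored value, so a bare reference to seen[$3] is enough to register the key in the first pass.

```shell
#!/bin/sh
# Recreate the two sample files from the question as tab-delimited text
# (the temporary directory is just a scratch location for this sketch).
dir=$(mktemp -d)
printf 'A1\t13f\tJos\t+\nB1\tzh4\tKia\t-\nC2\tnh2\tMet\t-\nD3\t5gh\tLox\t+\nF4\tw4t\tNit\t-\n' > "$dir/file_1"
printf 'N3\t6jg\tJut\t-\nJ8\t76d\tMet\t+\nA1\t99g\tKia\t-\nM6\t45k\tQox\t+\nV2\t87h\tNit\t-\n' > "$dir/file_2"

# Pass 1 (file_2, where NR==FNR): referencing seen[$3] creates the key.
# Pass 2 (file_1): print each line whose 3rd field is a key in seen.
result=$(awk -F'\t' 'NR==FNR { seen[$3]; next } $3 in seen' "$dir/file_2" "$dir/file_1")

printf '%s\n' "$result"
rm -rf "$dir"
```

On the sample data this should print the same three rows (Kia, Met, Nit) in file_1 order. As for comm -12: it compares whole lines of two sorted files, so it would not help here, since no line of file_1 is identical to a line of file_2.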
