vahap eldem - 5 months ago
Perl Question

Script for comparing two files and outputting matching values

I am looking for a script that compares two text files (tab-delimited) and outputs the matching values. I used this bash command, but it gives only unique values (in other words, it does not work for this purpose):

grep -Fwf file_1.txt file_2.txt > out.txt


File_1.txt

ref|apple.1|
ref|apple.1|
ref|apple.1|
ref|peach.1|
ref|peach.1|
ref|peach.1|
ref|fig.1|
ref|pear.1|
ref|pear.1|
ref|apricot.1|
ref|plum.1|
ref|grape.1|
ref|grape.1|
ref|grape.1|
ref|grape.1|



File_2.txt

ref|apple.1| prepared_for_goats
ref|peach.1| prepared_for_tucans
ref|fig.1| prepared_for_piegons
ref|pear.1| prepared for_pigs
ref|apricot.1| prepared_for_sheep
ref|plum.1| prepared_for_gorilla
ref|grape.1| prepared_for_monkeys



Expected_Output

ref|apple.1| prepared_for_goats
ref|apple.1| prepared_for_goats
ref|apple.1| prepared_for_goats
ref|peach.1| prepared_for_tucans
ref|peach.1| prepared_for_tucans
ref|peach.1| prepared_for_tucans
ref|fig.1| prepared_for_piegons
ref|pear.1| prepared for_pigs
ref|pear.1| prepared for_pigs
ref|apricot.1| prepared_for_sheep
ref|plum.1| prepared_for_gorilla
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys

Many thanks for all your help.

Answer

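grep won't do what you want: it prints each matching line of file_2.txt once, no matter how many lines of file_1.txt match it. grep excels at selecting text, but it is not good at mixing and merging. By contrast, awk is designed for this kind of task: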

$ awk -F'|' 'FNR==NR{a[$1,$2]=$3; next} {print $1,$2,a[$1,$2]}' OFS='|'  file2 file1
ref|apple.1| prepared_for_goats 
ref|apple.1| prepared_for_goats 
ref|apple.1| prepared_for_goats 
ref|peach.1| prepared_for_tucans 
ref|peach.1| prepared_for_tucans 
ref|peach.1| prepared_for_tucans 
ref|fig.1| prepared_for_piegons 
ref|pear.1| prepared for_pigs 
ref|pear.1| prepared for_pigs 
ref|apricot.1| prepared_for_sheep 
ref|plum.1| prepared_for_gorilla 
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys
ref|grape.1| prepared_for_monkeys

(The text of the question said that the fields were tab-separated, but the sample files showed | as the separator. Since the SO editor does not clearly show tabs, I kept | as the separator for this demonstration code.)
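If the real files are tab-delimited, the same two-pass idea can split on the tab instead. The sketch below assumes, without confirmation from the question, that each line of File_2.txt is an id, a tab, and a description, and that File_1.txt holds one id per line; the sample files are a shortened copy of the question's data. For exactly that layout the |-based command above also happens to work, since the tab after the closing | simply becomes part of $3, but keying on the tab is safer if the ids do not always look like ref|name|.

```shell
#!/bin/sh
# Sketch: the answer's join, adapted to tab-separated input.
# Assumed layout: File_2.txt = "<id><TAB><description>", File_1.txt = "<id>".
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Shortened sample data from the question, written with real tabs.
printf 'ref|apple.1|\nref|apple.1|\nref|fig.1|\nref|grape.1|\nref|grape.1|\n' > File_1.txt
printf 'ref|apple.1|\tprepared_for_goats\nref|fig.1|\tprepared_for_piegons\nref|grape.1|\tprepared_for_monkeys\n' > File_2.txt

# First file (FNR==NR): remember id -> description, then skip to the next record.
# Second file: print every id with its description, once per occurrence.
awk -F'\t' 'FNR==NR {a[$1]=$2; next} {print $1, a[$1]}' OFS='\t' File_2.txt File_1.txt > out.txt

cat out.txt
```

Each id from File_1.txt is printed as many times as it occurs there, which is what produces the repeated lines in the expected output.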

How it works

  • -F'|'

    Set the field separator on input to |.

  • FNR==NR{a[$1,$2]=$3; next}

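    Because of the condition FNR==NR, this is executed only for the first file, file2: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. For that file, we create an associative array a which stores the third field, $3, under a key formed from the first two fields, $1,$2. The next statement then skips to the following record, so nothing is printed while reading file2.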

  • print $1,$2,a[$1,$2]

    If we get here, we are working on the second file, file1. In that case, we print the first field, the second field, and the value of a that corresponds to those two fields.

  • OFS='|'

    This sets the field separator on output to |.
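To see why FNR==NR picks out the first file, a small trace helps. This sketch uses two tiny hypothetical files named file2 and file1, like the command above, and prints FILENAME, NR and FNR for every record: NR keeps counting across files, while FNR restarts at 1 when the second file begins.

```shell
#!/bin/sh
# Sketch: show why FNR==NR is true only while reading the first file.
# NR counts records across all input files; FNR restarts at 1 for each file.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Two tiny hypothetical input files.
printf 'ref|apple.1| prepared_for_goats\nref|fig.1| prepared_for_piegons\n' > file2
printf 'ref|apple.1|\nref|apple.1|\nref|fig.1|\n' > file1

# Print file name, NR, FNR and whether the FNR==NR condition holds.
awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first-file" : "later-file") }' file2 file1 > trace.txt

cat trace.txt
# file2 1 1 first-file
# file2 2 2 first-file
# file1 3 1 later-file
# file1 4 2 later-file
# file1 5 3 later-file
```

One caveat of this idiom: if file2 is empty, NR and FNR stay equal throughout file1, so file1 would be treated as the first file and nothing would be printed.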
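The command above prints every line of file1, even one whose key never appears in file2; such a line comes out with an empty third field. If only matching lines are wanted, one possible variation (a sketch, not part of the original answer) tests membership with ($1,$2) in a before printing. The entry ref|kiwi.1| below is a made-up unmatched key used only for illustration.

```shell
#!/bin/sh
# Sketch: like the answer's command, but print a file1 line only when its
# key (fields 1 and 2) was actually seen in file2.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical sample: "ref|kiwi.1|" has no entry in file2.
printf 'ref|apple.1| prepared_for_goats\nref|fig.1| prepared_for_piegons\n' > file2
printf 'ref|apple.1|\nref|kiwi.1|\nref|fig.1|\n' > file1

# (($1,$2) in a) checks key membership without creating an empty entry,
# so unmatched lines such as ref|kiwi.1| are skipped.
awk -F'|' 'FNR==NR {a[$1,$2]=$3; next} (($1,$2) in a) {print $1, $2, a[$1,$2]}' OFS='|' file2 file1 > matched.txt

cat matched.txt
# ref|apple.1| prepared_for_goats
# ref|fig.1| prepared_for_piegons
```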