Akshay Shukla Akshay Shukla - 5 months ago 14
Bash Question

Merge two tsv files on a key value removing columns that don't have all fields

I'm trying to merge two TSV columns on a common key. Lets say file A is:

a 100
b 200
c 300


and File B is:

a "hello"
c "my name is"


I want the output file to look like:

a 100 "Hello"
c 300 "my name is"


So far, I have created a file that looks like this:

a 100 "Hello"
b 200 ""
c 300 "my name is"


I want to remove the extra unneeded row because it doesn't possess all the fields.

Here is the code I have thus far

awk '
NR==1{print "column_A","column_B","column_C"}

FNR==NR{
A[$1]=$2
next
}
{
print $0,( f=$1 in A ? A[$1] : "" )
if(f) delete A[$1]
}
END{
for(i in A)
print i,"",A[i]
}
' OFS='\t' file2 file1


Is there any way of doing a slight modification on this code to get that output?

Thank you!

Answer

Use the join command:

join -t$'\t' file1 file2

Note that this requires that the input files be sorted on the joining field. If they aren't, you can use process substitution.

join -t$'\t' <(sort -t$'\t' -k1,1 file1) <(sort -t$'\t' -k1,1 file2)
Comments