Akshay Shukla Akshay Shukla - 1 year ago 66
Bash Question

Merge two tsv files on a key value removing columns that don't have all fields

I'm trying to merge two TSV columns on a common key. Lets say file A is:

a 100
b 200
c 300

and File B is:

a "hello"
c "my name is"

I want the output file to look like:

a 100 "Hello"
c 300 "my name is"

So far, I have created a file that looks like this:

a 100 "Hello"
b 200 ""
c 300 "my name is"

I want to remove the extra unneeded row because it doesn't possess all the fields.

Here is the code I have thus far

awk '
NR==1{print "column_A","column_B","column_C"}

print $0,( f=$1 in A ? A[$1] : "" )
if(f) delete A[$1]
for(i in A)
print i,"",A[i]
' OFS='\t' file2 file1

Is there any way of doing a slight modification on this code to get that output?

Thank you!

Answer Source

Use the join command:

join -t$'\t' file1 file2

Note that this requires that the input files be sorted on the joining field. If they aren't, you can use process substitution.

join -t$'\t' <(sort -t$'\t' -k1,1 file1) <(sort -t$'\t' -k1,1 file2)