user2743 - 3 months ago
Linux Question

Combining two text files based on the second field of the first file, only when it is also present in the second file

I have two text files that I would like to combine, substituting based on the second field in the first file. Below is the format of the first file, which lists words with their counts from a corpus.

file_1.txt

1000 the
999 been
950 phone
850 ball
800 watch
799 porch


File 2 contains some of the words found in the first file, followed by a breakdown of each word from the second field on.

file_2.txt

the th e
been be en
shirt sh ir t
phone pho ne
desk d esk
chair cha i r
watch wa t c h
floor f loo r


What I would like to get is below. When a word is present in both files, I would like to keep only the word breakdown from the second file.

file_3.txt

1000 th e
999 be en
950 pho ne
850 ball
800 wa t c h
799 porch


I've been trying to do some sort stuff between the two files based on the fields, but I'm pretty lost.

Answer

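You could use Awk. Read file_2.txt first so that every line of file_1.txt is printed, including words that have no breakdown:

awk 'FNR == NR { w=$1; $1=""; m[w]=$0; next } $2 in m { print $1 m[$2]; next } { print }' file_2.txt file_1.txt

That is:

  • For each line in file_2.txt (the first file argument), build a map of word -> breakdown:
    • Save the first field (the word) in a variable
    • Clear the first field, which leaves the breakdown with a leading space
    • Store that remainder in the map under the word
  • For each line in file_1.txt:
    • If the word is in the map, print the count followed by the breakdown
    • Otherwise, print the line unchanged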
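
As a self-contained check against the sample data in the question, the sketch below recreates both input files and builds file_3.txt. It reads file_2.txt first and keeps the lines of file_1.txt that have no breakdown (ball, porch). The array name breakdown and the use of sub() to strip the word are choices made for this illustration, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample inputs from the question.
cat > file_1.txt <<'EOF'
1000 the
999 been
950 phone
850 ball
800 watch
799 porch
EOF

cat > file_2.txt <<'EOF'
the th e
been be en
shirt sh ir t
phone pho ne
desk d esk
chair cha i r
watch wa t c h
floor f loo r
EOF

# Pass 1 (file_2.txt): map each word to its breakdown.
# Pass 2 (file_1.txt): print the count plus the breakdown when the word
# is known, otherwise print the line unchanged.
awk '
    FNR == NR {                          # true while reading the first file argument
        word = $1
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")  # drop the word, keep the breakdown
        breakdown[word] = $0
        next
    }
    $2 in breakdown { print $1, breakdown[$2]; next }
    { print }
' file_2.txt file_1.txt > file_3.txt

cat file_3.txt
```

With the sample data this should print the six lines of file_3.txt shown in the question, in the order of file_1.txt. One caveat: the FNR == NR test assumes file_2.txt is not empty, since with an empty first file the condition would also hold for file_1.txt.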