jsouthworth - 1 month ago
Linux Question

Unix - Want records from file 2 that are not in file 1 by matching on the first 91 characters

I want to compare file2 to file1 by matching on the first 91 characters of each record, and output the full record from file2 to file3. I'm new to Unix commands and just can't seem to figure this out.

Thanks in advance,
Jeff

Answer

I generated dummy files as follows:

file1

A012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
C012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789

file2

Z012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 1
B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 2
T012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 3
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 4
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 5
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 6

Then I think you want this, which prints the records in file2 whose first 91 characters also appear in file1:

awk '
   # First file (file1): create an associative array entry indexed by the leftmost 91 characters
   FNR==NR { f1[substr($0,1,91)]++; next }

   # Second file (file2): print the line if its leftmost 91 characters were seen in file1
   f1[substr($0,1,91)] > 0

   ' file1 file2

Sample Output

B012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 2
D012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 4
E012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 5
F012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 Line 6
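
In case the `FNR==NR` test in the command above looks cryptic: `NR` counts lines across all input files, while `FNR` restarts at 1 for each file, so the two are equal only while awk is reading the first file. A small sketch to see this, using two throwaway files whose names (`demo1`, `demo2`) are made up for the illustration:

```shell
# Hypothetical demo files, not part of the original question.
printf 'a\nb\n' > demo1
printf 'x\ny\nz\n' > demo2

# Print the file name, the overall line number (NR) and the per-file line number (FNR).
awk '{ print FILENAME, NR, FNR }' demo1 demo2
# demo1 1 1
# demo1 2 2
# demo2 3 1
# demo2 4 2
# demo2 5 3
```

One caveat I am aware of: if the first file is empty, `FNR==NR` is also true for every line of the second file, so nothing gets printed.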

Actually, I now think you might want precisely the other lines, that is, the records in file2 whose first 91 characters do not appear in file1. If so, change this:

f1[substr($0,1,91)] > 0

to this:

! f1[substr($0,1,91)]
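
Since the question asks for the result to end up in file3, the pieces can be put together roughly as follows. Treat it as a sketch under a few assumptions: it rebuilds the dummy files in the current directory (so run it in a scratch directory), the array name `seen` is my own choice, and it uses awk's `in` operator for the membership test rather than the `!` form.

```shell
# Build the 90-digit tail shared by all dummy records (key = 1 letter + 90 digits = 91 chars).
tail=""
for i in 1 2 3 4 5 6 7 8 9; do tail="${tail}0123456789"; done

# Recreate the dummy files from the answer (this overwrites file1 and file2 in the current directory).
for k in A B C D E F; do printf '%s%s\n' "$k" "$tail"; done > file1
n=0
for k in Z B T D E F; do n=$((n + 1)); printf '%s%s Line %d\n' "$k" "$tail" "$n"; done > file2

awk '
    # file1: remember the leftmost 91 characters of every record
    FNR == NR { seen[substr($0, 1, 91)] = 1; next }

    # file2: print the full record only if its leftmost 91 characters were never seen in file1
    !(substr($0, 1, 91) in seen)
' file1 file2 > file3

cat file3   # with the dummy data: the Z record (Line 1) and the T record (Line 3)
```

The `in` test gives the same result as `! f1[...]` here; the only difference is that it does not create an array entry for each unmatched key.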