ilovkatie ilovkatie - 3 years ago 98
Bash Question

Filter file with a list from another file, based on substring

file1.txt:

1234567890IDNUMBER1
1234567890IDNUMBER2
1234567890IDNUMBER3
1234567890IDNUMBER4
1234567890IDNUMBER5


Note: IDNUMBERX is a fixed-length unique ID. In this particular case it is 9 characters long and ALWAYS starts at position 11.

file2.txt:

IDNUMBER1
IDNUMBER2
IDNUMBER4


Note: List of IDs.

What I want to do is filter the first file, deleting every line whose ID is not listed in the second file.

Expected output:

1234567890IDNUMBER1
1234567890IDNUMBER2
1234567890IDNUMBER4


I found a VERY similar question here:
grep matching specific position in lines using words from other file

I tried the accepted answer there, and it does not work for me the way the author describes:

awk 'NR==FNR{a[$0]=1;next;} substr($0,11,9) in a' file2.txt file1.txt


It returns just one line (the last match):

1234567890IDNUMBER4


The same happens with the data from the linked question.

What could be wrong?

Using: GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

EDIT

Stupid me... It was all about line endings on Windows... The Windows carriage return character was causing the problem.

I used:

awk '{ sub("\r$", ""); print }' dos.txt > unix.txt


to convert the file and remove the carriage returns.
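
For anyone hitting the same symptom, here is a minimal sketch that reproduces it and checks for carriage returns. The scratch directory and the variable names (dir, cr_lines, before, after) are made up for illustration. It also assumes that the last line of file2.txt had no trailing CR, which would explain why only IDNUMBER4 matched: the other keys are stored as "IDNUMBER1\r" and "IDNUMBER2\r", which never equal the 9-character substring taken from file1.txt.

```shell
# Recreate the sample data in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '1234567890IDNUMBER%s\n' 1 2 3 4 5 > "$dir/file1.txt"
# file2.txt with Windows line endings; the last line has no CR (assumption).
printf 'IDNUMBER1\r\nIDNUMBER2\r\nIDNUMBER4\n' > "$dir/file2.txt"

# Count lines ending in a carriage return; non-zero means CRLF is present.
cr_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$dir/file2.txt")

# Before conversion: keys such as "IDNUMBER1\r" never match the substring.
before=$(awk 'NR==FNR{a[$0]=1;next} substr($0,11,9) in a' \
    "$dir/file2.txt" "$dir/file1.txt")

# Strip the trailing CR from each line, then run the same filter again.
awk '{ sub(/\r$/, ""); print }' "$dir/file2.txt" > "$dir/file2.unix.txt"
after=$(awk 'NR==FNR{a[$0]=1;next} substr($0,11,9) in a' \
    "$dir/file2.unix.txt" "$dir/file1.txt")

printf 'CR lines: %s\nbefore:\n%s\nafter:\n%s\n' "$cr_lines" "$before" "$after"
```

With this sample data the unconverted run should print only the IDNUMBER4 line, and the converted run all three expected lines.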

Answer

Try:

awk 'FNR==NR{a[substr($0,11)]=$0;next} ($1 in a){print a[$1]}' file1.txt file2.txt
1234567890IDNUMBER1
1234567890IDNUMBER2
1234567890IDNUMBER4

EDIT: Adding one more solution for the same problem, this time using gawk's FIELDWIDTHS.

awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1]}' FIELDWIDTHS="10 9" file1.txt file2.txt
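
Since the question's EDIT traces the failure to Windows line endings, here is a hedged variant that strips a trailing carriage return inside awk itself, so neither file has to be converted first. It is only a sketch: the scratch directory and the names dir, ids and result are illustrative, and it assumes the ID always occupies 9 characters starting at position 11, as stated in the question. Unlike the commands above, it reads the ID list first and prints the matching lines in file1.txt order.

```shell
# Sample data with CRLF line endings in both files (hypothetical scratch dir).
dir=$(mktemp -d)
printf '1234567890IDNUMBER%s\r\n' 1 2 3 4 5 > "$dir/file1.txt"
printf 'IDNUMBER1\r\nIDNUMBER2\r\nIDNUMBER4' > "$dir/file2.txt"

result=$(awk '
    { sub(/\r$/, "") }             # drop a trailing CR from every record
    NR == FNR { ids[$0]; next }    # first file (ID list): remember each ID
    substr($0, 11, 9) in ids       # second file: print lines with a known ID
' "$dir/file2.txt" "$dir/file1.txt")

printf '%s\n' "$result"
```

Only the ID list is held in memory here, which may matter if file1.txt is much larger than file2.txt.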