ilovkatie ilovkatie - 1 year ago 90
Linux Question

Passing result of tr as second parameter in awk

My command:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 file2

The problem is that file2 contains NUL characters, and awk treats it as a binary file.

Replacing the NUL characters with spaces:

tr '\000' ' ' < file2 > file2_not_binary

solves the binary file problem.

However, my file2 is a 20GB file, and I don't want to run tr separately and save the result as another file. I want to pass the result of tr directly to awk as its second input.

I have tried:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 < (tr '\000' ' ' < file2)

But the result is:

The system cannot find the file specified.

Another question: can my memory or awk handle such a big file at once? I'm working on a PC with 12GB of RAM.


One of the answers works as I expected (credit to Ed Morton):

tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -

However, it is about twice as slow as doing the same thing in two steps: first replacing the NUL characters with tr and saving the result, then using awk to search. How can I speed it up?


My bad. Ed Morton's solution is actually a little faster than running the two commands separately.

Two commands separately:

Two commands piped:

Answer Source

Since awk isn't storing your 2nd file in memory (only the lines of file1 are kept, as keys of the array a), the size of that file is irrelevant except for speed of execution. Try this:

tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -
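
To check the pipeline on something smaller than 20GB, here is a minimal sketch using made-up sample data. The record labels, keys, and temporary directory are assumptions for illustration only; the tr and awk commands are the ones above. Each sample record has a NUL byte at column 5 and a 6-character key in columns 50-55.

```shell
#!/bin/sh
# Hypothetical sample data: labels, keys, and layout are invented for this demo.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# file1: one 6-character key per line.
printf '%s\n' 'ABC123' 'XYZ789' > file1

# file2: fixed-width records; "recN" + NUL + 44 spaces puts the key at column 50.
{
  printf 'rec1\000%-44s%s tail\n' '' 'ABC123'
  printf 'rec2\000%-44s%s tail\n' '' 'NOMTCH'
  printf 'rec3\000%-44s%s tail\n' '' 'XYZ789'
} > file2

# tr streams file2 with NULs replaced by spaces; awk loads file1 into the
# array a, then reads the stream from standard input (the "-" argument).
matches=$(tr '\000' ' ' < file2 |
  awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -)

# Show only the record labels of the matching lines.
printf '%s\n' "$matches" | cut -c1-4
```

With this sample data the script should print rec1 and rec3, the two records whose key appears in file1. In bash, ksh, or zsh the same stream could also be passed as a file argument with process substitution, written with no space between < and the parenthesis: <(tr '\000' ' ' < file2). That syntax is not available in plain sh or in the Windows command prompt, so the pipe with "-" is the more portable form.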
