ilovkatie - 1 year ago
Linux Question

Passing the result of tr as the second file argument to awk

My command:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 file2
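For readers unfamiliar with the idiom: NR==FNR is true only while awk reads its first file. The first block therefore stores each line of file1 as a key in the array a and skips to the next line. For the second file, the pattern substr($0,50,6) in a prints every line whose six characters starting at column 50 (awk's substr is 1-based) are one of those keys. The following is a minimal sketch, assuming bash, that uses small hypothetical files keys.txt and data.txt in place of the real data:

```shell
#!/bin/bash
# Demo of the NR==FNR lookup idiom on tiny, hypothetical test files.
tmp=$(mktemp -d)

# keys.txt: one 6-character key per line (hypothetical data)
printf 'ABC123\nXYZ789\n' > "$tmp/keys.txt"

# data.txt: 49 filler spaces, so each key field starts at column 50
pad=$(printf '%49s' '')
printf '%sABC123 first\n%sQQQ000 second\n%sXYZ789 third\n' "$pad" "$pad" "$pad" > "$tmp/data.txt"

# Same program as in the question: load keys from the first file,
# then print lines of the second file whose columns 50-55 are a key.
result=$(awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' "$tmp/keys.txt" "$tmp/data.txt")

# Show only the part of each matching line from column 50 onward
printf '%s\n' "$result" | cut -c50-
rm -r "$tmp"
```

One caveat of this idiom: if the first file is empty, NR==FNR is also true for every line of the second file, so those lines are loaded into a and nothing is printed.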


The problem is that file2 contains \000 (NUL) characters, and awk treats it as a binary file.

Replacing \000 with a space character:

tr '\000' ' ' < file2 > file2_not_binary


solves the binary file problem.
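A quick way to check what that tr call does: the sketch below (assuming bash, with a hypothetical file sample.bin) writes NUL bytes into a small file, converts them with the same tr command, and counts the NUL bytes left afterwards. tr only maps the NUL byte to a space and leaves newlines alone, so the line structure awk relies on is unchanged.

```shell
#!/bin/bash
# Hypothetical sample input: two lines, the first containing two NUL bytes.
tmp=$(mktemp -d)
printf 'ab\000\000cd\nef\n' > "$tmp/sample.bin"

# Replace every NUL byte with a space, as in the question.
cleaned=$(tr '\000' ' ' < "$tmp/sample.bin")

# Count NUL bytes remaining: tr -cd '\000' deletes every byte except NUL.
nuls=$(tr '\000' ' ' < "$tmp/sample.bin" | tr -cd '\000' | wc -c | tr -d ' ')

printf '%s\n' "$cleaned"
echo "NUL bytes remaining: $nuls"
rm -r "$tmp"
```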

However, file2 is a 20GB file, and I don't want to run tr separately and save the result as another file. I want to pass the output of tr directly to awk.

I have tried:

awk 'NR==FNR{a[$0]=1;next;} substr($0,50,6) in a' file1 < (tr '\000' ' ' < file2)


But the result is:

The system cannot find the file specified.
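The attempt above is not valid process substitution: in bash (and zsh or ksh) the syntax is <(command) with no space between < and (, and it only exists in those shells on Unix-like systems. The error message looks like it comes from Windows rather than from bash, so (this is a guess) the shell in use may not support process substitution at all. A minimal sketch, assuming bash, with hypothetical test files:

```shell
#!/bin/bash
# Requires bash (or zsh/ksh): <(...) is process substitution, written with no space.
tmp=$(mktemp -d)

# Hypothetical test data: file1 holds keys, file2 holds records containing NULs.
printf 'ABC123\n' > "$tmp/file1"
pad=$(printf '%49s' '')
printf '%sABC123\000x\n%sZZZ999\000y\n' "$pad" "$pad" > "$tmp/file2"

# awk receives tr's output as a path such as /dev/fd/63, so it is still
# the second file and NR==FNR still separates the two inputs.
result=$(awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' "$tmp/file1" <(tr '\000' ' ' < "$tmp/file2"))

printf '%s\n' "$result"
rm -r "$tmp"
```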


Another question: can my memory, or awk, handle such a big file at once? I'm working on a PC with 12GB of RAM.

EDIT

One of the answers works as I expected (credit to Ed Morton):

tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -


However, it is about 2 times slower than doing the same thing in two steps: first removing \000 and saving the result, then using awk to search. How can I speed it up?

EDIT2

My bad. Ed Morton's solution is actually a little faster than doing the same thing in two separate commands.

Two commands separately: 08:37:053

Two commands piped: 08:07:204

Answer

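Since awk only stores the lines of your first file (as keys of the array a) and reads your second file one line at a time, the size of the second file is irrelevant except for execution speed. Try this, where - tells awk to read its second input from the pipe: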

tr '\000' ' ' < file2 | awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' file1 -
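The answer's pipeline can be tried end to end on small data. The sketch below assumes bash and builds hypothetical file1 and file2 test files in a temporary directory. The - argument makes awk read its second input from standard input, which here is tr's output, so no intermediate file is written.

```shell
#!/bin/bash
# Hypothetical test data: file1 holds 6-character keys; file2 has NULs
# and a key field at columns 50-55.
tmp=$(mktemp -d)
printf 'ABC123\nXYZ789\n' > "$tmp/file1"
pad=$(printf '%49s' '')
printf '%sABC123\000a\n%sQQQ000\000b\n%sXYZ789\000c\n' "$pad" "$pad" "$pad" > "$tmp/file2"

# The answer's pipeline: tr streams file2 with NULs turned into spaces,
# and "-" makes awk read its second input from standard input.
# Only file1's lines are kept in the array a; the piped data is read line by line.
result=$(tr '\000' ' ' < "$tmp/file2" |
  awk 'NR==FNR{a[$0];next} substr($0,50,6) in a' "$tmp/file1" -)

# Show only the part of each matching line from column 50 onward
printf '%s\n' "$result" | cut -c50-
rm -r "$tmp"
```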