thecomebackid - 4 months ago
Bash Question

awk beginner trying to understand awk "thought process"

Related question is here.

I have two files:

file 1:

I am a cat
I am a dog
I am a dog
I am a cat
I am a dog


file 2:

line 1
line 2


Upon executing:

awk '/cat/{getline <"file2"; print};1' file1
line 1
line 1
I am a dog
I am a dog
line 2
line 2
I am a dog


I am expecting:

line 1
I am a cat
I am a dog
I am a dog
line 2
I am a cat
I am a dog


My understanding of awk in the above code:

Read a line from file 1; if cat exists, print a line from file 2, and the 1 at the end tells awk to also print the line from file 1. If cat is not found, awk prints nothing from file 2 but will still print the corresponding line from file 1.

What appears to be happening is that awk reads the first line of file 1, finds cat, and prints the first line from file 2. Then awk interprets the 1 as true for the given condition and again prints the first line from file 2. When awk does not find cat, it interprets the 1 as true and prints from file 1?

Something else I found interesting is when I run this:

awk '/cat/{getline this<"file2"; print this};1' file1
line 1
I am a cat
I am a dog
I am a dog
line 2
I am a cat
I am a dog


What's going on here? Thank you for your time.

Kaz
Answer
awk '/cat/{getline <"file2"; print};1' file1
line 1
line 1
I am a dog
I am a dog
line 2
line 2
I am a dog

When the line I am a cat is processed, it matches /cat/, so the action is performed. The action reads a record from file2, which replaces the current $0 with line 1, and the print inside the action outputs it. Then the second rule fires, which consists of 1. 1 is an expression which is always true, so it matches any record. It has no action, so the default action, print, applies. Thus the current record is printed again, and you see line 1 a second time.
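To watch the record being replaced, a small sketch like the following prints $0 before and after the getline. The scratch directory and the variable names dir, f2 and out are only illustrative, and file1 is shortened to two lines:

```shell
# Work in a scratch directory so the example is self-contained.
dir=$(mktemp -d)
printf 'I am a cat\nI am a dog\n' > "$dir/file1"
printf 'line 1\nline 2\n' > "$dir/file2"

out=$(awk -v f2="$dir/file2" '
/cat/ {
    print "before: " $0
    getline < f2          # no variable given: the record read lands in $0
    print "after:  " $0
}
1                         # default action: print whatever $0 holds now
' "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

The last two lines of output come from the 1 rule: it prints line 1 for the cat record, because $0 was overwritten, and I am a dog for the second record, which the first rule never touched.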

The second occurrence of cat results in line 2 being printed. getline keeps the stream for file2 open between calls, so multiple evaluations of the same getline expression read successive lines. line 2 is printed twice for the same reason as above.
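One consequence worth seeing is what happens when file2 runs out. getline returns 1 when it reads a record, 0 at end of file and -1 on error, and at end of file it leaves $0 untouched. A rough sketch, using a made-up three-line file1 and illustrative names (dir, f2, rc, out):

```shell
dir=$(mktemp -d)
printf 'cat A\ncat B\ncat C\n' > "$dir/file1"
printf 'line 1\nline 2\n' > "$dir/file2"

out=$(awk -v f2="$dir/file2" '
/cat/ {
    rc = (getline < f2)   # 1: a line was read, 0: end of file, -1: error
    print rc ": " $0      # on 0, $0 is still the record from file1
}
' "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

The third match finds file2 exhausted, so the return code is 0 and the original record cat C is printed. Calling close(f2) would make the next getline start again from the first line.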

In the second example, you're using the getline syntax variant which reads into a specified variable name. Thus, it isn't replacing the current record. When the 1 rule is evaluated, the current record is still I am a cat, and so that is printed, rather than line 1 or line 2.
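Putting this together, a slightly more defensive version of the second command might look like the sketch below. It reads into a variable (here called extra, an arbitrary name) and prints it only when getline actually returned a line; the temporary directory and the other names are likewise illustrative:

```shell
dir=$(mktemp -d)
printf 'I am a cat\nI am a dog\nI am a dog\nI am a cat\nI am a dog\n' > "$dir/file1"
printf 'line 1\nline 2\n' > "$dir/file2"

out=$(awk -v f2="$dir/file2" '
/cat/ {
    if ((getline extra < f2) > 0)   # read into "extra"; $0 is left alone
        print extra
}
1                                   # still prints the original file1 line
' "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

With the two files from the question this prints the output the question was expecting. The parentheses around the getline expression guard against precedence surprises with the < operator.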
