asddddddaaaad2 - 1 month ago
Linux Question

Pulling text between two patterns with an awk script

Input text file:

This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END


Desired output:

These lines should be extracted by our script.

Everything here will be copied.


My awk script is:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}


and my current output is:

These lines should be extracted by our script.

Everything here will be copied.
#END


The only problem is that the "#END" line is still printed. I've been trying for a long time to eliminate it, but I'm not sure exactly how to do it.

Answer

This becomes obvious, IMO, if we comment each command in the script. The script can be written like this:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}

Note that I expanded the bare pattern a into the equivalent form a != 0 { print $0 } to make the point clearer.
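
As a quick check of that equivalence, here is a minimal sketch. The two-line input and the variable names out_short and out_long are made up for illustration. A pattern with no action block falls back to awk's default action, which prints the current line:

```shell
# Bare pattern: the default action prints the line whenever a is non-zero.
out_short=$(printf 'one\ntwo\n' | awk 'BEGIN { a = 1 } a')

# Explicit form of the same rule.
out_long=$(printf 'one\ntwo\n' | awk 'BEGIN { a = 1 } a != 0 { print $0 }')

printf '%s\n' "$out_short"
printf '%s\n' "$out_long"
```

Both commands should print the same two lines.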

So the script prints every line once the flag is set. When it reaches the END line, the print rule runs first, so the line has already been printed by the time the exit rule fires. Since you don't want the END line to be printed, the exit rule must come before the print rule. The script should become:

#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
  next          # skip to the next line
}
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}

In this case, we exit before the line is printed. In a condensed form, it can be written as:

awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' file

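or, a bit shorter (without next, the #BEGIN rule has to come after the print rule so that the #BEGIN line itself is not printed):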

awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' file
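
To try both one-liners against the sample input from the question, something like the following sketch should work. The temporary file and the variable names out_next and out_short are assumptions for illustration only:

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This is a simple test file.
#BEGIN
These lines should be extracted by our script.

Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
EOF

# Variant with next: the exit rule runs before the print rule.
out_next=$(awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$tmp")

# Shorter variant: exit first, then print, then set the flag.
out_short=$(awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$tmp")

rm -f "$tmp"

printf '%s\n' "$out_next"
```

On this input both variants produce the desired output: the two text lines with the blank line between them, and no #END.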

Regarding the additional constraints raised in the comments: to avoid skipping any BEGIN lines that occur inside the block being printed, we should remove the next statement and order the rules as in the shorter one-liner above. In expanded form it looks like this:

#!/usr/bin/awk -f
$1 ~ /#END/ {   # if we match the END line
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}
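
The difference only shows up when a #BEGIN line appears inside the block. A small sketch with a made-up input (the input lines and the variable names with_next and without_next are hypothetical):

```shell
# Hypothetical input with a second #BEGIN inside the block to be printed.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
before
#BEGIN
first
#BEGIN
second
#END
after
EOF

# With next, every #BEGIN line is skipped, including the inner one.
with_next=$(awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' "$tmp")

# Without next, the inner #BEGIN is printed because the flag is already set.
without_next=$(awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$tmp")

rm -f "$tmp"

printf '%s\n' "$with_next"     # first, second
printf '%s\n' "$without_next"  # first, #BEGIN, second
```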

To also avoid exiting if an END line is found before the block to be printed, we can check if the flag is set before exiting:

#!/usr/bin/awk -f
$1 ~ /#END/ && a != 0 {   # if we match the END line and the flag is set
  exit          # exit the process 
}
a != 0 {        # if the flag is not zero
  print $0      # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
  a=1           # Set a flag to one
}

or in a condensed form:

awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' file
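
To see what the extra flag check buys, here is a sketch with a made-up input in which a stray #END appears before the first #BEGIN (the input lines and the variable names unguarded and guarded are hypothetical):

```shell
# Hypothetical input: an #END line comes before the block to be printed.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#END
skipped
#BEGIN
wanted
#END
tail
EOF

# Unguarded: exits on the very first line, so nothing is printed.
unguarded=$(awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' "$tmp")

# Guarded: the early #END is ignored because the flag is not set yet.
guarded=$(awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' "$tmp")

rm -f "$tmp"

printf 'unguarded: [%s]\n' "$unguarded"  # empty
printf 'guarded:   [%s]\n' "$guarded"    # wanted
```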