Bootsector - 1 month ago
Linux Question

Delete only fully formed line ranges from a text file while ignoring those that only have a start delimiter

I want to delete the lines between START and END keywords, as described below:

START
text1
text2
text3
START
text4
END
text5
text6
START
text7
START
text8
END


My problem is that a START keyword is not always closed by an END. In the example above, the first START is never closed with END; instead, another START appears after text3.

So I cannot use the following sed command:

sed '/START/,/END/d' test.txt


because it deletes everything from the first START through the first END (text1 to text4), and again from the START before text7 through the last END (text7 and text8).
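To see why the range address fails, here is a minimal reproduction that pipes the sample input into the same sed command (sample_input is a hypothetical helper, not part of the question). A sed range begins at a line matching /START/ and ends at the first later line matching /END/, so the unclosed START on the first line pulls text1 to text3 into the deletion:

```sh
#!/bin/sh
# Hypothetical helper: prints the sample input from the question.
sample_input() {
  printf '%s\n' START text1 text2 text3 START text4 END \
    text5 text6 START text7 START text8 END
}

# The range opens at the first START and closes at the first END after it;
# both delimiter lines are deleted along with everything in between.
sample_input | sed '/START/,/END/d'
# prints:
# text5
# text6
```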

I only want to delete text4 and text8, together with the START and END lines that enclose them. The expected output is:

START
text1
text2
text3
text5
text6
START
text7

Answer

If you have GNU awk, you can try the following:

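awk -v RS='(^|\n)START|END(\n|$)' '
  # Each record is the text between two delimiters; RT (GNU awk only)
  # holds the delimiter that ended the current record.
  # A record ended by END is the body of a START...END pair: drop it.
  RT ~ "END" {
    skipped=1
    next
  }
  # Print any non-empty record, restoring the START that preceded it
  # unless the previous record was a dropped block; gensub() trims
  # leading and trailing newlines.
  NF {
    print (skipped ? "" : "START\n") gensub("^\n+|\n+$", "", "g")
    skipped=0
  }
' test.txt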
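If GNU awk is not available (RT and gensub() are gawk extensions), a line-by-line approach in POSIX awk is one alternative. This is a sketch, not the method used in the answer above: it assumes START and END each sit alone on their own line, and delete_closed_blocks is a hypothetical function name. Lines after a START are buffered; another START means the buffered block was never closed, so it is printed unchanged, while an END discards the buffer together with its START.

```sh
#!/bin/sh
# delete_closed_blocks is a hypothetical name; it reads the files given as
# arguments (or stdin) and writes the filtered text to stdout.
# Assumes START and END each appear alone on their own line.
delete_closed_blocks() {
  awk '
    /^START$/ {
      # A new START: the block buffered so far was never closed; keep it.
      printf "%s", buf
      buf = $0 "\n"
      inblock = 1
      next
    }
    /^END$/ && inblock {
      # END closes the most recent START: drop START, its body and END.
      buf = ""
      inblock = 0
      next
    }
    {
      # Inside an open block, buffer the line; otherwise print it directly.
      if (inblock) buf = buf $0 "\n"
      else print
    }
    END {
      # A block still open at end of input was never closed; keep it.
      printf "%s", buf
    }
  ' "$@"
}

printf '%s\n' START text1 text2 text3 START text4 END \
  text5 text6 START text7 START text8 END | delete_closed_blocks
```

With a file, call it as delete_closed_blocks test.txt. Because the delimiters are matched as whole lines, text before the first START is passed through unchanged, and an END with no open START is kept as ordinary text.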