instinct246 - 5 months ago
Bash Question

How to join lines not starting with a specific pattern to the previous line in UNIX?

Please take a look at the sample file and the desired output below to understand what I am looking for.

It can be done with loops in a shell script, but I am struggling to get an awk/sed one-liner.

SampleFile.txt



These are leaves.
These are branches.
These are greenery which gives
oxygen, provides control over temperature
and maintains cleans the air.
These are tigers
These are bears
and deer and squirrels and other animals.
These are something you want to kill
Which will see you killed in the end.
These are things you must to think to save your tomorrow.


Desired output

These are leaves.
These are branches.
These are greenery which gives oxygen, provides control over temperature and maintains cleans the air.
These are tigers
These are bears and deer and squirrels and other animals.
These are something you want to kill Which will see you killed in the end.
These are things you must to think to save your tomorrow.

Answer

Not a one-liner (but see end of answer!), but an awk-script:

#!/usr/bin/awk -f

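# Store the first record; "next" keeps the rules below from handling it a second time.
NR == 1     { line = $0; next }
# A "These" line completes the previous logical line: print it and start a new one.
/^These/    { print line; line = $0 }
# Any other line is a continuation: append it after a single space.
! /^These/  { line = line " " $0 }
END         { print line }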

Explanation:

I build up each output line from a line that starts with "These" plus any following lines that don't, and output the completed line whenever I find the next line with "These" at the beginning.

  1. Store the first line (the first "record").
  2. If the line starts with "These", print the accumulated (previous, now complete) line and replace whatever we have found so far with the current line.
  3. If it doesn't start with "These", accumulate the line (i.e., append it to the incomplete line read so far, with a space in between).
  4. When there's no more input, print the last accumulated (now complete) line.
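
The four steps can be traced on a shortened version of the question's sample. This is only a sketch: the variable names `sample` and `joined` are made up for the illustration, and the first rule uses `next` so that the first record is handled by that rule alone.

```shell
# Shortened sample from the question (assumed to be representative).
sample='These are leaves.
These are bears
and deer and squirrels and other animals.
These are tigers'

# One awk rule per step of the explanation.
joined=$(printf '%s\n' "$sample" | awk '
    NR == 1    { line = $0; next }          # step 1: store the first record
    /^These/   { print line; line = $0 }    # step 2: flush, then start a new line
    !/^These/  { line = line " " $0 }       # step 3: append a continuation
    END        { print line }               # step 4: flush the last line
')

printf '%s\n' "$joined"
```

The "and deer ..." line ends up on the same output line as "These are bears", while the other two lines pass through unchanged.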

Run like this:

$ ./script.awk data.in

As a one-liner:

$ awk 'NR==1{c=$0; next} /^These/{print c; c=$0} !/^These/{c=c" "$0} END{print c}' data.in

... but why you would want to run anything like that on the command line is beyond me.
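
Since the question also mentions sed, here is a rough sed counterpart. It is a different technique from the awk script (the common N/P/D loop), and the wrapper name `join_with_sed` is hypothetical. It is sketched on the assumption that your sed accepts `\n` in a regular expression to match an embedded newline, as POSIX and GNU sed do.

```shell
# join_with_sed is a hypothetical helper name, not a standard tool.
# It joins every line that does not start with "These" onto the line before it.
join_with_sed() {
    # :a                   loop label
    # $!N                  unless on the last line, append the next input line
    # /\nThese/!s/\n/ /    if the appended line is a continuation, replace the newline with a space
    # ta                   if that substitution happened, loop and fetch another line
    # P;D                  otherwise print the finished first line and restart with the remainder
    sed -e ':a' -e '$!N' -e '/\nThese/!s/\n/ /' -e 'ta' -e 'P;D' "$@"
}
```

It could be called as `join_with_sed data.in`, or used at the end of a pipeline, in which case it reads standard input.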

EDIT: I saw that it was the specific string "These" (/^These/) that should be matched. Previously my code looked for an uppercase letter at the start of the line (/^[A-Z]/).
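
To see why the choice of pattern matters, the regex can be passed in as a parameter. The following is a sketch only: `join_continuations` and `pat` are hypothetical names, and because `awk -v` interprets backslash escapes in the assigned value, a pattern containing backslashes would need them doubled.

```shell
# join_continuations is a hypothetical helper: $1 is the regex that marks the
# start of a new logical line; any remaining arguments are input files.
join_continuations() {
    pat=$1
    shift
    awk -v pat="$pat" '
        NR == 1   { line = $0; next }                # first record starts the buffer
        $0 ~ pat  { print line; line = $0; next }    # new logical line: flush the buffer
                  { line = line " " $0 }             # continuation: append to the buffer
        END       { if (NR) print line }             # flush, unless the input was empty
    ' "$@"
}
```

With `join_continuations '^These' data.in`, the sample line "Which will see you killed in the end." is joined to the line before it, as in the desired output. With `join_continuations '^[A-Z]' data.in` it is treated as the start of a new line, because it begins with an uppercase letter.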