Chris Null - 4 months ago
Bash Question

Join lines based on the first field pattern

I have a file with 200,000 lines. Each line starts with "IMAGE", "HISTO" or "FRAG". I need to join the HISTO and FRAG lines to the IMAGE line that precedes them. Here is an example.

IMAGE Lots of Data on this line
HISTO usually numbers 0 0 1 1 0 1 0
FRAG Always at least 1 of these lines but can be more


The result needs to look like this:

IMAGE Lots of Data on this line HISTO usually numbers 0 0 1 1 0 1 0 FRAG Always at least 1 of these lines but can be more


There can be many FRAG lines before the pattern starts over with an IMAGE line. I am on a Mac, so I can use pretty much any tool, but I am most familiar with vi.

Answer

AWK:

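awk '/^IMAGE/{if(NR>1)print a; a=$0} /^(FRAG|HISTO)/{a=a" "$0} END{if(a!="")print a}' test.in

Out loud:

/^IMAGE/ {        # if the line starts with IMAGE
  if (NR > 1)     # skip the print on the first line, when the buffer is still empty
    print a       # flush the buffered record to output
  a = $0          # start a new buffer with the IMAGE line
}
/^(FRAG|HISTO)/ { # if the line starts with FRAG or HISTO
  a = a " " $0    # append it to the buffer
}
END {             # after the last input line
  if (a != "")    # if anything is still buffered
    print a       # flush the final record; no later IMAGE line would trigger it
}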
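
To see how this behaves when an IMAGE line is followed by several FRAG lines, here is a small sketch you can paste into a shell. The function name join_records and the sample data are made up for illustration. The END rule is there so the last buffered record gets printed, since no further IMAGE line arrives to flush it.

```shell
# join_records is a hypothetical wrapper name; it reads the named files, or stdin if none are given.
join_records() {
  awk '
    /^IMAGE/        { if (NR > 1) print a; a = $0 }  # flush previous record, start a new one
    /^(FRAG|HISTO)/ { a = a " " $0 }                 # append continuation lines to the buffer
    END             { if (a != "") print a }         # flush the final record
  ' "$@"
}

# Illustrative input: two records, the first with two FRAG lines.
join_records <<'EOF'
IMAGE img1 data
HISTO 0 0 1
FRAG f1
FRAG f2
IMAGE img2 data
HISTO 1 0 1
FRAG f3
EOF
# prints:
# IMAGE img1 data HISTO 0 0 1 FRAG f1 FRAG f2
# IMAGE img2 data HISTO 1 0 1 FRAG f3
```

On the real file you would call it as join_records test.in > test.out. awk streams the input line by line and only holds one record in the buffer, so 200,000 lines should not be a problem.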