Air Air - 9 months ago 41
Bash Question

How can I combine a set of text files, leaving off the first line of each?

As part of a normal workflow, I receive sets of text files, each containing a header row. It's more convenient for me to work with these as a single file, but if I cat them naively, the header rows in files after the first cause problems.

The files tend to be large enough (10^3–10^5 lines, 5–50 MB) and numerous enough that it's awkward and/or tedious to do this in an editor or step-by-step, e.g.:

$ wc -l *
20251 1.csv
124520 2.csv
31158 3.csv
175929 total

$ tail -n 20250 1.csv > 1.tmp

$ tail -n 124519 2.csv > 2.tmp

$ tail -n 31157 3.csv > 3.tmp

$ cat *.tmp > combined.csv

$ wc -l combined.csv
175926 combined.csv


It seems like this should be doable in one line. I've isolated the arguments that I need but I'm having trouble figuring out how to match them up with tail and subtract 1 from the line total (I'm not comfortable with awk):

$ wc -l * | grep -v "total" | xargs -n 2
20251 foo.csv
124520 bar.csv
31158 baz.csv
87457 zappa.csv
7310 bingo.csv
29968 niner.csv
2086 hella.csv

$ wc -l * | grep -v "total" | xargs -n 2 | tail -n
tail: option requires an argument -- n
Try 'tail --help' for more information.
xargs: echo: terminated by signal 13

Air Air
Answer Source

You don't need to use wc -l to calculate the number of lines to output. Instead of counting back from the end, tail can start at a given line: prefix the argument of the -n (or --lines) option with a + symbol and output begins at the Kth line, so -n +2 skips only the first line. As described in the man page:

  -n, --lines=K            output the last K lines, instead of the last 10;
                             or use -n +K to output starting with the Kth
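
A quick way to check the offset is to try it on a throwaway file. This is a minimal sketch; the temporary file and its three lines are made up for illustration:

```shell
# Hypothetical sample file, created only for this demonstration.
demo=$(mktemp)
printf 'header\nrow1\nrow2\n' > "$demo"

# -n +2 starts output at line 2, so exactly one line (the header) is dropped.
body=$(tail -n +2 "$demo")
printf '%s\n' "$body"

# Clean up the scratch file.
rm -f "$demo"
```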

This makes combining all files in a directory without the first line of each file as simple as:

$ tail -q -n +2 * > combined.csv

$ wc -l *
    20251 foo.csv
   124520 bar.csv
    31158 baz.csv
    87457 zappa.csv
     7310 bingo.csv
    29968 niner.csv
     2086 hella.csv
   302743 combined.csv
   605493 total

The -q flag suppresses the "==> filename <==" headers that tail otherwise prints before each file's output when it is given more than one file.
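
If the combined file should keep one copy of the header row, or if the command might be rerun in the same directory (where * would then also match combined.csv, as the wc listing above shows), a variation along these lines may help. This is only a sketch: the scratch directory, the two sample files, and the out/ subdirectory are assumptions made for illustration, and it assumes every input file ends with a newline.

```shell
# Work in a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name\n1,a\n2,b\n' > foo.csv
printf 'id,name\n3,c\n' > bar.csv

# Write the result where the *.csv glob cannot see it, so a rerun
# does not feed combined.csv back into itself.
mkdir -p "$workdir/out"
out="$workdir/out/combined.csv"

# Expand the glob once; "$1" is the first file in glob order.
set -- *.csv

# One header row from the first file, then the body of every file.
head -n 1 "$1" > "$out"
tail -q -n +2 "$@" >> "$out"

cat "$out"
```

The input files are combined in glob (alphabetical) order, so bar.csv comes before foo.csv here.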
