I have 40 files of 2 GB each, stored on an NFS share. Each file contains two columns: a numeric id and a text field. Each file is already sorted and gzipped.
How can I merge all of these files so that the resulting output is also sorted?
sort -m -k 1
This is a use case for process substitution. Say you have two files to merge, sorta.gz and sortb.gz. You can pass the output of gunzip -c FILE.gz for each of these files to sort using the <(...) shell operator:
sort -m -k1 <(gunzip -c sorta.gz) <(gunzip -c sortb.gz) >sorted
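One caveat worth checking: sort -m assumes every input is already ordered by the same key options you pass to the merge. If the numeric ids were sorted numerically (for example with sort -n) rather than lexically, the merge should use a matching key such as -k1,1n. The sketch below assumes bash, GNU sort, and small hypothetical inputs that were sorted numerically:

```shell
#!/usr/bin/env bash
# Sketch: merging inputs whose ids were sorted numerically (assumed: sort -k1,1n).
set -euo pipefail
export LC_ALL=C   # predictable byte-wise collation for any non-numeric ties

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical numerically sorted inputs; lexically "10" would sort before "9".
printf '2\ttwo\n10\tten\n' | gzip -c > sorta.gz
printf '9\tnine\n100\thundred\n' | gzip -c > sortb.gz

# -k1,1n compares only field 1, as a number, matching how the inputs were sorted.
sort -m -k1,1n <(gunzip -c sorta.gz) <(gunzip -c sortb.gz) > sorted

cut -f1 sorted
```

If the files were instead sorted with plain sort (lexical order), keep the key options from the original command; the point is only that the merge and the original sort must agree.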
Process substitution replaces a command with a file name that represents the command's output, and is typically implemented with either a named pipe or a /dev/fd/... special file.
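To see roughly what <(...) does for you, the same merge can be wired up by hand with named pipes. This is a sketch assuming bash, mkfifo, and hypothetical file names a.gz and b.gz; a side benefit is that the pipe paths are ordinary arguments, so no eval is needed:

```shell
#!/usr/bin/env bash
# Sketch: process substitution done manually with named pipes (FIFOs).
set -euo pipefail
export LC_ALL=C

workdir=$(mktemp -d)
cd "$workdir"
printf '1\tx\n3\tz\n' | gzip -c > a.gz   # hypothetical sorted input
printf '2\ty\n4\tw\n' | gzip -c > b.gz   # hypothetical sorted input

pipes=()
for f in a.gz b.gz; do
  fifo="$workdir/${f%.gz}.fifo"
  mkfifo "$fifo"
  # Each writer blocks until sort opens the FIFO for reading.
  gunzip -c "$f" > "$fifo" &
  pipes+=("$fifo")
done

sort -m -k1 "${pipes[@]}" > merged
wait                 # reap the background gunzip processes
rm -f "${pipes[@]}"  # FIFOs are not removed automatically
cat merged
```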
For 40 files, you will want to build the command with that many process substitutions dynamically and use eval to execute it:
cmd="sort -m -k1 "
for input in file1.gz file2.gz file3.gz ...; do
  cmd="$cmd <(gunzip -c '$input')"
done
eval "$cmd" >sorted
# or
eval "$cmd" | gzip -c > sorted.gz
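Two refinements may help at this scale. First, the '$input' quoting breaks if a file name contains a single quote; bash's printf %q escapes names safely for eval. Second, GNU sort merges at most 16 inputs at once by default and spills intermediate results to temporary files beyond that; --batch-size raises the limit (GNU sort silently lowers it if the open-file limit is too small). The sketch below assumes bash, GNU sort, and a hypothetical directory whose *.gz files are the only inputs; the demo files stand in for the real 40:

```shell
#!/usr/bin/env bash
# Sketch: dynamic N-way merge with safe quoting and a larger merge batch.
# Assumptions: bash, GNU sort, inputs are the only *.gz files in the directory.
set -euo pipefail
export LC_ALL=C

# Demo setup (hypothetical): three small pre-sorted inputs, one with a space.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3; do
  printf '%d\trow-a\n%d\trow-b\n' "$i" "$((i + 3))" | gzip -c > "part $i.gz"
done

inputs=( *.gz )

# Merge all inputs in one pass instead of batches of 16 plus temp files.
cmd="sort -m -k1 --batch-size=${#inputs[@]}"
for input in "${inputs[@]}"; do
  printf -v quoted '%q' "$input"   # escape spaces, quotes, etc. for eval
  cmd+=" <(gunzip -c $quoted)"
done

out=$(mktemp)                      # output kept outside the input directory
eval "$cmd" | gzip -c > "$out"
gunzip -c "$out"
```

Writing the output outside the input directory keeps a rerun from picking up the merged file as one of its own inputs.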