Vlad Vlad - 3 months ago 9
Bash Question

Make cat command to operate recursively looping through a directory

I have a large directory of data files which I am in the process of manipulating to get them in a desired format. They each begin and end 15 lines too soon, meaning I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence.

To begin, I have written the following code to separate the relevant data into easy chunks:

#!/bin/bash

destination='media/user/directory/'
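# Glob directly instead of parsing ls, and quote every expansion
for file1 in "$destination"*.ascii
do
    echo "$file1"
    file2="${file1}.end"
    file3="${file1}.snip"
    sed -e '16,$d' "$file1" > "$file2"   # keep only the first 15 lines
    sed -e '1,15d' "$file1" > "$file3"   # keep everything after line 15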
done


This worked perfectly, so the next step is the world's simplest cat command:

cat $file3 $file2 > outfile


However, what I need to do is stitch file2 to the previous file3.

See how these files are all sequential over time:

*_20090412T235945_20090413T235944_* ### April 13
*_20090413T235945_20090414T235944_* ### April 14


So I need to take the 15 lines snipped off the April 14 example above and paste them onto the end of the April 13 example.

This doesn't have to be part of the original code; in fact, it would probably be best if it weren't. I was just hoping someone would be able to help me get this going.

Thanks in advance! If there is anything I have been unclear about and needs further explanation please let me know.

Answer

"I need to strip the first 15 lines off one file and paste them to the end of the previous file in the sequence."

If I understand what you want correctly, it can be done with one line of code:

awk 'NR==1 || FNR==16{if (NR>FNR) close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3

When this has run, the files file1.new, file2.new, and file3.new will be in the new form with the lines transferred. Of course, you are not limited to three files: you may specify as many as you like on the command line.

Example

To keep our example short, let's just strip the first 2 lines instead of 15. Consider these test files:

$ cat file1
1
2
3
$ cat file2
4
5
6
7
8
$ cat file3
9
10
11
12
13
14
15

Here is the result of running our command:

$ awk 'NR==1 || FNR==3{if (NR>FNR) close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3
$ cat file1.new
1
2
3
4
5
$ cat file2.new
6
7
8
9
10
$ cat file3.new
11
12
13
14
15

As you can see, the first two lines of each file have been transferred to the preceding file.
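If you want to convince yourself that lines are only moved between files and none are lost, a quick check along these lines may help. It is only a sketch: it recreates the three test files in a scratch directory (the directory is an assumption for illustration), and it guards close(f) with NR>FNR so that the first output file is not reopened and truncated partway through the first input.

```shell
#!/bin/bash
set -eu

# Scratch directory holding the same test data as the example
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3                > file1
printf '%s\n' 4 5 6 7 8            > file2
printf '%s\n' 9 10 11 12 13 14 15  > file3

awk 'NR==1 || FNR==3{if (NR>FNR) close(f); f=FILENAME ".new"} {print>f}' file1 file2 file3

# Concatenating the inputs and the outputs in order should give identical text
if cmp -s <(cat file1 file2 file3) <(cat file1.new file2.new file3.new); then
    echo "all lines accounted for"
else
    echo "line mismatch" >&2
fi
```

The comparison works because the command never reorders lines; it only changes where the file boundaries fall.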

How it works

awk implicitly reads each file line-by-line. The job of our code is to choose which new file a line should be written to based on its line number. The variable f will contain the name of the file that we are writing to.

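  • NR==1 || FNR==16{if (NR>FNR) close(f); f=FILENAME ".new"}

    When we are reading the first line of the first file, NR==1, or when we are reading the 16th line of whatever file we are on, FNR==16, we update f to be the name of the current file with .new added to the end. Before switching, close(f) releases the previous output file so that a long list of inputs does not run into the limit on open files. The NR>FNR test skips the close while we are still in the first file: there f already names the current output, and closing it would make the next print>f reopen the file and truncate the lines already written.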

    For the short example, which transferred 2 lines instead of 15, we used the same code but with FNR==16 replaced with FNR==3.

  • print>f

    This prints the current line to file f.

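    (In a shell script, > would truncate the file on every write, so we would need >>. This is awk: > truncates the file only when it is first opened, and later prints to the same open file append.)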
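The same logic can be spread over several lines with the line count held in a variable, which may be easier to adapt. This is a sketch rather than a drop-in script: the variable n, the scratch directory, and the a/b/c.ascii sample files are all made up for illustration, and n=2 stands in for 15.

```shell
#!/bin/bash
set -eu

# Hypothetical sample data in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 3      > a.ascii
printf '%s\n' 4 5 6 7 8  > b.ascii
printf '%s\n' 9 10 11 12 > c.ascii

awk -v n=2 '
    # Pick a new output file at the very first line of input, and again at
    # line n+1 of every file (the first line that stays with that file).
    NR == 1 || FNR == n + 1 {
        # Close the previous output only once we are past the first file;
        # inside the first file f already names the current output.
        if (NR > FNR) close(f)
        f = FILENAME ".new"
    }
    # Every line goes to whichever output file is currently selected.
    { print > f }
' a.ascii b.ascii c.ascii

cat a.ascii.new
```

With this data, a.ascii.new should end up holding lines 1 to 5, b.ascii.new lines 6 to 10, and c.ascii.new lines 11 and 12. For the real data, pass -v n=15.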

Using a glob to specify the file names

destination='media/user/directory/'
awk 'NR==1 || FNR==16{if (NR>FNR) close(f); f=FILENAME ".new"} {print>f}' "$destination"*.ascii
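If you would rather keep the .end and .snip files from the script in the question, the pairing of each file with the next one can be sketched in bash with an array. Treat this as an illustration under assumptions: the scratch directory, the sample file names, the .joined suffix, and n=2 (standing in for 15) are all hypothetical. Unlike the awk command, this variant leaves the first file's leading lines unused and produces no joined output for the last file.

```shell
#!/bin/bash
set -eu

# Hypothetical test data; n=2 stands in for the real 15
destination="$(mktemp -d)/"
printf '%s\n' 1 2 3     > "${destination}d_20090413.ascii"
printf '%s\n' 4 5 6 7 8 > "${destination}d_20090414.ascii"
printf '%s\n' 9 10 11   > "${destination}d_20090415.ascii"
n=2

# The glob expands in sorted order, which is chronological for these names
files=("$destination"*.ascii)

# Split each file as in the question: .end = first n lines, .snip = the rest
for file in "${files[@]}"; do
    sed -e "$((n + 1)),\$d" "$file" > "$file.end"
    sed -e "1,${n}d"        "$file" > "$file.snip"
done

# Join each .snip with the .end taken from the NEXT file in the sequence
for ((i = 0; i < ${#files[@]} - 1; i++)); do
    cat "${files[i]}.snip" "${files[i + 1]}.end" > "${files[i]}.joined"
done
```

Indexing the array is what gives the loop access to the "next" file, which a plain for-in loop over the glob cannot do directly.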