user3385666 - 3 months ago
Bash Question

UNIX Crop multiple files from dynamic position

I have a directory containing a huge number of HTML files.

I know that to find the starting point, I can use the following command:

grep -n -m1 "/header" filename.html| cut -d':' -f1


And to find the end point of my crop, I can use this one:

grep -n -m1 "footer" 39646_20160820.html | cut -d':' -f1


My question is: how can I crop every file in a directory, using these two criteria to find the start and the end of the crop in each file?

Answer

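To delete every block of lines that starts with a line containing /header and ends with the next line containing footer (both marker lines included), in all files in the current directory: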

sed -i.bak '\|/header|,\|footer|d' *

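The expression \|/header|,\|footer| defines a range of lines that starts with a line containing /header and ends with the next line containing footer. Writing the addresses as \|...| instead of /.../ makes | the regex delimiter, so the slash in /header does not need to be escaped. The command d tells sed to delete every line in such a range. If a /header line has no later footer line, the range extends to the end of the file.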
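The \|regex| form only swaps the regex delimiter; with the default / delimiter the slash has to be escaped instead. sed also re-arms the range after each footer, so every /header...footer block is deleted, not just the first. A small demonstration on made-up input (the variable name input is only for illustration); both commands should print a, c and e:

```shell
# Made-up input containing two /header ... footer blocks
input='a
/header
b
footer
c
/header
d
footer
e'

# Custom delimiter: \|regex| is equivalent to /regex/ with | as the delimiter
printf '%s\n' "$input" | sed '\|/header|,\|footer|d'

# The same range with the default / delimiter, escaping the slash in /header
printf '%s\n' "$input" | sed '/\/header/,/footer/d'
```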

* is a glob that matches all files in the current directory (except hidden files, whose names start with a dot). To operate on only some of the files, change this glob, for example to *.html.
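With a very large number of files, expanding * can exceed the system's argument-length limit and fail with "Argument list too long". One way around that is to let find hand the file names to sed in batches. This is a minimal sketch wrapped in a hypothetical helper, strip_html_dir, and it assumes a find that supports -maxdepth (GNU and BSD find do, although it is not in POSIX):

```shell
# Hypothetical helper: delete every /header...footer range in the *.html
# files directly inside directory $1, keeping .bak backups.
#   -maxdepth 1  do not descend into subdirectories
#   -type f      skip directories and other non-regular files
#   {} +         pass as many file names per sed invocation as fit
strip_html_dir() {
    find "$1" -maxdepth 1 -type f -name '*.html' \
        -exec sed -i.bak '\|/header|,\|footer|d' {} +
}

# Usage: strip_html_dir .
```

With -i, both GNU and BSD sed treat each file separately, so a range cannot spill over from one file into the next.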

Example

Consider this test file:

$ cat File
1
/header
2
footer
3
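To reproduce this example, the test file can be created with printf, which reuses its format string for each remaining argument:

```shell
# Create the five-line test file used in this example
printf '%s\n' 1 /header 2 footer 3 > File
```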

To delete every line from a line containing /header through the next line containing footer, and print the result to stdout:

$ sed '\|/header|,\|footer|d' File
1
3

To update all files in the current directory in-place:

$ sed -i.bak '\|/header|,\|footer|d' *

To verify that this worked:

$ cat File
1
3
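With many files, checking each one by hand is impractical. A sketch of a bulk check, assuming the .bak backups from the command above are still present: the hypothetical helper list_unchanged prints every file that is byte-for-byte identical to its backup, which means sed found no /header line in it to start a range.

```shell
# Hypothetical helper: list files in directory $1 that the sed -i.bak
# command left unchanged (identical to their .bak backup).
list_unchanged() {
    for bak in "$1"/*.bak; do
        [ -f "$bak" ] || continue      # glob matched nothing
        orig=${bak%.bak}               # strip the .bak suffix
        if cmp -s "$bak" "$orig"; then # cmp -s: silent, exit 0 if identical
            printf '%s\n' "$orig"
        fi
    done
}

# Usage: list_unchanged .
```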

Backup files

The -i.bak option above saves a copy of each original file with .bak appended to its name (for example, File.bak). If you are confident that the command does the right thing and you do not want backup files, use:

sed -i '\|/header|,\|footer|d' *  # GNU/Linux

Or:

sed -i '' '\|/header|,\|footer|d' *  # OSX/BSD
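If the same script has to run on both GNU and BSD systems, one option is to avoid -i altogether: write sed's output to a temporary file and copy it back over the original. This is a sketch using a hypothetical helper, strip_portable, and it assumes mktemp is available (it is on Linux and macOS, though it is not part of POSIX):

```shell
# Hypothetical helper: portable in-place edit without sed -i.
strip_portable() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if sed '\|/header|,\|footer|d' "$f" > "$tmp"; then
            cat "$tmp" > "$f"   # overwrite in place, keeping f's permissions
        fi
        rm -f "$tmp"
    done
}

# Usage: strip_portable *.html
```

Copying back with cat > rather than mv keeps the original file's permissions and inode; no backup is kept, so test it on a copy first.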

Keeping the range instead of deleting it

To keep only the range and delete everything else:

sed -n '\|/header|,\|footer|p' *

For example, on our sample file:

$ sed -n '\|/header|,\|footer|p' File
/header
2
footer

To save the result to the files in place:

sed -ni '\|/header|,\|footer|p' *  # GNU/Linux

Or:

sed -ni '' '\|/header|,\|footer|p' *  # OSX/BSD
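The sed ranges above act on every /header...footer block in a file, while the grep -m1 commands in the question find only the first match of each marker. If you want to crop exactly the lines those two grep commands identify, a per-file sketch could look like this. crop_first_range is a hypothetical name, and grep -m is supported by GNU and BSD grep but is not in POSIX. It prints to stdout, so the usage line saves each result to a new .cropped file instead of touching the originals:

```shell
# Hypothetical helper: print one file's lines from the first "/header"
# match through the first "footer" match, using the grep | cut commands
# from the question to find the two line numbers.
crop_first_range() {
    start=$(grep -n -m1 '/header' "$1" | cut -d: -f1)
    end=$(grep -n -m1 'footer' "$1" | cut -d: -f1)
    # Give up if either marker is missing or footer comes before /header
    if [ -z "$start" ] || [ -z "$end" ] || [ "$end" -lt "$start" ]; then
        echo "skipping $1" >&2
        return 1
    fi
    sed -n "${start},${end}p" "$1"
}

# Usage: crop each file into <name>.cropped, removing it if the file was skipped
# for f in *.html; do crop_first_range "$f" > "$f.cropped" || rm -f "$f.cropped"; done
```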