Marco Falzone - 3 months ago
Linux Question

How to delete lines matching a specific pattern from a TXT or CSV file

I have a txt file (sample below) and need to remove the rows that begin with "Subtotal Group 1", "Subtotal Group 2" or "Grand total". These strings are always at the beginning of the line, but a row should be removed only if the rest of the line is empty or contains nothing but spaces.

This should be doable in one pass with awk or sed, but I'm currently doing it in 3 separate steps (one for each string). A more generic syntax would be great. Thanks, everybody.

My txt file looks like this:

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00 500 First Line Text 1685.52
1.00 502 Second Line Text 280.98
530 Other Line text 157.32
_________________________________________________________________________
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1 2123.82
Subtotal Group 1
Subtotal Group 1

========================================================================
GROUP 2
========================================================================

7.00 701 First Line Text 53.63
711 Second Line text 97.85
7.00 740 Third Line text 157.32
741 Any Line text 157.32
742 Any Line text 18.04
801 Last Line text 128.63
_______________________________________________________________________
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2 612.79
Subtotal Group 2
_______________________________________________________________________
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total 1511.03


The goal output I'm trying to achieve is:

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00 500 First Line Text 1685.52
1.00 502 Second Line Text 280.98
530 Other Line text 157.32
_______________________________________________________________________
Subtotal Group 1 2123.82

=======================================================================
GROUP 2
=======================================================================

7.00 701 First Line Text 53.63
711 Second Line text 97.85
7.00 740 Third Line text 157.32
741 Any Line text 157.32
742 Any Line text 18.04
801 Last Line text 128.63
_______________________________________________________________________
Subtotal Group 2 612.79
_______________________________________________________________________
Grand total 1511.03

Answer

That's a job grep was invented to do:

$ grep -Ev '^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$' file
Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_________________________________________________________________________
Subtotal Group 1                                                2123.82

========================================================================
GROUP 2
========================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2                                                 612.79
_______________________________________________________________________
Grand total                                                      1511.03
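
To see why only the empty rows disappear, here is a minimal sketch that runs the same pattern over a few hand-made lines. The sample lines and the temporary file are made up for illustration and are not taken from the real report. The point is that `[[:blank:]]*$` only matches when nothing but spaces or tabs follows the label, so a row that carries a value never matches and is kept.

```shell
#!/bin/sh
# Illustrative sample only: these lines are assumptions, not the real report.
sample=$(mktemp)
{
    printf '%s\n' 'Subtotal Group 1'                         # label only
    printf '%s\n' 'Subtotal Group 1      '                   # label + spaces
    printf '%s\n' 'Subtotal Group 1                2123.82'  # label + value
    printf 'Grand total\t\n'                                 # label + tab
    printf '%s\n' 'Grand total                     1511.03'  # label + value
    printf '%s\n' 'Subtotal Groups are listed below'         # not a label row
} > "$sample"

# -E: extended regular expressions; -v: print the lines that do NOT match.
kept=$(grep -Ev '^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

Only the two rows with a value and the unrelated "Subtotal Groups" line should survive; the label-only rows, including the ones padded with spaces or a tab, are dropped.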

You can use the same regexp in awk or sed if you prefer:

awk '!/^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$/' file
sed -E '/^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$/d' file
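
Since the question asks for a more generic syntax, one possible sketch is to pass the labels in as a parameter instead of hard-coding them. `drop_blank_labels` is a hypothetical helper name, and `report.txt` in the usage comment is an assumed file name; the regexp itself is the one used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer above): print the input without
# the lines that contain only one of the given labels plus optional blanks.
#   $1  ERE alternation of labels, e.g. 'Subtotal Group [0-9]+|Grand total'
#   $2  optional input file; stdin is read when it is omitted
drop_blank_labels() {
    labels=$1
    shift
    # \$ keeps a literal "$" (end-of-line anchor) inside the double quotes.
    grep -Ev "^(${labels})[[:blank:]]*\$" "$@"
}

# Usage (file names are assumptions):
# drop_blank_labels 'Subtotal Group [0-9]+|Grand total' report.txt > report.clean.txt
```

Adding another label then only means extending the alternation in the first argument. Labels that contain regexp metacharacters such as `.` or `(` would need escaping, because the argument is used as a regular expression.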