Tomek Tomek - 1 month ago 7
Bash Question

Get rid of unwanted lines from file

In the example below, ^[ stands for the escape character used to color terminal output (to type it, press Ctrl+V followed by Ctrl+[).
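For readers who want to reproduce such lines, here is a minimal sketch. The helper name make_label is made up for illustration; it assumes the label is simply text wrapped in the cyan (0;36) and reset (0) color sequences, as in the file below.

```shell
# Hypothetical helper: emit one cyan label line, as a colorizing program might.
make_label() { printf '\033[0;36m%s\033[0m\n' "$1"; }

# cat -v renders the ESC byte as ^[ (and a carriage return as ^M),
# which is the notation used throughout this question.
make_label 'TREE;01;' | cat -v
# prints: ^[[0;36mTREE;01;^[[0m
```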

1) My file:

-------- just to mark start of file ----------
^[[1;31mbla bla bla^[[0m



^[[0;36mTREE;01;^[[0m


^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m


^[[0;36mTREE;02;^[[0m


^[[0;36mTREE;03;^[[0m

withered

^[[0;36mTREE;04;^[[0m


^[[0;36mTREE;05;^[[0m

^[[0;36mTREE;06;^[[0m

^[[0;36mTREE;07;^[[0m


^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m



^[[0;36mTREE;08;^[[0m


^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m



^[[0;36mTREE;09;^[[0m

-------- just to mark end of file ----------


2) I want to get rid of all "empty labels", that is, all labels that have no comments under them.

So the result I want to achieve is:

-------- just to mark start of results ----------
^[[1;31mbla bla bla^[[0m



^[[0;36mTREE;01;^[[0m


^[[1;31m^[[0m
^[[1;31m^[[1;31mapple tree:^[[0m^[[0m
^[[1;31m4 apples^M^M^[[0m
^[[1;31m6 leafs^M^[[0m


^[[0;36mTREE;03;^[[0m

withered

^[[0;36mTREE;07;^[[0m


^[[1;31m^[[0m
^[[1;31m^[[1;31mcherry tree:^[[0m^[[0m
^[[1;31mbig branches^M^M^[[0m
^[[1;31mtchick roots^M^[[0m



^[[0;36mTREE;08;^[[0m


^[[0;36mMy tree ^[[0m I have tree house on it^[[0;31m:-)^[[0m



-------- just to mark end of results ----------


3) I do:

pcregrep -M 'TREE.*\n(\n|\s)+(?=.*TREE|\z)' my_file


and it works as I expect: it prints only the labels that have no comments

-------- just to mark start of results ----------
^[[0;36mTREE;02;^[[0m


^[[0;36mTREE;04;^[[0m


^[[0;36mTREE;05;^[[0m

^[[0;36mTREE;06;^[[0m

^[[0;36mTREE;09;^[[0m

-------- just to mark end of results ----------


4) But the command:

pcregrep -Mv 'TREE.*\n(\n|\s)+(?=.*TREE|\z)' my_file


produces "weird results" that I do not understand.

*) How can I get the result I want?

With any tool like: pcregrep, ag, ack, sed, awk, ...

Answer

Well, I did it.

(1) sed 's/^M//g;
(2) s/$/#VAV#/' my_file | \
(3) paste -sd "" | \
(4) sed 's/^[\[0;36m[[:print:]]\+^[\[0m\(\(#VAV#\|[[:blank:]]\|^[\[0;36m[[:print:]]\+^[\[0m\)\)*\(\(^[\[0;36m[[:print:]]\+^[\[0m\)\|$\)/\4/g; 
(5) s/#VAV#/\n/g'

(1) Remove the ^M (carriage return) characters; they get in the way of matching.
(2) Append a deliberately unusual marker string (#VAV#) to the end of each line.
(3) Concatenate all lines into a single string.
(4) Apply the regular expression substitution that drops each empty label.
(5) Turn the marker from step (2) back into an end of line.
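
The same goal can probably also be reached line by line, without joining the file into one string. Below is a minimal awk sketch of that alternative; it is not the pipeline above. It assumes a label is a line consisting only of ^[[0;36m ... ^[[0m (optionally followed by blanks or ^M), so a line such as "My tree ... I have tree house on it" counts as a comment because text follows the color reset. The function name strip_empty_labels is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop every empty label together with the blank lines
# that follow it. Assumption: a label line is exactly ESC[0;36m...ESC[0m,
# optionally followed by blanks or a carriage return.
strip_empty_labels() {
  awk -v esc="$(printf '\033')" '
    BEGIN {
      label = "^" esc "\\[0;36m[^" esc "]*" esc "\\[0m[ \t\r]*$"
      blank = "^[ \t\r]*$"
    }
    $0 ~ label {             # new label: a label still pending was empty, drop it
      buf = $0 "\n"; pending = 1; next
    }
    pending && $0 ~ blank {  # blank lines stay attached to the pending label
      buf = buf $0 "\n"; next
    }
    pending {                # first real content: the pending label is kept
      printf "%s", buf; pending = 0
    }
    { print }                # a label still pending at end of input is dropped
  ' "$@"
}
```

It could be used as: strip_empty_labels my_file > cleaned_file. Unlike step (1) above, this sketch leaves the ^M characters in the comment lines untouched.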
