mercury0114 - 7 months ago
Linux Question

What's a short way in Linux to extract one pattern string and another pattern string that appears later?

Suppose we have one line of text stored in a file:

// In the actual file this will be one line
{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},

What I want is to extract 3 entries from this particular input:

// The details, such as whether to put a { character in front, do not matter.
// Any form of output which extracts only these 3 entries and groups them in a
// visually nice way will do the job.
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}
// I do not want the last entry, because it does not contain a timestamp field.

So far the closest command I found is

grep -Po '{id:[0-9]+(.+?)}' input_file

which gives the output


The next improvement I am searching for is how to remove the unrelated text
from each entry and also remove the last entry.

Question: what's the shortest way to do that in Linux?


With GNU awk for multi-char RS and RT and word boundaries:

$ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file
{ID:13, TIMESTAMP:1476280500}
{ID:25, TIMESTAMP:1476280600}
{ID:30, TIMESTAMP:1476280700}

The above will work whether the input is on one line or multiple lines, and regardless of what other text the file contains. All it relies on is the ID appearing before each related TIMESTAMP, and that's not hard to change if necessary.
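
If GNU awk is not available, the same pairing idea can probably be sketched with POSIX awk features only (match, RSTART, RLENGTH). This is an illustration under assumptions, not a drop-in replacement: the function name extract_pairs and the sample line are made up here, and unlike the gawk version there is no word-boundary check, so it assumes ID: and TIMESTAMP: never occur as the tail of a longer word.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): print
# "{ID:n, TIMESTAMP:m}" for every ID that is followed by a TIMESTAMP
# before the next ID shows up. Reads the given files, or stdin if none.
extract_pairs() {
    awk '{
        line = $0
        # Consume the line one ID:/TIMESTAMP: token at a time.
        while (match(line, /(ID|TIMESTAMP):[0-9]+/)) {
            token = substr(line, RSTART, RLENGTH)
            line  = substr(line, RSTART + RLENGTH)
            if (token ~ /^ID:/) {
                id = token                       # remember the most recent ID
            } else if (id != "") {
                printf "{%s, %s}\n", id, token   # pair it with its TIMESTAMP
                id = ""                          # each ID is used at most once
            }
        }
    }' "$@"
}

# Made-up sample: three complete entries, then one without a TIMESTAMP.
printf '%s\n' '{a,ID:13, b,TIMESTAMP:1476280500,c},{a,ID:25, b,TIMESTAMP:1476280600,c},{a,ID:30, b,TIMESTAMP:1476280700,c},{a,ID:99, b}' |
    extract_pairs
```

On that sample it prints the three {ID, TIMESTAMP} pairs and skips the ID:99 entry. Because id is an ordinary awk variable, it survives across input lines, so an ID on one line can still pair with a TIMESTAMP on a later line.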