AndyPerlitch - 1 month ago

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:

path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'
...


I'd like to just extract the *.proto filenames from these lines, and I have tried:

[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'


I know that part of my problem is that .* is greedy and I'm going to get things like e.proto, o.proto and 2.proto, but I can't even get that far: it just outputs the same lines as the input. Any help would be greatly appreciated.
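For illustration, here is the greedy behaviour described above, reproduced with -E so that + really means "one or more". This is a sketch: the function name greedy_extract is hypothetical and only wraps the question's pattern so it can be reused.

```shell
# Hypothetical helper: the question's pattern, run as an extended regex.
# The leading .* is greedy, so it swallows everything except the last
# word character before ".proto", which is all the group captures.
greedy_extract() {
  sed -E 's/^.*([a-zA-Z0-9_]+\.proto).*$/\1/'
}

printf '%s\n' \
  "path/to/file: example: 'extract_me.proto'" \
  "path/to/other-file: example: 'me_too.proto'" \
  "path/to/something/else: example: 'and_me_2.proto'" |
  greedy_extract
# prints:
# e.proto
# o.proto
# 2.proto
```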

Answer

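I find it helpful to use extended regular expressions for this purpose (-r, or -E on BSD/macOS sed), in which case you don't need to escape the parentheses. This is also why your attempt prints its input unchanged: in basic regular expressions a bare + is a literal plus sign, so your pattern never matches.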
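A quick way to confirm the basic-regex behaviour is to run the question's command unchanged. In this sketch the function name original_attempt is hypothetical; it simply wraps the original sed expression.

```shell
# Hypothetical helper: the question's original command, unchanged.
# Without -E/-r, sed uses basic regular expressions, where a bare + is an
# ordinary character, so the pattern needs a literal "+" before ".proto".
original_attempt() {
  sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'
}

# No "+" in the input, so s/// never matches and the line is printed as-is.
echo "path/to/file: example: 'extract_me.proto'" | original_attempt

# With a literal "+" present, the same pattern does match.
echo 'x: a+.proto' | original_attempt
# prints: a+.proto
```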

sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'

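The .* is still greedy, but the added [^a-zA-Z0-9_] has to match the character immediately before the captured name (here, the opening quote), so .* can no longer eat the start of the filename and the group captures all of it.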
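The command can be checked against the sample lines from the question. This sketch uses -E in place of -r, since both GNU and BSD/macOS sed accept it; the function names are hypothetical. The second function is an addition not in the answer: -n turns off automatic printing and the p flag prints only lines where the substitution succeeded, so lines without a .proto name are dropped rather than passed through.

```shell
# Hypothetical helper wrapping the answer's command.
extract_protos() {
  sed -E 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
}

# Hypothetical variant: print only lines where a .proto name was found.
extract_protos_only() {
  sed -n -E 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/p'
}

printf '%s\n' \
  "path/to/file: example: 'extract_me.proto'" \
  "path/to/other-file: example: 'me_too.proto'" \
  "path/to/something/else: example: 'and_me_2.proto'" \
  "path/to/unrelated: no proto here" |
  extract_protos_only
# prints:
# extract_me.proto
# me_too.proto
# and_me_2.proto
```

One consequence of requiring a preceding non-word character is that a filename at the very start of a line is not matched.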