DrewVS - 4 months ago
Perl Question

How to extract string following a pattern with GREP, REGEX or PERL

This is my first post, so please bear with me. I have a file that looks something like this:

<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>


I need to extract everything within the quotes that follow "name=", i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

Thanks!

Answer

Old Version of the Answer:

I think the easiest one-line solution (without writing a full script) would be to use grep:

grep -no 'name="[^"]*"' file.html

In this line:

  • The n option prefixes each match with its line number. It is purely informative; remove it if you don't want it.
  • The o option prints only the matched text, not the entire line.
  • file.html is the path to your file.

Also if you want the results saved to a file, you can pipe them by appending > results.txt:

grep -o 'name="[^"]*"' file.html > results.txt

The big problem here is that grep's default regular expressions do not support look-arounds, so each match still includes the name= prefix and the surrounding quotes. Thus, the result will be something like:

name="content_analyzer"
name="content_analyzer2"
name="content_analyzer_items"

It needs some clean-up. That's easy to do in your text editor with some find/replace... but that's why it's not a complete solution.
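If you would rather not open an editor for that clean-up, it can probably be folded into the same pipeline. The following is only a sketch: it recreates the sample from the question in a temporary file (the temporary path and the variable names are my own, not part of the original answer) and uses cut to keep the text between the first pair of double quotes.

```shell
# Recreate the sample from the question in a temporary file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# grep -o emits name="...", one match per line; cut then splits each match
# on the double quote and keeps the second field (the value itself).
names=$(grep -o 'name="[^"]*"' "$sample" | cut -d'"' -f2)
printf '%s\n' "$names"

rm -f "$sample"
```

This relies on the value never containing a double quote, which holds for the sample shown in the question.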


How I would do it

Right in Vim :-)

1st step
Delete all lines that do not contain name=

:v/name=/d

2nd step
Extract the content inside name=""

:%s/^.*name="\([^"]*\)".*$/\1

Bang, and you never even had to leave the text editor.
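For anyone who wants the same two steps outside of Vim, a rough non-interactive equivalent can be written with sed. This is a sketch under assumptions, not part of the original answer: the temporary file and variable names are made up for illustration, and the substitution is the same pattern as the Vim command above. The -n option together with the trailing p flag prints only the lines where the substitution succeeded, which stands in for the :v/name=/d step.

```shell
# Recreate the sample from the question in a temporary file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# -n suppresses automatic printing; the p flag prints only lines on which
# the substitution matched, so lines without name="..." are dropped.
extracted=$(sed -n 's/^.*name="\([^"]*\)".*$/\1/p' "$sample")
printf '%s\n' "$extracted"

rm -f "$sample"
```

Like the Vim version, this keeps at most one value per line, so it assumes each line carries a single name attribute.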


Update

As stated by Dennis Williamson in the comments, grep does support look-arounds through the -P option, which according to the manual interprets the pattern as a Perl regular expression. Fantastic...

So here is the definitive one-line solution:

grep -Po 'name="\K.*?(?=")' file.txt