DrewVS - 3 months ago
Perl Question

How to extract string following a pattern with GREP, REGEX or PERL

This is my first post, so please bear with me. I have a file that looks something like this:

<table name="content_analyzer" primary-key="id">
<type="global" />
<table name="content_analyzer2" primary-key="id">
<type="global" />
<table name="content_analyzer_items" primary-key="id">
<type="global" />

I need to extract whatever is inside the quotes that follow "name=", i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.



Old Version of the Answer:

I think the easiest one-line solution (without writing a full script) would be to use grep:

grep -no 'name="[^"]*"' file.html

In this line:

  • The n option prefixes each match with the number of the line it was found on. It's there purely for information; remove it if you don't want it.
  • The o option prints only the matched text, not the entire line.
  • file.html is the path to your file.

Also, if you want the results saved to a file, you can redirect them by appending > results.txt:

grep -o 'name="[^"]*"' file.html > results.txt

The big problem here is that grep will not support look-arounds (at least I don't think so). Thus, the result will be something like:

name="content_analyzer"
name="content_analyzer2"
name="content_analyzer_items"

It needs some clean up. That's easy to do in your text editor with some find/replace... but that's why it's not a complete solution.
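
The clean-up can also happen in the same pipeline instead of in an editor. Here is a minimal sketch, assuming the sample from the question has been written to a temporary file (the variable names are only for illustration). Every match has the form name="value", so splitting on the double quote and keeping the second field leaves just the value:

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
<table name="content_analyzer2" primary-key="id">
<type="global" />
<table name="content_analyzer_items" primary-key="id">
<type="global" />
EOF

# grep -o emits name="..." for each match;
# cut splits on the double quote and keeps field 2 (the value).
names=$(grep -o 'name="[^"]*"' "$tmp" | cut -d'"' -f2)
printf '%s\n' "$names"

rm -f "$tmp"
```

This relies on the value itself never containing a double quote, which holds for the sample above.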

How I would do it

Right in Vim :-)

1st step
Delete any lines that do not contain name=


2nd step
Extract the content inside name=""


Bang, I didn't even have to leave the text editor.
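
For comparison, the same two steps can be sketched on the command line with sed, if you'd rather not open an editor at all. This is a stand-in using sed, not the Vim commands themselves, and it assumes the sample from the question saved in a temporary file:

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
<table name="content_analyzer2" primary-key="id">
<type="global" />
<table name="content_analyzer_items" primary-key="id">
<type="global" />
EOF

# 1st step: delete every line that does not contain name=
# 2nd step: replace each remaining line with the text inside name="..."
names=$(sed -e '/name=/!d' -e 's/.*name="\([^"]*\)".*/\1/' "$tmp")
printf '%s\n' "$names"

rm -f "$tmp"
```

Like the editor approach, this assumes at most one name="..." attribute per line.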


As stated by Dennis Williamson in the comments, grep does support look-arounds if you use the -P option, which according to the manual interprets the pattern as a Perl regular expression. Fantastic...

So here is the definitive one-line solution:

grep -Po 'name="\K.*?(?=")' file.txt