zeddex - 2 months ago

Regex that makes sure a match starts with a string

I am running a regex on some HTML and need to extract the values of some image title attributes.

The image title attributes look like this:

title="Image Title Here"

And this works for the task:


However, the problem is that it also grabs unwanted title attributes. I noticed, though, that in the HTML I run the regex on, the images are inside h3 tags.

How can I update my regex to make sure it only gets matches from HTML starting with '

My current regex is:


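If the extraction has to stay regex-based, one possible sketch anchors each match to an opening `<h3>` tag and refuses to run past the closing `</h3>`. The pattern below is an illustrative assumption, not the asker's original regex: it uses a tempered `(?:(?!</h3>).)*?` token, captures only the first `img` title inside each `h3`, and will still break on nested headings, single-quoted attributes, or attribute values containing `>`.

```php
<?php
// Hypothetical sample input mirroring the markup in the answer below.
$html = '<h1>Text 1<img title="Not this"></h1>
<h2>Text 2<img title="Not this"></h2>
<h3>Text 3<img title="This"></h3>';

// <h3 ...>        : an opening h3 tag (with optional attributes)
// (?:(?!</h3>).)*? : lazily skip text, but never across a closing </h3>
// <img\b[^>]*?\s  : an img tag, then whitespace before the attribute
// title="([^"]*)" : capture the double-quoted title value
// Flags: i = case-insensitive, s = "." also matches newlines.
$pattern = '~<h3\b[^>]*>(?:(?!</h3>).)*?<img\b[^>]*?\stitle="([^"]*)"~is';

preg_match_all($pattern, $html, $m);
$titles = $m[1]; // capture group 1 holds the title values
print_r($titles);
```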

Using DOMDocument with XPath should be less error-prone:

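$html = <<<DATA
<h1>Text 1<img title="Not this"></h1>
<h2>Text 2<img title="Not this"></h2>
<h3>Text 3<img title="This"></h3>
DATA;

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html); // parse the markup into a DOM tree

$xpath = new DOMXPath($dom);
$imgs = $xpath->query('//h3/img[@title]'); // <img> children of <h3> that have a title
$res = array();
foreach ($imgs as $img) {
    array_push($res, $img->getAttribute('title'));
}
print_r($res); // prints an array containing only "This"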

See the PHP demo

The '//h3/img[@title]' XPath expression selects every img element that is a direct child of an h3 element and has a title attribute, and $img->getAttribute('title') returns the value of that attribute.
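Because the expression uses a single `/` between `h3` and `img`, it only matches images that are direct children of the heading. The sketch below uses hypothetical markup (and a hypothetical helper, `titles()`) to contrast that with the descendant step `//`, which also picks up an image wrapped in a link inside the `h3`:

```php
<?php
// Hypothetical sample: one img directly inside h3, one wrapped in a link,
// and one inside h2 that neither query should return.
$html = '<h3>Direct<img title="Direct child"></h3>'
      . '<h3><a href="#">Linked<img title="Inside a link"></a></h3>'
      . '<h2><img title="Not in h3"></h2>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: run an XPath query and collect each node's title value.
function titles(DOMXPath $xpath, string $expr): array {
    $out = [];
    foreach ($xpath->query($expr) as $img) {
        $out[] = $img->getAttribute('title');
    }
    return $out;
}

$child = titles($xpath, '//h3/img[@title]');       // direct children of h3 only
$descendant = titles($xpath, '//h3//img[@title]'); // img at any depth inside h3

print_r($child);
print_r($descendant);
```

If the images on the real page may be wrapped in links or other inline elements inside the heading, the `//h3//img[@title]` form is likely the safer choice.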