jason.kaisersmith - 2 months ago
HTML Question

XPath query is returning NULL

I am trying to maintain some PHP code that scrapes a web page. The page has changed, so the code needs an update, but I'm not very experienced with XPath and am struggling.

This is the relevant section of the HTML:

<div class="carousel-item-wrapper">
<picture class="">
<source srcset="/medias/tea-tree-skin-clearing-foaming-cleanser-1-640x640.jpg?context=product-images/h3b/hd3/8796813918238/tea-tree-skin-clearing-foaming-cleanser_1-640x640.jpg" media="(min-width: 641px) and (max-width: 1024)">
<source srcset="/medias/tea-tree-skin-clearing-foaming-cleanser-1-320x320.jpg?context=product-images/h09/h9a/8796814049310/tea-tree-skin-clearing-foaming-cleanser_1-320x320.jpg" media="(max-width: 640px)">
<img srcset="/medias/myimage.jpg" alt="150 ML" class="">
</picture>
</div>


I am trying to extract the srcset attribute from the img tag, which has the value "/medias/myimage.jpg". I'm using the XPath Helper Chrome plugin to help me, and I have the following XPath:

//div[@class="carousel-item-wrapper"]/picture/img/@srcset


In the plugin, it returns exactly what I expect, so it appears to work fine.

If I use an online XPath tester (http://www.online-toolz.com/tools/xpath-editor.php), it also works OK.

But in my PHP code I get a null value.

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->strictErrorChecking = false;
$dom->recover = true;

@$dom->loadHtml($html);
$xPath = new DOMXPath($dom);

//Other xPath queries executed OK.

$node = $xPath->query('//div[@class="carousel-item-wrapper"]/picture/img/@srcset')->item(0);

if ($node === NULL) {
    writelog("Node is NULL"); // <-- This branch runs: "Node is NULL" is written to the log file!
}


I have of course tried a lot of different variations on this, such as not specifying the attribute name, but all with no luck.

What am I doing wrong? I'm sure it must be something simple, but I can't spot it.

Other extracts using my PHP code on the same HTML document are working OK. So it is just this element causing me trouble.

Answer

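PHP's DOMXPath class seems to have trouble with self-closing (void) tags here. A likely cause is that DOMDocument::loadHTML() relies on libxml2's HTML parser, which predates HTML5 and may not treat <source> as a void element, so the <img> ends up nested inside a <source> instead of being a direct child of <picture>. Replacing the single slash before img with a double forward slash (the descendant axis) matches the img at any depth below picture, so your new XPath query should be: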

//div[@class="carousel-item-wrapper"]/picture//img/@srcset
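To check this yourself, here is a minimal sketch that runs the revised query against a trimmed copy of the markup from the question. The shortened srcset values on the two source tags are placeholders for illustration, and libxml_use_internal_errors() is used in place of the @ operator only as one possible way to quiet parser warnings. The last two lines dump the picture element as the parser built it, which should show whether the img was nested inside a source on your PHP/libxml2 build.

```php
<?php
// Trimmed copy of the markup from the question.
// The <source> srcset values are shortened placeholders, not the real paths.
$html = <<<'HTML'
<div class="carousel-item-wrapper">
<picture class="">
<source srcset="/medias/example-640x640.jpg" media="(min-width: 641px)">
<source srcset="/medias/example-320x320.jpg" media="(max-width: 640px)">
<img srcset="/medias/myimage.jpg" alt="150 ML" class="">
</picture>
</div>
HTML;

// Collect parser warnings (e.g. about unknown tags) instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xPath = new DOMXPath($dom);

// "picture//img" matches an <img> at any depth below <picture>,
// so it works whether or not the parser nested it inside a <source>.
$nodes  = $xPath->query('//div[@class="carousel-item-wrapper"]/picture//img/@srcset');
$srcset = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

echo $srcset, PHP_EOL;

// Diagnostic: print the <picture> subtree as the parser actually built it.
$picture = $xPath->query('//picture')->item(0);
echo $dom->saveHTML($picture), PHP_EOL;
```

If you are on PHP 8.4 or later, the Dom\HTMLDocument class parses HTML5 and may make the original single-slash query work unchanged, though that is worth verifying against your real page.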