geoidesic - 3 months ago
PHP Question

How can I select only the immediate parent node of a text string using XPath, for every match?

Note: this differs from the following question in that here the value appears both within a node and within a child node of that same node:

XPath contains(text(),'some string') doesn't work when used with node with more than one Text subnode

Given the following HTML:

$content =
'<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>';


And the following XPath:

//*[contains(text(),'interim')]


... it returns only three matches, whereas I want four. As per the comments, the four elements I'm expecting are P, P, A and LI.
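To make the three-match result concrete, here is a minimal sketch that runs the question's XPath against the same markup (collapsed onto one line) and collects the names of the matched elements. The helper name matchedElementNames is hypothetical; the DOMDocument and DOMXPath calls are the same ones the answer uses.

```php
<?php
// Hypothetical helper: run an XPath query and return the nodeName of each match.
function matchedElementNames(string $html, string $query): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence any HTML parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $names = [];
    foreach ((new DOMXPath($dom))->query($query) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$html = '<html><body>'
    . '<div><p>During the interim there shall be nourishment supplied</p></div>'
    . '<div><p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p></div>'
    . '<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>'
    . '</body></html>';

// contains() expects a string, so XPath 1.0 converts the node-set from text()
// to the string value of its FIRST text node. The second <p> starts with the
// text node "During the ", so it is never matched.
print_r(matchedElementNames($html, "//*[contains(text(),'interim')]"));
// prints the element names p, a and li
```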

Answer

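This works as expected once contains() is applied to each text node individually instead of to text() as a whole. In XPath 1.0, contains(text(), 'interim') converts the node-set returned by text() to the string value of its first node only, so the second P, whose first text node is "During the ", is never matched. See this glot.io link.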

<?php

$html = <<<HTML
<html>
 <body>
  <div>
   <p>During the interim there shall be nourishment supplied</p>
  </div>
  <div>
   <p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
  </div>
  <div>
   <ul><li>During the interim there shall be nourishment supplied</li></ul>
  </div>
 </body>
</html>
HTML;

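$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Test every text node on its own, instead of only the first text child
// of each element, and print the path of each matching text node.
foreach ($xpath->query('//*/text()[contains(., "interim")]') as $n) {
    var_dump($n->getNodePath());
}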

You will get four matches:

  • /html/body/div[1]/p/text()
  • /html/body/div[2]/p/a/text()
  • /html/body/div[2]/p/text()[2]
  • /html/body/div[3]/ul/li/text()
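The matches above are text nodes, while the question asks for the elements that directly contain them. A sketch of one way to get those, which is not part of the original answer: move the text-node test into a predicate, so that //*[text()[contains(., 'interim')]] selects every element with at least one matching text child. The function name elementsContaining is hypothetical, and the needle is assumed to contain no single quotes because it is spliced into an XPath string literal.

```php
<?php
// Hypothetical helper: names of elements owning a text node that contains $needle.
// Assumes $needle has no single quotes (it is inserted into an XPath literal).
function elementsContaining(string $html, string $needle): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence any HTML parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // The inner predicate tests each text child separately, so the second <p>
    // matches through its second text node.
    $query = "//*[text()[contains(., '$needle')]]";

    $names = [];
    foreach ($xpath->query($query) as $element) {
        $names[] = $element->nodeName; // results come back in document order
    }
    return $names;
}

$html = '<html><body>'
    . '<div><p>During the interim there shall be nourishment supplied</p></div>'
    . '<div><p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p></div>'
    . '<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>'
    . '</body></html>';

print_r(elementsContaining($html, 'interim'));
// prints the element names p, p, a and li
```

An equivalent formulation is //text()[contains(., 'interim')]/.., which steps from each matching text node to its parent. Because an XPath node-set holds no duplicates, a parent with two matching text nodes would still appear only once.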