Mikhail Karakulov - 3 months ago
PHP Question

XPath strange behavior - not matching text nodes

When I run code like this:

foreach ($xpath->query('.//tpl-static', $domTemplateContainer) as $domStatic) {
    /* ... */
    $domStatic->parentNode->removeChild($domStatic);
}


Everything seems to work fine.

But with XML comments and, more importantly, text nodes, the same approach does not work as intended:

foreach ($xpath->query('.//text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}


Some text nodes are selected and others are not, and I could not find the logic behind it. The predicate does not matter.
However, I found that the following query works fine:
./descendant-or-self::text()[normalize-space() = ""]


Why does .// only work for element nodes but not for text nodes? Is this a libxml/PHP bug that should be reported, or have I missed something?

ADDITION:

Complete example (adapted from a complex project):



$xml = '
<tpl-static>
<link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
<link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
<link rel="stylesheet" type="text/css" href="/static/css/style.css" />
<script src="/static/js/underscore.js"></script>
<!-- <script src="/static/js/jquery.adaptive-backgrounds.js"></script> -->
<script src="/static/js/jquery.maskedinput.min.js"></script>
<link href="/static/js/jquery-ui-1.11.2.custom/jquery-ui.css" rel="stylesheet"/>
<script src="/static/js/jquery-ui-1.11.2.custom/jquery-ui.min.js"></script>
<link rel="stylesheet" href="/static/js/jquery.magnific-popup/magnific-popup.css" />
<script src="/static/js/jquery.magnific-popup/jquery.magnific-popup.js"></script>
<script src="/static/templates/dealers-page-includes/page-includes.js"></script>
</tpl-static>
<br/>

';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);

$templateName = 'test';
//$it = $this;
$adoptTemplate = function($domTemplateContainer) use (&$adoptTemplate, /*$it,*/ $domDocument, $xpath, $templateName) {

    foreach ($xpath->query('.//comment()', $domTemplateContainer) as $domComment) {
        $domComment->parentNode->removeChild($domComment);
    }

    foreach ($xpath->query('.//tpl-static', $domTemplateContainer) as $domStatic) {
        foreach ($domStatic->childNodes as $curChildNode) {
            //$it->_domDocumentHead->appendChild($curChildNode->cloneNode(true));
        }
        $domStatic->parentNode->removeChild($domStatic);
    }
};

$adoptTemplate($domDocumentFragment);

// This fails:
/*foreach ($xpath->query('.//text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}*/
// Here is the workaround:
foreach ($xpath->query('./descendant-or-self::text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}

if ($domDocumentFragment->childNodes->length > 1) {
    throw new \Exception('Single node expected in template "' . $templateName . '", ' . $domDocumentFragment->childNodes->length . ' given.');
}

ThW
Answer

I stripped down your code to test different expressions.

$xml = '
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);

$expressions = [
  './/text()[normalize-space() = ""]',
  './*/text()[normalize-space() = ""]',
  './descendant-or-self::text()[normalize-space() = ""]',
  './*/descendant-or-self::text()[normalize-space() = ""]'
];

foreach ($expressions as $expression) {
  $nodes = $xpath->evaluate($expression, $domDocumentFragment);
  var_dump($expression, $nodes->length);
}

Output:

string(33) ".//text()[normalize-space() = ""]"
int(3)
string(34) "./*/text()[normalize-space() = ""]"
int(3)
string(52) "./descendant-or-self::text()[normalize-space() = ""]"
int(6)
string(54) "./*/descendant-or-self::text()[normalize-space() = ""]"
int(3)

As you can see, the first two expressions return the same node count, while the third (your workaround) returns a larger number. It looks like the first expression does not include the text nodes that are direct children of the fragment.

I modified the source to include a top-level element that can be used as the context for the expressions.

$xml = '<foo>
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
</foo>';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);
$context = $domDocumentFragment->firstChild;

$expressions = [
  './/text()[normalize-space() = ""]',
  './*/text()[normalize-space() = ""]',
  './descendant-or-self::text()[normalize-space() = ""]',
  './*/descendant-or-self::text()[normalize-space() = ""]'
];

foreach ($expressions as $expression) {
  $nodes = $xpath->evaluate($expression, $context);
  var_dump($expression, $nodes->length);
}

Output:

string(33) ".//text()[normalize-space() = ""]"
int(6)
string(34) "./*/text()[normalize-space() = ""]"
int(3)
string(52) "./descendant-or-self::text()[normalize-space() = ""]"
int(6)
string(54) "./*/descendant-or-self::text()[normalize-space() = ""]"
int(3)

This returns the expected result. The first expression now includes the direct child nodes of the context.
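
If you would rather not change the XML string itself, the same effect can probably be had by moving the fragment's children into a container element and using that element as the context. This is only a sketch: the element name tpl-root is made up for illustration, and the XML is a shortened version of the template above.

```php
<?php
$xml = '
<tpl-static>
    <link rel="stylesheet" type="text/css" href="/static/css/style.css" />
</tpl-static>
<br/>
';

$domDocument = new DOMDocument('1.0', 'utf-8');
$xpath = new DOMXPath($domDocument);

$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXML($xml);

// Hypothetical container element; the name "tpl-root" is an assumption.
$container = $domDocument->createElement('tpl-root');
// Appending a fragment moves its children into the target element.
$container->appendChild($domDocumentFragment);

// The context is now an element, so the abbreviated .// form can be used.
$expression = './/text()[normalize-space() = ""]';
$blankBefore = $xpath->query($expression, $container)->length;

foreach ($xpath->query($expression, $container) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}

$blankAfter = $xpath->query($expression, $container)->length;
$remaining = $container->childNodes->length;

var_dump($blankBefore, $blankAfter, $remaining);
```

After the loop the container should hold only the element nodes, so the "single node expected" check from the question can be run against $container->childNodes instead of the fragment.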

It looks like .//text() is interpreted differently if the context node is a document fragment.
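
For background, XPath 1.0 defines // as an abbreviation for /descendant-or-self::node()/, so .//text() and ./descendant-or-self::text() are two different expressions that normally select the same text nodes. A small sketch with an element as the context (the foo/bar/baz document is made up for the example) shows all three spellings agreeing:

```php
<?php
$domDocument = new DOMDocument();
// Whitespace between the tags becomes text nodes (preserveWhiteSpace defaults to true).
$domDocument->loadXML('<foo> <bar> <baz/> </bar> </foo>');
$xpath = new DOMXPath($domDocument);
$context = $domDocument->documentElement;

$expressions = [
    './/text()',                                  // abbreviated form
    './descendant-or-self::node()/child::text()', // what the abbreviation expands to
    './descendant-or-self::text()',               // the workaround from the question
];

$counts = [];
foreach ($expressions as $expression) {
    $counts[$expression] = $xpath->query($expression, $context)->length;
}

var_dump($counts);
```

With an element context the counts match. The differing counts in the first test above only show up when the fragment itself is the context.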

You might think that is a bug, but according to the W3C DOM Level 3 XPath specification, a document fragment is not a valid context node for an XPath expression. The relevant passage reads:

If the XPathEvaluator was obtained by casting the Document then this must be owned by the same document and must be a Document, Element, Attribute, Text, CDATASection, Comment, ProcessingInstruction, or XPathNamespace node.

So to make your source conform to the spec, you would have to iterate over the fragment's child nodes and evaluate the expression for each node. In that case, descendant-or-self::text() would work in a deterministic way.
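
A minimal sketch of that per-child loop, assuming the same kind of fragment as in the question (the XML is shortened, and the $removed counter is only there to show what happened):

```php
<?php
$xml = '
<tpl-static>
    <link rel="stylesheet" type="text/css" href="/static/css/style.css" />
</tpl-static>
<br/>
';

$domDocument = new DOMDocument('1.0', 'utf-8');
$xpath = new DOMXPath($domDocument);

$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXML($xml);

$expression = 'descendant-or-self::text()[normalize-space() = ""]';
$removed = 0;

// childNodes is a live list, so copy it before removing anything.
foreach (iterator_to_array($domDocumentFragment->childNodes) as $childNode) {
    // Each child (element or text node) is a valid context node on its own.
    foreach ($xpath->query($expression, $childNode) as $domNode) {
        $domNode->parentNode->removeChild($domNode);
        $removed++;
    }
}

var_dump($removed, $domDocumentFragment->childNodes->length);
```

The fragment is never passed to query() here. Only its children are, and descendant-or-self covers both the case where a child is itself a blank text node and the case where blank text nodes sit deeper inside an element.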
