Gordon - 4 months ago
PHP Question

How to force XPath to use UTF8?

I have an XHTML document being passed to a PHP app via a Greasemonkey AJAX request. The PHP app uses UTF-8. If I output the POST content straight back to a textarea in the AJAX-receiving div, everything is still properly encoded in UTF-8.

When I try to parse it using XPath:

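$dom = new DOMDocument();
$dom->loadHTML($raw); // $raw: the POSTed XHTML (variable name is illustrative)
$xpath = new DOMXPath($dom);
$query = '//td/text()';
$nodes = $xpath->query($query);
foreach($nodes as $node) {
    var_dump($node->nodeValue); // dump the text of each table cell
}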

The dumped strings are not UTF-8. How do I force DOM/XPath to use UTF-8?


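If it is a fully fledged, valid XHTML document, you shouldn't use loadHTML() but load()/loadXML(). loadHTML() goes through libxml's HTML parser, which typically assumes ISO-8859-1 unless the markup declares a charset (e.g. in a <meta> tag), so your UTF-8 bytes get misread as Latin-1 and end up double-encoded. The XML parser honours the encoding in the XML declaration and defaults to UTF-8 when there is none.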
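Since the asker receives the markup as a POSTed string rather than a file, loadXML() is the variant that fits. Below is a minimal sketch, assuming the posted markup is well-formed XHTML; the $xhtml heredoc is a hypothetical stand-in for the real POST body, and the \u{...} escapes are used so the test bytes are UTF-8 no matter how the PHP file itself is saved.

```php
<?php
// Sketch: parse a well-formed XHTML *string* with the XML parser.
// $xhtml is a hypothetical stand-in for the POSTed markup.
$xhtml = <<<XHTML
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body><table><tr><td>\u{00C4}</td><td>\u{00F6}</td></tr></table></body>
</html>
XHTML;

$dom = new DOMDocument();
$dom->loadXML($xhtml); // XML parser: honours encoding="utf-8"

$xpath = new DOMXPath($dom);
// XHTML elements live in a namespace, so the query needs a registered prefix
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');

$cells = [];
foreach ($xpath->query('//h:td/text()') as $node) {
    $cells[] = $node->nodeValue; // DOM hands text back as UTF-8
}

// expect c384 (Ä) and c3b6 (ö)
var_dump(array_map('bin2hex', $cells));
```

Note that a plain '//td/text()' query would match nothing here, because the XML parser keeps the XHTML namespace on every element; that is the main practical difference from the loadHTML() version of the code.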

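Given the example XHTML document

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>xhtml test</title>
  </head>
  <body>
    <h1>A Table</h1>
    <table>
      <tr><td>Ä</td><td>Ö</td><td>Ü</td><td>ä</td><td>ö</td></tr>
    </table>
  </body>
</html>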

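the script (XHTML elements are in the http://www.w3.org/1999/xhtml namespace, so the XPath query needs a registered prefix)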

$raw2 = 'test.html';

$dom = new DOMDocument();
$dom->load($raw2); // XML parser, reads the encoding from the XML declaration
$xpath = new DOMXPath($dom);
var_dump($xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml'));
$query = '//h:td/text()';
$nodes = $xpath->query($query);
foreach($nodes as $node) {
    foo($node->nodeValue);
}

// print each byte of $s as two hex digits
function foo($s) {
    for($i=0; $i<strlen($s); $i++) {
        printf('%02X ', ord($s[$i]));
    }
    echo "\n";
}

prints (after the bool(true) from the var_dump() call)

C3 84 
C3 96 
C3 9C 
C3 A4 
C3 B6 

i.e. the output strings are UTF-8 encoded: C3 84 is Ä, C3 96 is Ö, C3 9C is Ü, C3 A4 is ä and C3 B6 is ö.
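If you only need to eyeball the bytes or get a yes/no answer, the same check can be done with built-ins. A small sketch, assuming PHP 7+: bin2hex() produces the hex dump that foo() builds by hand, and preg_match() with the u modifier returns false when its subject is not valid UTF-8. The helper names hexBytes() and isUtf8() are hypothetical, made up for this example.

```php
<?php
// Sketch: two quick checks on a string's bytes.

// Space-separated upper-case hex dump, similar to foo() above
function hexBytes(string $s): string {
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// preg_match() with the /u modifier returns false for invalid UTF-8 input
function isUtf8(string $s): bool {
    return preg_match('//u', $s) === 1;
}

$utf8   = "\u{00C4}"; // Ä as UTF-8 (two bytes)
$latin1 = "\xC4";     // Ä as ISO-8859-1 (one byte)

echo hexBytes($utf8), "\n";   // C3 84
echo hexBytes($latin1), "\n"; // C4
var_dump(isUtf8($utf8));      // bool(true)
var_dump(isUtf8($latin1));    // bool(false)
```

A lone C4 byte is what a Latin-1 Ä looks like; seeing pairs starting with C3 (as in the output above) is the quick sign that the strings really are UTF-8.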