user3467855 user3467855 - 1 year ago 49
PHP Question

Extracting text between html tags with multiple classes with DOM and XPATH

I am trying to extract the text between a pair of HTML tags, but I can't get it to work:

HTML - text to be extracted:

<span class="font-4 box1-r">3,757,209</span>


$data = frontend::file_get_contents_curl(''.$domain); // custom function that returns the HTML string
$dom = new DOMDocument();
$dom->loadHTML(htmlentities($data));
$xpath = new DOMXPath($dom);
$backlinks = $xpath->query('//span[@class="font-4 box1-r"]/text()');
var_dump($backlinks); // returns null

Answer Source

The actual problem is that htmlentities() escapes all tag delimiters (< and >), so you end up loading one long string with no elements or attributes into DOMDocument:

$data = <<<HTML
<div><span class="font-4 box1-r">3,757,209</span></div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML(htmlentities($data)); // the markup is escaped before it is parsed
echo $doc->saveXML();

Output:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">
<html><body><p>&lt;div&gt;&lt;span class="font-4 box1-r"&gt;3,757,209&lt;/span&gt;&lt;/div&gt;</p></body></html>
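
For comparison, here is a minimal sketch of the same extraction with the htmlentities() call dropped, so the parser sees real elements. It assumes the markup is the sample snippet from the question rather than the fetched page, and it uses libxml_use_internal_errors() only to keep parser warnings from real-world HTML quiet. The XPath matches each class as a separate token, which is one common way to handle elements with multiple classes; the exact-match expression from the question should also work as long as the attribute is exactly "font-4 box1-r".

```php
<?php
// Sample markup from the question; in real use this would be the fetched page.
$data = '<div><span class="font-4 box1-r">3,757,209</span></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($data);            // raw HTML, no htmlentities()
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match a span that carries both classes, in any order and with any extra classes.
$query = '//span'
    . '[contains(concat(" ", normalize-space(@class), " "), " font-4 ")]'
    . '[contains(concat(" ", normalize-space(@class), " "), " box1-r ")]';
$nodes = $xpath->query($query);

// Take the text of the first match, or null if nothing matched.
$backlinks = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $backlinks, PHP_EOL; // prints 3,757,209
```

Note that query() returns a DOMNodeList, so checking its length is a more reliable test for "nothing found" than comparing the result to null.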