Rocky Rocky - 4 months ago 8
PHP Question

Using PHP DOMDocument, how can I keep all of the text in the output?


My HTML string is stored in the PHP variable $data. It contains text such as
<140/90 mmHg OR <130/80 mmHg
and that text does not appear in the output when I run the code with PHP DOMDocument, because the less-than and greater-than signs cause problems.


<?php
$data = 'THE CORRECT ANSWER IS C.
<p>Choice A Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p></p>
<p>Choice B Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p>Choice D Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p></p>
<p>Choice E simply dummy text of the printing and typesetting industry.</p>
<p></p>
<p><br>THIS IS MY MAIN TITLE IN CAPS<br>This my sub title.</p>
<p><br>TEST ABC: Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>1) It is a long established fact <140/90 mmHg OR <130/80 mmHg making it look like readable English will uncover many web sites still in their infancy.
<br><br>2) There are many variations of passages of Lorem Ipsum available. </p>
<p><br>TEST XYZ: Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</p>
<p><br>TES T TEST: It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>
<p><br>TESTXXX: It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>';
echo boldFormatExplanation($data);
?>



I have also written the PHP function below, which uses DOMDocument to put the title and certain words in bold.


  1. Title in bold: "THIS IS MY MAIN TITLE IN CAPS" (the title is not always the same)

  2. Words in bold: TEST ABC:, TEST XYZ:, TES T TEST:, TESTXXX: (these words are always the same)



These two points work well; the only problem is the missing line described in the first block above.


<?php
function boldFormatExplanation($data){
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->encoding = 'utf-8';
    $dom->substituteEntities = false;
    $dom->preserveWhiteSpace = true;
    $internalErrors = libxml_use_internal_errors(true); // Set error level
    @$dom->loadHTML($data, LIBXML_HTML_NODEFDTD); // Load html
    libxml_use_internal_errors($internalErrors); // Restore error level
    $xpath = new DOMXPath($dom); // Dom xpath
    $title_flag = true;
    foreach($xpath->query('//text()') as $node) {
        $txt = trim($node->nodeValue);
        $p = $node->parentNode;
        if (preg_match("/^\s*(TEST ABC:|TEST XYZ:|TES T TEST:|TESTXXX)(.*)$/s", $node->nodeValue, $matches)) {
            // Put Choice in bold:
            $p->insertBefore($dom->createElement('b', $matches[1]), $node);
            $node->nodeValue = " " . trim($matches[2]);
        } elseif (strtoupper($txt) === $txt && $txt !== '') {
            // Put header in bold
            if ($title_flag == true) {
                $p->insertBefore($dom->createElement('b', $txt), $node);
                $node->nodeValue = "";
                $title_flag = false;
            }
        }
    }
    $domData = $dom->saveHTML();
    $data = htmlspecialchars_decode($domData);
    return $data;
} ?>


When this code runs, the output skips this line:
<140/90 mmHg OR <130/80 mmHg

Answer

You have no choice here: the string has to be fixed before it is loaded with DOMDocument::loadHTML. You can't do it like a barbarian with a blind replacement, though, because any < between script or style tags would be replaced too. Instead, use the libxml errors to locate only the problematic opening angle brackets. You can do it this way (it isn't fast, since the DOM tree has to be rebuilt until the errors disappear, but it is correct):

define('LIBXML_ERROR_INVALID_ELEMENT_NAME', 68);

$skeleton = '<html><head><meta charset="UTF-8"/></head><body id="root">%s</body></html>';
$htmlDoc = sprintf($skeleton, $data);

$dom = new DOMDocument;

do {
    libxml_use_internal_errors(true);
    $hasError = false;
    $dom->loadHTML($htmlDoc);
    $errors = libxml_get_errors();

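    foreach ($errors as $error) {
        if ($error->code == LIBXML_ERROR_INVALID_ELEMENT_NAME) {
            // Escape the "<" at the reported position. Only ask for another
            // pass if something changed, so the loop cannot spin forever.
            $fixed = preg_replace('~\A(?:.*\R){' . ($error->line - 1) . '}.{' . ($error->column - 2) . '}\K<~u', '&lt;', $htmlDoc, 1, $count);
            if ($count > 0) {
                $htmlDoc = $fixed;
                $hasError = true;
            }
        }
    }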
    libxml_clear_errors();
} while ($hasError);

boldFormatExplanation($dom);

foreach($dom->getElementById('root')->childNodes as $childNode) {
    echo $dom->saveHTML($childNode);
}
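
The position-based regex in the loop above is the least obvious part, so here is a minimal sketch that isolates it. The helper name escapeBracketAt and the sample string are made up for illustration and are not part of the answer's code. It assumes you already know the 1-based line number and how many characters precede the bracket on that line (the loop gets these from $error->line and $error->column - 2).

```php
<?php
// Hypothetical helper, for illustration only: escape the "<" found on
// 1-based line $line, preceded by exactly $offset characters on that line.
function escapeBracketAt(string $html, int $line, int $offset): string
{
    // \A            anchor at the very start of the string
    // (?:.*\R){n}   skip n complete lines ("." does not cross a line break)
    // .{m}          skip m characters on the target line
    // \K            drop everything matched so far from the replacement
    // <             the bracket that gets replaced
    $pattern = '~\A(?:.*\R){' . ($line - 1) . '}.{' . $offset . '}\K<~u';

    // preg_replace returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '&lt;', $html) ?? $html;
}

$html = "<p>first line</p>\n<p>BP <140/90 mmHg</p>";

// The stray "<" is on line 2, after the six characters "<p>BP ".
echo escapeBracketAt($html, 2, 6);
// Prints:
// <p>first line</p>
// <p>BP &lt;140/90 mmHg</p>
```

If the character at the given position is not a "<", the pattern does not match and the string comes back unchanged, which is why a second bracket on the same line is only fixed on the next pass of the loop, once libxml reports its shifted column.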

As an aside, setting the DOMDocument encoding property is useless when you call DOMDocument::loadHTML afterwards, because the encoding is then determined from the document content (this is the main reason I wrapped $data in an HTML skeleton containing <meta charset="UTF-8"/>).

As for your bold function, you can write it this way:

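// Objects are passed by handle in PHP, so no reference (&) is needed
// for the changes to be visible to the caller.
function boldFormatExplanation(DOMDocument $dom) {
    $xpath = new DOMXPath($dom);
    $title_flag = true;

    foreach($xpath->query('//text()') as $node) {
        $txt = trim($node->nodeValue);
        // Strict comparison: empty() would also skip a text node containing "0".
        if ($txt === '') continue;

        $p = $node->parentNode;
        // "TESTXXX:" includes the colon, matching the list of words in the question.
        if (preg_match("/^(TEST ABC:|TEST XYZ:|TES T TEST:|TESTXXX:)\s*(.*)/s", $txt, $matches)) {
            // Put Choice in bold:
            $p->insertBefore($dom->createElement('b', $matches[1]), $node);
            $node->nodeValue = " " . $matches[2];
        } elseif ($title_flag && strtoupper($txt) === $txt) {
            // Put header in bold (first all-caps text node only)
            $p->replaceChild($dom->createElement('b', $txt), $node);
            $title_flag = false;
        }
    }
}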