Flix Flix - 7 months ago 18
PHP Question

What's a good way to find and mark paragraphs or other HTML tags that contain a string, with PHP?

Context: WordPress development sites with lorem ipsum in random places. I would like to show these content areas in red so they stand out and aren't missed during review.

Example:

<p>This is real content and has no dummy words.</p>
<p>This has words like lorem and ipsum. It should be highlighted.</p>


Desired end result:

<p>This is real content and has no dummy words.</p>
<p style="color:red">This has words like lorem and ipsum. It should be highlighted.</p>


Thank you!

Answer

Here we go again... Don't use a regex to parse HTML! Use an HTML parser like DOMDocument. Here's what you need:

<?php
//DEBUG START - Remove on production mode
error_reporting(E_ALL);
ini_set('display_errors', '1');
//DEBUG END
$html = <<< EOF
<p>This is real content and has no dummy words.</p>
<p>This has words like lorem and ipsum. It should be highlighted.</p>
EOF;

$dom = new DOMDocument(); //create new DOMDocument
$dom->loadHTML($html); // load the $html in the new DOMDocument
$xpath = new DOMXPath($dom); // create a new DOMXPath
// loop over all <p> tags in the html
foreach ($xpath->query("//p") as $paragraph) {
    // if the paragraph text contains "lorem" or "ipsum" (case-insensitive)
    if (preg_match('/lorem|ipsum/i', $paragraph->textContent)) {
        // add attribute style="color:red"
        $paragraph->setAttribute("style", "color:red");
    }
}
//save the new html with the modifications above
$html =  $dom->saveHTML();
echo $html;

Output (shown without the wrapper tags that saveHTML() adds, see the note below):

<p>This is real content and has no dummy words.</p>
<p style="color:red">This has words like lorem and ipsum. It should be highlighted.</p>
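
The question also asks about tags other than paragraphs. As a rough sketch of one way to cover them, the XPath query itself can select any element that has a direct text node containing one of the words. The sample markup below is invented for illustration, and the lowercasing trick only handles ASCII letters, because XPath 1.0 (which DOMXPath uses) has no lower-case() function.

```php
<?php
// Sample markup made up for this sketch; it is not from the question.
$html = <<<EOF
<p>This is real content and has no dummy words.</p>
<h2>Lorem heading</h2>
<ul><li>Real item</li><li>Ipsum dolor item</li></ul>
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath 1.0 has no lower-case(), so translate() folds ASCII letters first.
$lower = "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";
// Match any element that has a direct text node containing either word.
$query = "//*[text()[contains($lower, 'lorem') or contains($lower, 'ipsum')]]";

$marked = array();
foreach ($xpath->query($query) as $element) {
    $element->setAttribute('style', 'color:red');
    $marked[] = $element->nodeName; // remember which tags were marked
}

echo $dom->saveHTML();
```

Matching on direct text nodes keeps the style on the innermost element (the li here) rather than on every ancestor up to the body, which is what testing textContent on all elements would do.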



Note:

On PHP >= 5.2.6, DOMDocument automatically adds <!DOCTYPE>, <html> and <body> tags to the document if they are missing, and by default gives you no choice in the matter. Here's a simple hack to remove them:

$html = str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML());
$html = preg_replace('/^<!DOCTYPE.+?>/', '', $html);
echo $html;
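
As a possible alternative to stripping the wrapper with string replacement, the children of the body element can be serialized one by one, since saveHTML() accepts a node argument from PHP 5.3.6 onward. This is only a sketch: highlightFragment is a hypothetical helper name, the scalar type declarations need PHP 7, and it assumes non-empty input that ends up inside a body element.

```php
<?php
// Hypothetical helper; not part of the original answer.
function highlightFragment(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html); // still adds <!DOCTYPE>, <html> and <body>

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//p') as $paragraph) {
        if (preg_match('/lorem|ipsum/i', $paragraph->textContent)) {
            $paragraph->setAttribute('style', 'color:red');
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // nothing was parsed into a body; hand the input back
    }

    // Serialize only the children of <body>, leaving the wrapper behind.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo highlightFragment("<p>Real content.</p>\n<p>Lorem ipsum dolor.</p>");
```

Because the wrapper is never written out, this avoids accidentally deleting a literal "<body>" string that happens to appear inside the content itself.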