Sergey Sergey - 3 months ago 15
PHP Question

How to replace div with one of its child p nodes

I get the following HTML from a response, and I need to remove the extra text from it.

The string looks like this:

<?php
$str = <<<HTML
AAAA <span>span txt</span>
<div class='unique_div' id='xrz' data-id='1'>
div text
<span>span text</span>
<p class='unique_p'>
<span>p span text</span>
<p>p p text</p>
</p>
div text
</div>
BBBB <span>span txt</span>
HTML;


How can I replace the div with the p that is inside it?

I need to write a regular expression that produces the following result:

<?php
$str = <<<HTML
AAAA <span>span txt</span>
<p class='unique_p'>
<span>p span text</span>
<p>p p text</p>
</p>
BBBB <span>span txt</span>
HTML;


There is only one div and one p with these attributes.

Answer

Since you're working with HTML and your requirement amounts to modifying the Document Object Model (DOM), I would suggest using a DOM parser such as DOMDocument instead of a regular expression.

If I understood your question correctly, you're looking to replace the <div> node which appears to have an id attribute of xrz with the p node that has a class attribute of unique_p and is a child of the div.

  1. Getting the div is easy, because it has an id, and ids are unique within a document. So we can use a method like DOMDocument::getElementById to get that div.
  2. Getting its child p is a little trickier, since we want to make sure it's both a child of the div and has the specified class. So we'll rely on an XPath query for that, using DOMXPath.
  3. Finally, we'll replace the div with its captured child p by calling DOMNode::replaceChild on the div's parent node.

Here's a simple example.

$str = <<<HTML
    AAAA <span>span txt</span>
    <div class='unique_div' id='xrz' data-id='1'>
        div text
        <span>span text</span>
        <p class='unique_p'>
            <span>p span text</span>
            <p>p p text</p>
        </p>
        div text
    </div>
    BBBB <span>span txt</span>
HTML;

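// Collect libxml warnings about the malformed markup instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
// Parse the fragment without adding an implied <html>/<body> wrapper or a doctype.
$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Find the <p class="unique_p"> that is a direct child of a <div>.
$children = $xpath->query('//div/p[@class="unique_p"]');
$p = $children->item(0);
$div = $dom->getElementById('xrz');
// Swap the <div> for its child <p> inside the div's parent node.
$div->parentNode->replaceChild($p, $div);
echo $dom->saveHTML();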

The output should look something like this.

<p>AAAA <span>span txt</span>
    <p class="unique_p">
            <span>p span text</span>
            </p><p>
    BBBB <span>span txt</span></p></p>
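
The query in the example above matches a unique_p paragraph under any div, and it assumes both nodes exist. A slightly more defensive variant could look like the sketch below. The helper name replaceWithChildParagraph and the wrapping div with id "wrap" are made up for illustration, and the sketch assumes the class value contains no double quotes and is the element's only class.

```php
<?php
/**
 * Replace the element with the given id by its direct child <p> that
 * carries the given class. Returns false when either node is missing.
 * (replaceWithChildParagraph is a hypothetical helper name.)
 */
function replaceWithChildParagraph(DOMDocument $dom, string $id, string $class): bool
{
    $container = $dom->getElementById($id);
    if ($container === null || $container->parentNode === null) {
        return false;
    }

    // Relative query: only direct <p> children of this container are considered.
    // Assumes $class has no double quotes and is the element's only class.
    $xpath = new DOMXPath($dom);
    $matches = $xpath->query('./p[@class="' . $class . '"]', $container);
    if ($matches === false || $matches->length === 0) {
        return false;
    }

    // replaceChild() moves the <p> out of the container and into its place.
    $container->parentNode->replaceChild($matches->item(0), $container);
    return true;
}

libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
$dom = new DOMDocument;
$dom->loadHTML(
    '<div id="wrap">AAAA <div class="unique_div" id="xrz">div text'
    . '<p class="unique_p"><span>p span text</span></p>div text</div> BBBB</div>'
);

if (replaceWithChildParagraph($dom, 'xrz', 'unique_p')) {
    // Print only the illustrative wrapper, not the implied <html>/<body>.
    echo $dom->saveHTML($dom->getElementById('wrap'));
}
```

Running the query relative to the div found by id ties the p to that specific div, and the early returns avoid calling methods on null when the markup changes.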

In case you're wondering why the output differs slightly from what you might expect: the HTML provided in your question is actually malformed.

See section 9.3.1 of the HTML 4.01 specification:

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

So each time a DOM parser finds an opening p tag inside of another p tag it will just implicitly close the previous one first.
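
The implicit closing can be observed directly. The sketch below parses a cut-down version of the markup (the id "outer" is added only for this illustration) and counts where the p elements end up. I would expect the nested count to be 0, since the inner p becomes a sibling rather than a descendant; the number of direct children may depend on how the installed libxml version treats the leftover closing tag.

```php
<?php
libxml_use_internal_errors(true); // the stray </p> produces a parser warning
$dom = new DOMDocument;
$dom->loadHTML(
    '<div id="outer"><p class="unique_p"><span>p span text</span>'
    . '<p>p p text</p></p></div>'
);
$xpath = new DOMXPath($dom);

// <p> elements nested anywhere inside the unique_p paragraph.
$nested = $xpath->query('//p[@class="unique_p"]//p')->length;
// <p> elements that ended up as direct children of the div.
$siblings = $xpath->query('//div[@id="outer"]/p')->length;

echo "nested: $nested, direct children of div: $siblings\n";
```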