Martin AJ - 3 months ago
HTML Question

How can I strip html tags except some of them?

I need to remove all HTML tags from a PHP string except:

<p>
<em>
<small>


The strip_tags() function is good, but it strips all HTML tags. How can I tell it to remove all HTML except the tags above?

Jan
Answer

According to your comment, you want to remove HTML elements only if they have some class or attribute. You'll need to build up a DOM then:

<?php

$data = <<<DATA
<div>
    <p>This line shall stay</p>
    <p class="myclass">Remove this one</p>
    <p><a href="#somewhere">I will be deleted as well</a></p>
    <p>But keep this</p>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML("<div id='wrapper'>".$data."</div>");
$dom->removeChild($dom->doctype);


$xpath = new DOMXPath($dom);

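// every element carrying at least one attribute, except our own wrapper div
// (the wrapper has an id attribute and would otherwise be removed too)
$elements_to_be_removed = $xpath->query("//*[count(@*)>0 and not(@id='wrapper')]");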
foreach ($elements_to_be_removed as $element) {
    $element->parentNode->removeChild($element);
}

// just to check
echo $dom->saveHTML($dom->getElementById('wrapper'));
?>

To change which elements are removed, change the query. For example, to remove all elements whose class attribute is exactly myclass, it must read "//*[@class='myclass']".
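
Note that @class='myclass' compares the whole attribute value, so it would miss an element with class="intro myclass". A minimal sketch of a per-token match, using a common XPath 1.0 idiom, might look like the following. The sample markup and the class names intro and myclassic are made up for illustration only:

```php
<?php

// Hypothetical sample input: only the second paragraph has the class token.
$data = <<<DATA
<p>Keep me</p>
<p class="intro myclass">Remove me</p>
<p class="myclassic">Keep me too</p>
DATA;

$dom = new DOMDocument();
$dom->loadHTML("<div id='wrapper'>" . $data . "</div>");

$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces so that 'myclass' is matched
// as a whole token; 'myclassic' does not contain ' myclass '.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' myclass ')]";

foreach ($xpath->query($query) as $element) {
    $element->parentNode->removeChild($element);
}

// serialize only the wrapper, as in the answer above
$output = $dom->saveHTML($dom->getElementById('wrapper'));
echo $output;
```

If the class attribute is always exactly one value, the shorter "//*[@class='myclass']" should be enough.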