wolfgang1983 - 1 month ago
PHP Question

Get str_replace to avoid replacing selected tags

I have the code below, which finds any less-than and greater-than characters in the user's question and replaces them with the corresponding HTML entity names.

$string = $this->input->post('question');

$find_and_replace = array(
'<' => '&lt;',
'>' => '&gt;',
);

$new_data = str_replace(array_keys($find_and_replace), array_values($find_and_replace), $string);


When the question contains tags such as
<pre></pre>
and
<code></code>
the code replaces those as well, producing
&lt;pre&gt;&lt;/pre&gt;
and
&lt;code&gt;&lt;/code&gt;


I do not want that to happen. I only want to replace the content inside the tags.


Question: How can I still use str_replace, but apply it only to the content
inside the pre or code tags?


public function preview() {
    $data = array('success' => false, 'question' => '', 'tag' => '');

    if ($_POST) {

        $string = $this->input->post('question');

        $find_and_replace = array(
            '<' => '&lt;',
            '>' => '&gt;',
        );

        $new_data = str_replace(array_keys($find_and_replace), array_values($find_and_replace), $string);

        $data['question'] = $new_data;

        $data['success'] = true;
    }

    $this->output
        ->set_content_type('application/json')
        ->set_output(json_encode($data));
}

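function get_everything_in_tags($string, $tagname)
{
    // Escape the tag name so it is treated literally inside the pattern.
    $tag = preg_quote($tagname, '#');
    $pattern = "#<\s*?$tag\b[^>]*>(.*?)</$tag\b[^>]*>#s";
    // Return the inner content of the first match, or null when the tag is absent.
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
}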

Answer

You could use preg_replace_callback like this:

$new_data = preg_replace_callback("#</?(pre|code)>|[<>]#", function ($match) {
    return $match[0] == '<' ? '&lt;' : ($match[0] == '>' ? '&gt;' : $match[0]);
}, $string);

It will preserve the <pre> and <code> opening and closing tags and only replace the other < and > characters.
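
To make the behaviour concrete, here is a minimal sketch that wraps the same pattern in a function and runs it on a sample string. The function name escape_except_pre_code and the sample input are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical wrapper around the callback shown above.
function escape_except_pre_code(string $string): string
{
    return preg_replace_callback('#</?(pre|code)>|[<>]#', function ($match) {
        // A lone angle bracket is escaped.
        if ($match[0] === '<') {
            return '&lt;';
        }
        if ($match[0] === '>') {
            return '&gt;';
        }
        // Anything else is a whole <pre>, </pre>, <code> or </code> tag: keep it.
        return $match[0];
    }, $string);
}

$input = 'Use <pre><b>bold</b></pre> when 1 < 2';
echo escape_except_pre_code($input);
// Use <pre>&lt;b&gt;bold&lt;/b&gt;</pre> when 1 &lt; 2
```

Note that the pattern only recognises bare tags. A tag with attributes, such as <pre class="x">, does not match the first alternative, so its angle brackets are escaped like any others.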

Note that in general replacement methods are not the ideal way to work with HTML. You could look into DOMDocument to parse HTML and get the text content of the elements in an HTML string.
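
As a rough illustration of the DOMDocument route, the sketch below collects the text content of every pre and code element in an HTML fragment. The function name text_in_pre_and_code and the wrapping div are assumptions for this example, and it presumes the input is reasonably well-formed HTML.

```php
<?php
// Hypothetical helper: returns the text inside each <pre> and <code> element.
function text_in_pre_and_code(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a div so it has a single container element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $texts = [];
    foreach (['pre', 'code'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // textContent is the decoded text, without any markup.
            $texts[] = $node->textContent;
        }
    }
    return $texts;
}

print_r(text_in_pre_and_code('<p>Hi</p><pre>a &lt; b</pre><code>x</code>'));
```

Each returned string is plain text, so it can be passed through htmlspecialchars before being written back into a page. A code element nested inside a pre element is reported twice, once for each tag name.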