Ben Sinclair - 4 months ago
PHP Question

Regex to match placeholders that contain HTML within them

I have placeholders that users can insert into a WYSIWYG editor (which produces HTML). Sometimes, when they paste from applications such as Word, extra HTML is injected inside the placeholders.

For example, it pastes %<span>firstname</span>% instead of %firstname%.

Here is an example of my regex code:

$html = '

<p>%firstname%</p>

<p>%<span>firstname</span>%</p>

<p>%<span class="blah">firstname</span>%</p>

<p>%<span><span>firstname</span></span>%</p>

<p>%<span><span><span>firstname</span></span></span>%</p>

<p>%<span class="blah"><span>firstname</span></span>%</p>

<div>other random <strong>HTML</strong> that needs to be preserved.</div>

';

preg_match_all(
    '/\%(?![0-9])((?:<[^<]+?>)?[a-zA-z0-9_-]+(?:[\s]?<[^<]+?>)?)\%/U',
    $html,
    $matches
);

echo '<pre>';
print_r($matches);
echo '</pre>';


Which outputs the following:

Array
(
    [0] => Array
        (
            [0] => %firstname%
            [1] => %firstname%
            [2] => %firstname%
        )

    [1] => Array
        (
            [0] => firstname
            [1] => firstname
            [2] => firstname
        )

)


As soon as there is more than one span inside the placeholder it doesn't work. I'm not quite sure what to adjust in my regex.

/\%(?![0-9])((?:<[^<]+?>)?[a-zA-z0-9_-]+(?:[\s]?<[^<]+?>)?)\%/U


How would I achieve this?

Answer

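Try this regex. It replaces the two optional tag groups (?, at most one tag) with repeated ones (*, any number of tags), restricts the trailing group to closing tags, and moves the capture group so that it holds only the placeholder name. The character class is also written as A-Z rather than A-z, because A-z additionally matches [, \, ], ^ and the backtick.

/\%(?![0-9])(?:<[^<]+?>)*([a-zA-Z0-9_-]+)(?:[\s]?<\/[^<]+?>)*\%/U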
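
A minimal sketch of how this pattern might be used, both to list the placeholder names and to rewrite tag-wrapped placeholders back to the plain %name% form. The constant and function names (PLACEHOLDER_PATTERN, findPlaceholderNames, normalizePlaceholders) are hypothetical and not part of the question, and the character class is written as A-Z here, which is an assumption about what was intended.

```php
<?php
// Answer's pattern, with the character class written as A-Z (assumed intent).
// PLACEHOLDER_PATTERN and both function names are hypothetical.
const PLACEHOLDER_PATTERN =
    '/\%(?![0-9])(?:<[^<]+?>)*([a-zA-Z0-9_-]+)(?:[\s]?<\/[^<]+?>)*\%/U';

/** Return the placeholder names found in $html, without any wrapping tags. */
function findPlaceholderNames(string $html): array
{
    preg_match_all(PLACEHOLDER_PATTERN, $html, $matches);
    return $matches[1]; // capture group 1 holds only the name
}

/** Rewrite every tag-wrapped placeholder to the plain %name% form. */
function normalizePlaceholders(string $html): string
{
    return preg_replace(PLACEHOLDER_PATTERN, '%$1%', $html);
}

$html = '<p>%firstname%</p>'
    . '<p>%<span class="blah"><span>firstname</span></span>%</p>'
    . '<div>other random <strong>HTML</strong> that needs to be preserved.</div>';

print_r(findPlaceholderNames($html));
echo normalizePlaceholders($html), PHP_EOL;
```

Normalizing first means any later substitution step only has to deal with the plain %name% form. Note that the pattern does not check that the opening and closing tags are balanced; it only skips over whatever tags sit between the percent signs and the name.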