ethereal1m - 1 year ago
PHP Question

Optimum Regular Expression: Repeat Occurrences

I am trying to replace the following tag and its content with an empty string:

<a href="http://localhost/photo/448e7d40ed468d73c5f9caba573f6273-0.png" class="wall-image-anchor" target="_blank"><img src="http://localhost/photo/448e7d40ed468d73c5f9caba573f6273-0.png" /></a>

Note that the href URL inside the <a> tag can be anything. So can the content inside <a>, in this case <img> with its content.

So far I have the following code:

$text = preg_replace('@(.*?)<(?:a\b.*?class="wall-image-anchor".*?)>.*?</a>(.*?)@si', '$1$2', $text);

This code should transform the following string:

zzzzz<a href="http://localhost/zz/photo/448e7d40ed468d73c5f9caba573f6273-0.png " class="wall-image-anchor" target="_blank"><img src="http://localhost/zz/photo/448e7d40ed468d73c5f9caba573f6273-0.png" alt="Image/photo" /></a>ffff<br /><a href="http://localhost/ada/photo/448e7d40ed468d73c5f9caba573f6273-0.png " class="wall-image-anchor" target="_blank"><img src="http://localhost/ada/photo/448e7d40ed468d73c5f9caba573f6273-0.png" alt="Image/photo" /></a>ffffgg ffff<br /><a href="http://localhost/dad/photo/448e7d40ed468d73c5f9caba573f6273-0.png " class="wall-image-anchor" target="_blank"><img src="http://localhost/dad/photo/448e7d40ed468d73c5f9caba573f6273-0.png" alt="Image/photo" /></a>ffffgg'


ffffgg ffff

This code works. My question is: is there any other way to make it faster?


Answer Source

The first issue here is correctness. As written, your regex will match starting at the beginning of the first <a> tag, no matter what its class attribute is. You need to replace the internal .*?s with something that can't match beyond the tag boundaries, i.e., [^>]*.
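To make the over-matching concrete, here is a minimal sketch. The input string is hypothetical (a plain navigation link placed before one wall-image anchor), and the second pattern is only an illustration of confining the attribute scan to a single opening tag with a negated character class.

```php
<?php
// Hypothetical input: an unrelated link followed by one wall-image anchor.
$html = '<a href="/home" class="nav">Home</a> text '
      . '<a href="x.png" class="wall-image-anchor"><img src="x.png" /></a> end';

// Pattern from the question: the lazy .*? may run past the first tag's ">"
// until it finds the class attribute in a later tag.
$loose = preg_replace(
    '@(.*?)<(?:a\b.*?class="wall-image-anchor".*?)>.*?</a>(.*?)@si',
    '$1$2',
    $html
);

// Negated character class: the attribute scan cannot cross a ">".
$strict = preg_replace(
    '@<a\b[^>]*class="wall-image-anchor"[^>]*>.*?</a>@si',
    '',
    $html
);

echo $loose, "\n";  // " end" (the unrelated nav link was removed too)
echo $strict, "\n"; // '<a href="/home" class="nav">Home</a> text  end'
```

With the loose pattern the match starts at the very first <a, so the unrelated link is swallowed along with the image anchor.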

That will also cut down enormously on the amount of backtracking, improving performance considerably. The other thing you should do is get rid of the (.*?) at either end. Anything not matched by the regex is unaffected by the replace operation, so you're just making it do unnecessary work.
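A small sketch of why the outer groups are unnecessary: preg_replace only rewrites the matched portion and copies everything else through untouched. The <b> tag and the sample string here are made up purely for illustration.

```php
<?php
// Hypothetical subject with text on both sides of the element to remove.
$subject = 'aa<b>x</b>bb';

// Capturing the surroundings and putting them back...
$withGroups = preg_replace('@(.*?)<b>.*?</b>(.*?)@s', '$1$2', $subject);

// ...gives the same result as matching only the part to delete.
$withoutGroups = preg_replace('@<b>.*?</b>@s', '', $subject);

var_dump($withGroups === $withoutGroups); // bool(true)
echo $withoutGroups, "\n";                // aabb
```

The version without groups has less to match and nothing to capture, so the engine does strictly less work for the same output.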

Here's how it should look:
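Applying both changes to the call from the question gives roughly the following. This is a sketch assembled from the two points described above, not a verbatim snippet, and the sample string uses shortened placeholder URLs in place of the ones in the question.

```php
<?php
// Shortened stand-in for the question's sample input (URLs are placeholders).
$text = 'zzzzz<a href="http://localhost/zz/0.png " class="wall-image-anchor" target="_blank">'
      . '<img src="http://localhost/zz/0.png" alt="Image/photo" /></a>ffff<br />'
      . '<a href="http://localhost/ada/0.png " class="wall-image-anchor" target="_blank">'
      . '<img src="http://localhost/ada/0.png" alt="Image/photo" /></a>ffffgg ffff<br />'
      . '<a href="http://localhost/dad/0.png " class="wall-image-anchor" target="_blank">'
      . '<img src="http://localhost/dad/0.png" alt="Image/photo" /></a>ffffgg';

// [^>]* keeps the attribute scan inside the opening tag; no outer (.*?) groups,
// and the replacement is simply the empty string.
$text = preg_replace('@<a\b[^>]*class="wall-image-anchor"[^>]*>.*?</a>@si', '', $text);

echo $text, "\n"; // zzzzzffff<br />ffffgg ffff<br />ffffgg
```

The <br /> tags are left in place by this call.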



I assume you can handle replacing the <br /> tags.