user609306 - 2 months ago
PHP Question

PHP regex to completely remove HTML anchor links from a string

I have an HTML string in PHP. It may contain several anchor tags, like this:

.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....


The <a> tag may contain several other HTML tags such as <p>, <span>, <br>, etc.

I want a regular expression that removes every <a> tag together with everything inside it, i.e. all anchor tags along with all the content between them.

The output should be:
<p><span>qwerty</span></p>....qwerty....qwerty....qwerty....


Please note that there is no xyz in the final output.

Thanks

P.S.: The string may contain other HTML tags that are not nested inside anchor tags, and I want to keep them. Let's say the string may contain p, span, div, strong, etc. Only the a tags (and their contents) should be removed. I need a regex.

Answer

You don't need a regex for this; just use the strip_tags function to strip HTML tags from the input. Note that it removes only the tags themselves and keeps the text between them:

$s = '.....qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s);

//=> .....qwerty....xyzqwerty...xyzqwerty.....

Based on the edited question: you can whitelist the tags that should be kept by passing them as the second argument:

$s = '.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';

echo strip_tags($s, '<p><span>');
//=> .....<p><span>qwerty</span></p>...qwerty....xyzqwerty...<p><span>xyz</span></p>qwerty.....

With all the usual pitfalls of parsing HTML with regex in mind, here is one that works with the OP's input and removes the anchor text as well:

echo preg_replace('~<a [^>]*>.*?</a>~', '', $s);
//=> .....<p><span>qwerty</span></p>...qwerty....qwerty...qwerty.....
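
To make the pitfalls a bit more concrete: the pattern above expects a space after <a, lower-case tag names, and an anchor that sits on a single line. The following is only a sketch of a slightly more tolerant variant, not a full HTML parser. The helper name remove_anchors is made up for this example, and the second test string is invented to exercise the edge cases.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every
// <a>...</a> element together with its contents and leaves other markup alone.
function remove_anchors(string $html): string
{
    // <a\b     "<a" as a complete tag name, so <abbr> or <article> do not match
    // [^>]*>   the rest of the opening tag
    // .*?      the shortest possible content up to the first closing tag
    // </a\s*>  the closing tag, tolerating whitespace before ">"
    // i flag:  case-insensitive (<A HREF=...>); s flag: "." also matches newlines
    $result = preg_replace('~<a\b[^>]*>.*?</a\s*>~is', '', $html);

    // preg_replace returns null on a PCRE error; fall back to the input unchanged.
    return $result ?? $html;
}

$s = '.....<p><span>qwerty</span></p>...qwerty....<a href="www.xyz.com">xyz</a>qwerty...<a href="www.xyz.com"><p><span>xyz</span></p></a>qwerty.....';
echo remove_anchors($s), "\n";
//=> .....<p><span>qwerty</span></p>...qwerty....qwerty...qwerty.....

// A bare <a>, an upper-case tag, and a tag spanning several lines.
echo remove_anchors("x<a>1</a>y<A HREF='#'>2</A>z<a\nhref='#'>3\n</a>w<abbr>kept</abbr>"), "\n";
//=> xyzw<abbr>kept</abbr>
```

This still has blind spots: an attribute value that contains a literal > ends the opening tag too early, and anchor-like text inside comments or scripts is treated as a real tag. For untrusted or messy markup a real HTML parser is the safer choice.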