fabs - 7 days ago
HTML Question

Regex: Negate capture group with logical or

I'm trying to use a regex to filter forbidden HTML tags out of a given string. Yes, I know I'm supposed to use a parser instead, but for this specific problem it's faster this way.

The idea is to whitelist every tag that is okay (e.g. <span>, <b>, <br>) and match the forbidden ones. So far I came up with the following expression:
<\/?(?!(span|b|br)).\>


It works well for single-character tags like <a>, but longer tags like <label> are not matched. I'd really appreciate some help, thanks in advance.

Answer

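This regex matches every tag except the opening and closing span, br and b tags.

Whitelisted tags are still skipped when they carry attributes. The negative lookahead only rejects a tag when the whitelisted name is followed directly by > or by a space and attributes, so a tag such as <body> is still matched even though its name starts with b.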

<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>
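To see how the pattern behaves, here is a minimal Python sketch. The pattern is copied from the answer unchanged; the function name find_forbidden_tags and the sample string are made up for illustration and are not part of the original question or answer.

```python
import re

# Pattern from the answer, copied verbatim.
FORBIDDEN_TAG = re.compile(r"<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>")


def find_forbidden_tags(html: str) -> list[str]:
    """Return every tag the pattern matches, i.e. tags outside the whitelist."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return FORBIDDEN_TAG.findall(html)


# Hypothetical sample input mixing whitelisted and forbidden tags.
sample = '<span class="x">ok</span> <b>bold</b><br> <label>no</label> <a>x</a> <body>'
print(find_forbidden_tags(sample))
# → ['<label>', '</label>', '<a>', '</a>', '<body>']
```

One thing to be aware of: because the final character class excludes /, a forbidden tag that contains a slash after its name is not matched, for example a self-closing <img/> or an <a href="http://example.com">. If such tags can appear in the input, the character class would probably need to be relaxed to [^>]*, but that is a change to the answer's pattern and should be tested against real data first.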