fabs fabs - 5 months ago 40
HTML Question

Regex: Negate capture group with logical or

I'm trying to use regex to filter forbidden HTML tags out of a given string. Yes, I know I'm supposed to use a parser instead, but for this specific problem it's faster this way.

The idea is to whitelist every tag which is okay (e.g. <span>, <b>, </br>) and match forbidden ones. So far I came up with the following expression:

It works well for single char tags like
but stuff like
does not work. I'd really appreciate some help, thanks in advance.


This regex matches every tag except the opening and closing span, br and b tags.

It should also skip the whitelisted tags when they carry attributes.

<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>
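To see how the pattern behaves, here is a minimal Python sketch that applies it to a made-up sample string. The names FORBIDDEN_TAG, strip_forbidden_tags and sample are only illustrative and are not part of the question or the answer; the pattern itself is copied unchanged.

```python
import re

# The pattern from the answer, copied unchanged.
# The negative lookahead rejects a whitelisted name only when it is followed
# by optional attributes and then ">", so "<b>" is skipped but "<body>" is not.
FORBIDDEN_TAG = re.compile(r"<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>")


def strip_forbidden_tags(html: str) -> str:
    """Remove every tag the pattern matches; whitelisted tags stay in place."""
    return FORBIDDEN_TAG.sub("", html)


# Hypothetical input, chosen only for illustration.
sample = '<div><span class="x">Hi</span> <b>there</b><br><script>alert(1)</script></div>'

matches = FORBIDDEN_TAG.findall(sample)
cleaned = strip_forbidden_tags(sample)

print(matches)  # ['<div>', '<script>', '</script>', '</div>']
print(cleaned)  # <span class="x">Hi</span> <b>there</b><br>alert(1)
```

Because the whitelist check requires the closing > right after the tag name (or after its attributes), longer names that merely start with b, such as <body>, are still matched as forbidden. One caveat when trying this out: the final [^>\/]* cannot cross a slash, so a forbidden tag with a slash inside it, such as <img src="a/b.png">, is not matched by this pattern.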