lazyboy78 - 1 year ago
How to escape code in <code> tag with php htmlentities, even if tag has attributes

So someone actually posted a fantastic solution here: "How can I escape all code within <code></code> tags to allow people to post code?"

The problem is that this only works if it's a bare <code></code> tag. However, this breaks with <code id="lol"></code>, for example, since it contains an attribute. How can I account for this, in order to strictly escape strings inside the code tag, whether or not it has any attributes?

I apologize if there is an obvious solution to this. Regexes give me nightmares.


As I explained in the question initially, the post that is supposedly a duplicate does not account for the <code> tag with something like a class or any other attributes.

In spite of my comment above, I'll endeavour to provide a regex for you to use. I would, however, emphatically not recommend doing this with regexes, but using an HTML parser instead.

Your regex should look a bit like this:

<\s*code.*?>(.+?)<\s*\/code\s*>

To break it down a bit,

\s* matches zero or more whitespace characters.

code matches the literal string "code".

.*? is a lazy match of zero or more characters. It will match everything (if anything) between "code" and the closing > of the opening tag, which is how attributes are accounted for.

(.+?) is a capture group, containing a lazy match of one or more characters. Because it requires at least one character between the tags, a completely blank <code></code> pair will not match, which is fine as there is nothing to escape in it.

And, finally, <\s*\/code\s*> matches the ending tag, with the possibility of included whitespace. Note that the slash (/) character is escaped. That is only required when the slash is also the pattern delimiter, as it usually is in PHP's preg functions, but escaping it is harmless otherwise.
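Put together, the pieces above might be used along these lines. This is only a sketch under a few assumptions of mine: the function name escape_code_blocks is hypothetical, the opening and closing tags are wrapped in extra capture groups so they can be put back untouched, and the i and s flags are additions (s lets . match newlines, which multi-line code needs).

```php
<?php
// Pattern assembled from the pieces described above. The extra capture
// groups around the tags and the "i"/"s" flags are assumptions for this sketch.
const CODE_TAG_PATTERN = '/(<\s*code.*?>)(.+?)(<\s*\/code\s*>)/is';

// Hypothetical helper: escapes everything between <code ...> and </code>,
// leaving the tags themselves (and any attributes) as they were.
function escape_code_blocks(string $html): string
{
    $result = preg_replace_callback(
        CODE_TAG_PATTERN,
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = contents, $m[3] = closing tag
            return $m[1] . htmlentities($m[2], ENT_QUOTES, 'UTF-8') . $m[3];
        },
        $html
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$input = '<p>Hi</p><code id="lol"><b>bold</b> & "x"</code>';
echo escape_code_blocks($input), "\n";
// prints: <p>Hi</p><code id="lol">&lt;b&gt;bold&lt;/b&gt; &amp; &quot;x&quot;</code>
```

Bear in mind that .*? in the opening tag is loose: it would also accept a tag such as <codex>, and an attribute value containing > would cut the tag short. That is part of why an HTML parser remains the safer route.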

