Kevin Whitaker - 3 months ago
HTML Question

JavaScript: Escape <> for non-HTML strings, but preserve HTML

I have some text that contains HTML (to be rendered in the browser), as well as arbitrary strings containing <>. Is there a way to escape those arbitrary tags, but preserve the HTML? If it helps, the HTML being parsed is very strictly governed, and only a subset of tags is allowed (b, i, strong, br).

For example, given this text:

<strong>Foobar</strong> <some other whatever>


I need

<strong>Foobar</strong> &lt;some other whatever&gt;

Answer

A cheap option would be to replace <> with placeholders, and then restore them in "good" contexts:

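// Example whitelist; substitute your own (the question uses b, i, strong, br).
const allowedTags = ['strong', 'em', 'p'];

let text = '<strong>Foobar</strong> <some other whatever> <b>??</b> <em>hey</em>';

text = text
  // Swap every < and > for control-character placeholders.
  .replace(/</g, '\x01')
  .replace(/>/g, '\x02')
  // Restore the brackets around allowed opening and closing tags.
  .replace(new RegExp('\x01(/?)(' + allowedTags.join('|') + ')\x02', 'g'), '<$1$2>')
  // Escape whatever placeholders are left.
  .replace(/\x01/g, '&lt;')
  .replace(/\x02/g, '&gt;');

console.log(text);
// <strong>Foobar</strong> &lt;some other whatever&gt; &lt;b&gt;??&lt;/b&gt; <em>hey</em>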
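The placeholder trick assumes the input never contains the characters \x01 or \x02, and it does not recognise self-closing forms such as <br/>. A possible variant, sketched here as an assumption rather than as part of the original answer, does the same job in a single pass with a replacer callback. The function name escapeExceptAllowed is hypothetical, and the sketch only recognises bare tags without attributes.

```javascript
// Hypothetical helper: keep bare allowed tags, escape every other < and >.
function escapeExceptAllowed(text, allowedTags) {
  const allowed = new Set(allowedTags.map((t) => t.toLowerCase()));
  // Either a bare tag (<b>, </b>, <br/>, <br />) or a lone bracket.
  const pattern = /<(\/?)([a-z][a-z0-9]*)\s*(\/?)>|[<>]/gi;
  return text.replace(pattern, (match, closing, name, selfClosing) => {
    const isTag = name !== undefined;
    // Reject odd forms like </br/> even when the name is allowed.
    if (isTag && allowed.has(name.toLowerCase()) && !(closing && selfClosing)) {
      return match; // allowed tag: leave untouched
    }
    // Disallowed tag or stray bracket: escape the brackets it contains.
    return match.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  });
}

const tags = ['b', 'i', 'strong', 'br'];
console.log(escapeExceptAllowed('<strong>Foobar</strong> <some other whatever>', tags));
```

With the tag list from the question, the call above prints the output the question asks for. Tags that carry attributes, such as <b class="x">, are escaped as well, which fits the "very strictly governed" input described in the question.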

A not-so-cheap but more correct solution is to use an event-driven HTML parser and escape anything that is not an allowed tag as you go.
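To show the shape of the event-driven approach, here is a minimal sketch in which a hand-rolled scanner stands in for a real HTML parser. The names scan, escapeBrackets, and sanitize are all hypothetical, and the scanner only understands bare tags without attributes, so treat it as an illustration of the callback structure and not as a substitute for a proper parser.

```javascript
// Toy scanner: walks the input once and fires onTag / onText callbacks.
// Assumption: a stand-in for a real event-driven parser; bare tags only.
function scan(input, handlers) {
  const tagPattern = /<\/?[a-z][a-z0-9]*\s*\/?>/gi;
  let last = 0;
  let m;
  while ((m = tagPattern.exec(input)) !== null) {
    // Report any text sitting between the previous tag and this one.
    if (m.index > last) handlers.onText(input.slice(last, m.index));
    // Strip brackets, slashes, and whitespace to get the tag name.
    const name = m[0].replace(/[<>\/\s]/g, '').toLowerCase();
    handlers.onTag(m[0], name);
    last = tagPattern.lastIndex;
  }
  // Report trailing text after the last tag.
  if (last < input.length) handlers.onText(input.slice(last));
}

function escapeBrackets(s) {
  return s.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build the output as events arrive: keep allowed tags, escape the rest.
function sanitize(input, allowedTags) {
  const allowed = new Set(allowedTags);
  const out = [];
  scan(input, {
    onText: (chunk) => out.push(escapeBrackets(chunk)),
    onTag: (raw, name) => out.push(allowed.has(name) ? raw : escapeBrackets(raw)),
  });
  return out.join('');
}

console.log(sanitize('<strong>Foobar</strong> <some other whatever>', ['b', 'i', 'strong', 'br']));
```

The decision about what to keep lives entirely in the two callbacks, so swapping scan for a real parser's text and tag events would leave sanitize largely unchanged.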
