Kevin Whitaker - 7 months ago
HTML Question

Javascript: Escape <> for non-html strings, but preserve html

I have some text that contains HTML (to be rendered in the browser), as well as arbitrary strings containing <>. Is there a way to escape those arbitrary tags, but preserve the HTML? If it helps, the HTML being parsed is very strictly governed, and only a subset of tags is allowed.

For example, given this text:

<strong>Foobar</strong> <some other whatever>

I need:

<strong>Foobar</strong> &lt;some other whatever&gt;


A cheap option would be to replace < and > with placeholder characters, restore them where they wrap an allowed tag, and escape whatever is left:

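// Whitelist of bare tag names (no attributes) that should survive.
const allowedTags = ['strong', 'em', 'p'];

let text = '<strong>Foobar</strong> <some other whatever> <b>??</b> <em>hey</em>';

text = text
  // 1. Swap every angle bracket for a placeholder control character.
  .replace(/</g, '\x01')
  .replace(/>/g, '\x02')
  // 2. Restore the brackets around allowed opening and closing tags.
  .replace(new RegExp('\x01(/?)(' + allowedTags.join('|') + ')\x02', 'g'), '<$1$2>')
  // 3. Escape whatever placeholders are left.
  .replace(/\x01/g, '&lt;')
  .replace(/\x02/g, '&gt;');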
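
If you want to reuse this, the same chain can be wrapped in a function. This is only a sketch: the name escapeExceptAllowed is made up for illustration, and it assumes the input never contains the \x01 or \x02 control characters itself. Note that only bare tags match, so an allowed tag carrying attributes gets escaped too.

```javascript
// Hypothetical helper (name is illustrative): keeps bare tags listed in
// allowedTags and escapes every other < and >.
function escapeExceptAllowed(text, allowedTags) {
  // Matches a placeholder-wrapped allowed tag, e.g. \x01/strong\x02.
  const allowed = new RegExp('\x01(/?)(' + allowedTags.join('|') + ')\x02', 'g');
  return text
    .replace(/</g, '\x01')
    .replace(/>/g, '\x02')
    .replace(allowed, '<$1$2>')
    .replace(/\x01/g, '&lt;')
    .replace(/\x02/g, '&gt;');
}

const tags = ['strong', 'em', 'p'];

console.log(escapeExceptAllowed('<strong>Foobar</strong> <some other whatever>', tags));
// prints: <strong>Foobar</strong> &lt;some other whatever&gt;

// Limitation: an opening tag with attributes is not recognised and is escaped.
console.log(escapeExceptAllowed('<p class="x">hi</p>', tags));
// prints: &lt;p class="x"&gt;hi</p>
```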


A not-so-cheap but more correct solution is to use an event-driven HTML parser and escape unwanted markup as you go.
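
To give a rough idea of the "escape as you go" shape, here is a minimal sketch. It does not use a real HTML parser: scan is a hypothetical hand-rolled scanner that just fires onTag for every <...> chunk and onText for everything in between, and escapeAsYouGo and escapeHtml are likewise names made up for this example. A real parser would handle attributes, comments and entities properly; this one treats anything that is not a bare allowed tag name as text to escape.

```javascript
// Hypothetical mini-scanner standing in for a real event-driven parser:
// calls handlers.onTag for each <...> chunk and handlers.onText for the rest.
function scan(text, handlers) {
  const tagLike = /<(\/?)([^<>]*)>/g;
  let last = 0;
  let m;
  while ((m = tagLike.exec(text)) !== null) {
    if (m.index > last) handlers.onText(text.slice(last, m.index));
    handlers.onTag(m[0], m[2]); // raw chunk, and its content without the slash
    last = tagLike.lastIndex;
  }
  if (last < text.length) handlers.onText(text.slice(last));
}

// Escapes angle brackets only (ampersands are left alone in this sketch).
const escapeHtml = (s) => s.replace(/</g, '&lt;').replace(/>/g, '&gt;');

function escapeAsYouGo(text, allowedTags) {
  const allowed = new Set(allowedTags);
  let out = '';
  scan(text, {
    // Stray < or > inside plain text is always escaped.
    onText: (chunk) => { out += escapeHtml(chunk); },
    // A tag-like chunk is kept verbatim only if it is a bare allowed name.
    onTag: (raw, name) => {
      out += allowed.has(name.toLowerCase()) ? raw : escapeHtml(raw);
    },
  });
  return out;
}

const input = '<strong>Foobar</strong> <some other whatever> <b>??</b> <em>hey</em>';
console.log(escapeAsYouGo(input, ['strong', 'em', 'p']));
// prints: <strong>Foobar</strong> &lt;some other whatever&gt; &lt;b&gt;??&lt;/b&gt; <em>hey</em>
```

The upside over the placeholder trick is that no reserved characters are needed, and the keep-or-escape decision lives in one handler that can be made stricter later (for example, checking attributes) without touching the scanning loop.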