DaMaxContent DaMaxContent - 7 months ago 34
Javascript Question

syntax highlighting with regex?

So, I ran into what I think is a catch-22.

I am attempting to build an HTML/JS syntax highlighter for markup languages. I take a markup_string (the raw markup that I want to syntax-highlight) and run it through some JS so that it can be displayed in an HTML document with CSS styling. I do this by wrapping elements of my markup_string in HTML <span> tags. However, if I just dump the generated HTML as-is into a div tag using innerHTML, the markup_string itself is rendered as markup too.

I fix this by replacing all < with &lt; and > with &gt; in my markup_string.

This is done with the following code:

var markup_string = markup_string.replace(/</g,"&lt;").replace(/>/g,"&gt;")


However, I now have a smaller problem.

What if I want to syntax-highlight HTML escape sequences that already exist in my markup_string before the fix?

This becomes a problem when I want to syntax-highlight a &lt; that exists in the original markup_string: after the replacement it looks exactly like a < that I escaped myself.
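To make the collision concrete, here is a minimal sketch. The function name escapeAngleBrackets is made up for illustration; it just wraps the replacement shown above.

```javascript
// The replacement from above: only "<" and ">" are escaped, "&" is left alone.
function escapeAngleBrackets(markup) {
  return markup.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Source text that contains a real "<" character.
const literalBracket = escapeAngleBrackets('a < b');
// Source text that contains the five characters "&lt;" written out.
const literalEntity = escapeAngleBrackets('a &lt; b');

// Both inputs end up as the same string, so once it is assigned to innerHTML
// the browser shows "a < b" in both cases and the original "&lt;" is lost.
const collides = literalBracket === literalEntity;
```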

However, someone has obviously come up with a fix, because code editors like jsfiddle have somehow found a workaround.

Answer

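Short answer: escape your &s as well, and do it before escaping < and >. That way a &lt; in the original text becomes &amp;lt;, which the browser displays as the literal text &lt;. When in doubt, look at some source! From highlight.js (https://github.com/isagalaev/highlight.js/blob/master/src/highlight.js#L33):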

function escape(value) {
  return value.replace(/&/gm, '&amp;').replace(/</gm, '&lt;').replace(/>/gm, '&gt;');
}
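
As a rough sketch of how this could fit into a highlighter (not how highlight.js itself does it): match the entities in the original, unescaped text, escape each piece separately, and wrap the matches in a span. The names escapeHtml and highlightEntities, the "entity" class name, and the entity pattern are all assumptions made for this example.

```javascript
// Escape "&" first, so the "&" characters introduced by the later
// replacements are not escaped a second time.
function escapeHtml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical highlighter: finds things that look like character references
// in the ORIGINAL source and wraps them in <span class="entity">.
function highlightEntities(source) {
  // Simplified pattern: named (&lt;), decimal (&#60;) and hex (&#x3C;) forms.
  const entityPattern = /&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);/g;
  let html = '';
  let last = 0;
  for (const match of source.matchAll(entityPattern)) {
    // Plain text between matches is escaped but not wrapped.
    html += escapeHtml(source.slice(last, match.index));
    // The entity itself is escaped too, so it is displayed literally.
    html += '<span class="entity">' + escapeHtml(match[0]) + '</span>';
    last = match.index + match[0].length;
  }
  return html + escapeHtml(source.slice(last));
}

const highlighted = highlightEntities('<b>1 &lt; 2</b>');
// highlighted is:
// &lt;b&gt;1 <span class="entity">&amp;lt;</span> 2&lt;/b&gt;
```

Assigning the result to innerHTML should then display the original text <b>1 &lt; 2</b> character for character, with only the &lt; sitting inside a styled span.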