1owk3y - 25 days ago
JavaScript Question

Parsing unicode in unescaped XML

I'm trying to parse some poorly formatted XML.

I say poorly formatted because everyone knows you're not supposed to have unescaped ampersands in an XML file.

Problem is, I need to collect some Unicode-formatted phrases from an XML file, and I need the output to stay as close to the original as possible. You can replicate the issue in your browser console:

console.log($("<test>&#xE2;</test>").text())
// Outputs 'â' instead of desired '&#xE2;'


I've tried every combination of escape(), unescape(), encodeURI(), and decodeURI() I can fathom.

I've tried both settings for jQuery's ajax({processData: bool}) flag. All the answers I've found point to these solutions, and none of them seem to work.

How can I modify the above code to output the original XML content?

Answer Source

Use new Option(yourUnescapedXml).innerHTML. So, to answer your question directly:

console.log($(`<test>${new Option('&#xE2;').innerHTML}</test>`).text())

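This creates an HTMLOptionElement, then immediately gets its (escaped) innerHTML. The ampersand is serialized as &amp;, so once jQuery parses the markup, .text() returns the literal '&#xE2;' rather than the decoded character.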