kamelkev - 1 month ago
HTML Question

JavaScript generating invalid HTML5 attributes in Firefox

I am noticing some very strange behavior in Firefox, and I'm wondering if anyone has a strategy for normalizing or working around it.

Specifically, if you give Firefox a basic anchor containing HTML entities, it unescapes those entities, fails to re-escape them, and hands you back invalid HTML.

For example, Firefox mishandles the following URL:

<a href="&gt;&lt;&quot;">My Original Link</a>


If this URL is parsed by Firefox, it unescapes the &gt;&lt;&quot; and starts handling a URL like:

<a href="<>"">My Original Link</a>


This same operation appears to work fine elsewhere, even in Safari and Edge.

I tried quite a few different ways of handing the HTML to Firefox to avoid this problem: manually invoking the parser, setting innerHTML, using jQuery html(), giving the jQuery constructor a giant string, and so on. All methods produced the same broken result.

See a fiddle here:
https://jsfiddle.net/kamelkev/hfd2b6sn/

I am a little mystified by how broken this handling seems to be. There must be a way to work around this issue, but I can't seem to find a way.

My application is an HTML manipulation tool, so I typically normalize around issues like this by dropping down to XML and handling the problems there before persisting to a dumb key-value store, but in this particular case the <> characters are preventing me from processing this document as XML.

Ideas?

Answer

A < or a > is valid, unescaped, inside an attribute value in HTML. It's not best practice, but it is valid.

What's happening is that Firefox is parsing the original HTML and making elements out of it. At that point, the original HTML no longer exists. When you call .outerHTML, the HTML is reconstructed from the element.

Firefox then regenerates the markup using a different set of rules than Chrome does.
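
To make the difference concrete, here is a minimal sketch of two ways an attribute value can be escaped when markup is rebuilt from an element. escapeHtmlAttr follows the HTML serialization rules as they were long specified, as far as I understand them: only &, the non-breaking space, and " are escaped in attribute values (newer browser versions may escape more). escapeXmlAttr also escapes < and >, which is what an XML parser needs. Both function names are hypothetical and neither is a browser API.

```javascript
// Hypothetical helpers for illustration; these are not browser APIs.

// Escaping along the lines of the classic HTML serialization rules for
// attribute values: '&', U+00A0 and '"' are escaped, '<' and '>' are not.
function escapeHtmlAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/"/g, '&quot;');
}

// Stricter escaping that is also safe inside an XML attribute value.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The parsed (unescaped) value of href="&gt;&lt;&quot;" from the question.
const parsed = '><"';

console.log(escapeHtmlAttr(parsed)); // ><&quot;
console.log(escapeXmlAttr(parsed));  // &gt;&lt;&quot;
```

The first result is acceptable to an HTML parser but not to an XML parser, which is the mismatch described in the question.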

It isn't clear exactly what you need this for. Really, you should edit the DOM and export the HTML for the whole DOM once you are done; constantly re-interpreting HTML isn't necessary.
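
If the export has to be well-formed XML, one option is to serialize the finished DOM yourself with stricter escaping. The following is only a sketch under assumptions: toXml and its helpers are hypothetical names, it handles element and text nodes only, and it self-closes empty elements. It reads just nodeType, localName, attributes, childNodes and nodeValue, so it should accept real DOM nodes as well as the plain-object stand-in used here. In a browser, the built-in XMLSerializer (new XMLSerializer().serializeToString(node)) may already be sufficient and is worth trying first.

```javascript
// Hypothetical sketch: serialize a DOM-like tree with XML-safe escaping.

const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Escape characters that are unsafe in XML text content.
function escapeXmlText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need the double quote escaped.
function escapeXmlAttr(value) {
  return escapeXmlText(value).replace(/"/g, '&quot;');
}

function toXml(node) {
  if (node.nodeType === TEXT_NODE) {
    return escapeXmlText(node.nodeValue);
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ''; // comments, doctypes, etc. are skipped in this sketch
  }
  const attrs = Array.from(node.attributes)
    .map((attr) => ` ${attr.name}="${escapeXmlAttr(attr.value)}"`)
    .join('');
  const children = Array.from(node.childNodes).map((child) => toXml(child)).join('');
  return children === ''
    ? `<${node.localName}${attrs}/>`
    : `<${node.localName}${attrs}>${children}</${node.localName}>`;
}

// Plain-object stand-in for the parsed anchor from the question.
const anchor = {
  nodeType: ELEMENT_NODE,
  localName: 'a',
  attributes: [{ name: 'href', value: '><"' }],
  childNodes: [{ nodeType: TEXT_NODE, nodeValue: 'My Original Link' }],
};

console.log(toXml(anchor));
// <a href="&gt;&lt;&quot;">My Original Link</a>
```

Serializing once at the end, from the DOM, keeps the escaping rules under your control instead of depending on what each browser's outerHTML produces.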
