Martin Martin - 14 days ago 5
CSS Question

Does the HTML5 spec say to ignore CSS inside HTML comments?

Can someone tell me what the following paragraph in the HTML5 spec means? It concerns the processing of <style> element content:

https://www.w3.org/TR/html5/document-metadata.html#the-style-element


All descendant elements must be processed, according to their
semantics, before the style element itself is evaluated. For styling
languages that consist of pure text (as opposed to XML), user agents
must evaluate style elements by passing the concatenation of the
contents of all the Text nodes that are children of the style element
(not any other nodes such as comments or elements), in tree order, to
the style system. For XML-based styling languages, user agents must
pass all the child nodes of the style element to the style system.


To me this sounds like the HTML parser should remove all HTML elements and comments inside the <style> element before sending the resulting text to the style system.

The content within an HTML comment is also a Text node, but it is not a direct child of the style element so should not be included in the text sent to the style system.

Modern browsers don't seem to do any processing of comments or elements inside style elements; instead they treat the style content as CDATA, consistent with HTML 4. But this paragraph in the HTML5 spec says this is incorrect behaviour, doesn't it? If not, what am I missing?

Answer

The only way to get a comment node or element node into a style element is by DOM manipulation—putting the comment or element into the style element in the DOM after an HTML parser has already parsed the document.

So the spec is not saying the HTML parser should remove all HTML elements and comments inside <style>…</style> markup. If the spec intended that, it would state it explicitly.
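To make the quoted spec paragraph concrete, here is a minimal sketch of the "concatenate the Text node children" rule. It uses Python's xml.dom.minidom as a stand-in for a browser DOM, which is an assumption made purely for illustration; the helper name text_for_style_system is hypothetical and not taken from the spec. The comment and the nested element are added by DOM manipulation, which is the only situation in which the rule has anything to skip.

```python
from xml.dom import Node
from xml.dom.minidom import Document


def text_for_style_system(style_element):
    """Join the data of the direct Text children only, in tree order.

    Comment children and element children (including any text nested
    inside those elements) are skipped, as the quoted paragraph describes.
    """
    return "".join(
        child.data
        for child in style_element.childNodes
        if child.nodeType == Node.TEXT_NODE
    )


# Build a style element by DOM manipulation rather than by parsing markup.
doc = Document()
style = doc.createElement("style")
style.appendChild(doc.createTextNode("p { color: red } "))
style.appendChild(doc.createComment(" h1 { color: blue } "))  # skipped
nested = doc.createElement("b")
nested.appendChild(doc.createTextNode("h2 { color: green }"))  # grandchild, skipped
style.appendChild(nested)
style.appendChild(doc.createTextNode("em { color: gray }"))

css = text_for_style_system(style)
print(css)
```

Only the two direct Text children reach the hypothetical style system here; the comment node and the text nested inside the b element do not.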

HTML parsers parse all content in <style>…</style> markup as text, including any content that looks like a comment or looks like an element.

So there are no comments or elements for an HTML parser to remove there—it’s all just text.
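A rough way to observe this outside a browser is Python's standard html.parser module. It is not a spec-conformant HTML5 parser, so treat this as an illustration under that assumption, but it does treat style content as character data in a comparable way. The StyleProbe class and probe function below are hypothetical names introduced for the example.

```python
from html.parser import HTMLParser


class StyleProbe(HTMLParser):
    """Records which parser callbacks fire for content inside <style>."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.style_chunks = []  # text delivered while inside <style>
        self.comments = []      # content the parser recognised as comments
        self.start_tags = []    # every start tag the parser recognised

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if self.in_style:
            self.style_chunks.append(data)

    def handle_comment(self, data):
        self.comments.append(data)


def probe(markup):
    """Parse markup and return (style text, comments, start tags)."""
    parser = StyleProbe()
    parser.feed(markup)
    parser.close()
    # The parser may deliver text in several chunks, so join them.
    return "".join(parser.style_chunks), parser.comments, parser.start_tags


style_text, comments, start_tags = probe(
    "<style><!-- p { color: red } --> <b>x</b></style><!-- real comment -->"
)
print(repr(style_text), comments, start_tags)
```

The comment-like and element-like content inside the style element arrives through handle_data as plain text, while the comment after the end tag is the only one reported through handle_comment.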

Where in the spec does it say that the content is pure text?

html.spec.whatwg.org/multipage/syntax.html#raw-text-elements says style content is “raw text”.

The HTML 4 spec states clearly that the content of style elements is CDATA. That is what I am looking for but I can't find it in the HTML5 spec.

What the current HTML spec calls “raw text” is essentially the same as CDATA in the HTML4 spec.

Where does it say that it is terminated by the string "</style"?

See these steps of the parsing algorithm:

  1. https://html.spec.whatwg.org/multipage/syntax.html#rawtext-state
  2. https://html.spec.whatwg.org/multipage/syntax.html#rawtext-less-than-sign-state
  3. https://html.spec.whatwg.org/multipage/syntax.html#rawtext-end-tag-open-state
  4. https://html.spec.whatwg.org/multipage/syntax.html#rawtext-end-tag-name-state

The last step there references the definition of “appropriate end tag token”:

An appropriate end tag token is an end tag token whose tag name matches the tag name of the last start tag to have been emitted from this tokenizer, if any.

So when parsing the raw text of style contents, the last start tag to have been emitted is a <style> start tag, thus the “appropriate end tag token” is </style>.
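The effect of those four states can be approximated with a short sketch. This is a simplification under stated assumptions, not the spec's state machine: it uses one regular expression to find the first "</" followed by the last start tag's name (ASCII case-insensitive) and then whitespace, "/" or ">", and it does not tokenize the remainder of the end tag. The function name split_raw_text is hypothetical.

```python
import re


def split_raw_text(after_start_tag, last_start_tag="style"):
    """Split the input that follows a raw text start tag.

    Returns (raw_text, rest), where rest begins at the appropriate end
    tag, or is empty if no appropriate end tag is found.
    """
    pattern = re.compile(
        r"</" + re.escape(last_start_tag) + r"(?=[\t\n\f />])",
        re.IGNORECASE,
    )
    match = pattern.search(after_start_tag)
    if match is None:
        # No appropriate end tag: everything stays raw text.
        return after_start_tag, ""
    return after_start_tag[: match.start()], after_start_tag[match.start():]


# "</b>" and "</styles>" are not appropriate end tags, so they stay text.
text, rest = split_raw_text("a </b> </styles> b</STYLE>rest")

# Comment-like syntax offers no protection: the first "</style>" ends the text.
text2, rest2 = split_raw_text("<!-- </style> -->")
print(repr(text), repr(rest), repr(text2), repr(rest2))
```

The second call shows why a "</style>" inside something that looks like a comment still terminates the element: in raw text there is no comment syntax for the tokenizer to recognise.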