legokangpalla - 1 month ago
HTML Question

XHTML: Isn't HTML already in XML format?

I'm an application developer at my company and really didn't pay much attention to HTML until recently, teaching myself bits of JavaScript and HTML during breaks.

One thing that keeps on coming up is the difference between HTML and XHTML, especially on articles regarding HTML5.

What confuses me is that I thought HTML was already in XML format, so I don't understand what people mean when they say XHTML is HTML with XML.

There are a few differences between HTML and XHTML, as listed at http://www.w3schools.com/html/html_xhtml.asp


  • Add an XHTML <!DOCTYPE> to the first line of every page

  • Add an xmlns attribute to the html element of every page

  • Change all element names to lowercase

  • Close all empty elements

  • Change all attribute names to lowercase

  • Quote all attribute values



But apart from the first and second points, the others are just stricter syntax rules that can also be applied to HTML. I also think it's possible to use XML tools on an HTML document as long as it follows the rest of the points.

So what's the point of differentiating between the two standards?

Answer

tl;dr:
No, HTML is not already in XML format.

Longer answer:

HTML follows different rules than XML. Sure, on the whole XML's rules are a bit stricter, but that's beside the point. The point is that you can have documents that are valid HTML but have little to do with XML. Example:

<title>?</title>
<p>Hello

This one doesn't even have a visible root element. That is, it has one, but its start and end tags are omitted and implied by the parser. Or, a line like this

<script src="script.js"/>

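is a no-no in HTML. Devastating results! An HTML parser ignores the trailing slash, so everything after the tag is treated as script content until a </script> end tag turns up. But it is well-formed XML, and you can do this in XHTML.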
So no, HTML is not almost XHTML.
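
As a rough illustration of the two examples above, the following sketch feeds both snippets to a generic XML parser. It assumes Python's standard xml.etree.ElementTree as a stand-in for the XML parser a browser would use for XHTML, and the helper name is_well_formed_xml is made up for this example. It only checks well-formedness and says nothing about how an HTML parser treats the same input.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The HTML example from above: no single visible root element, unclosed <p>.
html_only = "<title>?</title>\n<p>Hello"

# Fine as XML: the trailing slash makes this an empty element.
self_closing_script = '<script src="script.js"/>'

print(is_well_formed_xml(html_only))            # False
print(is_well_formed_xml(self_closing_script))  # True
```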

A while back I made a page listing the differences between HTML and XHTML that is more complete than the W3Schools one, here. If you want to study the differences, use that one.

For instance, the list of bullet points on W3Schools is a start, but it's by no means complete. You also need to

  • make all start and end tags visible, even if they are optional according to the HTML standards
  • remember that in XHTML, <script> and <style> blocks are parsed with the same parser as the rest of the document rather than a text-only parser. If you have something that looks like markup in a style block, for example p:after {content:'</style>';}, that goes horribly wrong! Ditto with unescaped & signs
  • realise that structures like <table><tr><td></td></tr></table> result in different DOM trees in HTML and XHTML
  • avoid using <a> elements with name attributes for anchors
  • etc.
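
Two of the points in this list can be checked against an XML parser. This is a minimal sketch, assuming Python's xml.etree.ElementTree behaves like a browser's XML parser for these cases. The script contents and variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# An unescaped & inside <script> is a well-formedness error in XML.
raw_script = "<script>if (a && b) { run(); }</script>"
try:
    ET.fromstring(raw_script)
    raw_script_ok = True
except ET.ParseError:
    raw_script_ok = False

# Escaping the ampersands, or wrapping the code in CDATA, keeps it well-formed.
escaped = ET.fromstring("<script>if (a &amp;&amp; b) { run(); }</script>")
cdata = ET.fromstring("<script><![CDATA[if (a && b) { run(); }]]></script>")

# An XML parser builds exactly the tree that is written: <tr> is a direct
# child of <table>. An HTML parser would insert an implied <tbody> in between.
table = ET.fromstring("<table><tr><td></td></tr></table>")
table_children = [child.tag for child in table]

print(raw_script_ok)               # False
print(escaped.text == cdata.text)  # True
print(table_children)              # ['tr']
```

The table result is why a script or a CSS selector that expects table > tbody > tr can behave differently once the same markup is served as XHTML.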

Oh, and as to W3Schools' first bullet point, the DOCTYPE declaration is not needed for browsers to display XHTML documents in standards mode. On a real XHTML document, you can leave it out and there will be no difference in display.
The only time you will need one is when you have named entities in your document, such as &eacute;, which do require the full DOCTYPE in order to work (a complete DOCTYPE including the DTD part, that is, not the short HTML5 one).
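
The entity point can be seen with any XML parser that has no DTD to consult. This is a small sketch, assuming Python's xml.etree.ElementTree (which does not load external DTDs) as the parser. It illustrates only the case where no DOCTYPE is present.

```python
import xml.etree.ElementTree as ET

# Without a DTD, &eacute; is an undefined entity and parsing fails.
try:
    ET.fromstring("<p>caf&eacute;</p>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False

# Numeric character references need no DTD.
numeric = ET.fromstring("<p>caf&#233;</p>")

# The five predefined XML entities (&amp; &lt; &gt; &quot; &apos;) also work.
predefined = ET.fromstring("<p>a &amp; b</p>")

print(named_entity_ok)   # False
print(numeric.text)      # café
print(predefined.text)   # a & b
```

Writing the character literally in a UTF-8 document, or using a numeric reference, avoids the dependency on the DTD.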