HTML5 has several goals which differentiate it from HTML4.
The primary one is consistent, defined error handling. As you know, HTML purposely supports 'tag soup', or the ability to write malformed code and have it corrected into a valid document. The problem is that the rules for doing this aren't written down anywhere. When a new browser vendor wants to enter the market, they just have to test malformed documents in various browsers (especially IE) and reverse-engineer their error handling. If they don't, then many pages won't display correctly (estimates place roughly 90% of pages on the net as being at least somewhat malformed).
So, HTML5 is attempting to discover and codify this error handling, so that browser developers can all standardize and greatly reduce the time and money required to display things consistently. As well, long in the future after HTML has died as a document format, historians may still want to read our documents, and having a completely defined parsing algorithm will greatly aid this.
There are many other smaller efforts taking place in HTML5, such as better-defined semantic roles for existing elements (
<em> now actually mean something different, and even
<i> have vague semantics that should work well when parsing legacy documents) and adding new elements with useful semantics -
<nav> should replace the majority of
<div>s used on a web page, making your pages a bit more semantic, but more importantly, easier to read. No more painful scanning to see just what that random
</div> is closing - instead you'll have an obvious
</article>, making the structure of your document much more intuitive.