Sunday 29 April 2007
Jeff Atwood, in "JavaScript and HTML: Forgiveness by Default", writes about how a design decision in XML doomed XHTML:
Unfortunately, the Draconians won: when rendering as strict XHTML, any error in your page results in a page that not only doesn’t render, but also presents a nasty error message to users.
They may not have realized it at the time, but the Draconians inadvertently destroyed the future of XHTML with this single, irrevocable decision.
The lesson here, it seems to me, is that forgiveness by default is absolutely required for the kind of large-scale, worldwide adoption that the web enjoys.
Getting upset now about the draconian error handling of XML seems kind of quaint.
At this point, I think it is clear that XML’s strictness about well-formedness is very easy to satisfy. It is easy to write automatic producers of XML that do it correctly, and hand-edited XML is also easy to fix when it has missing angle brackets or mismatched tags.
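To make that concrete, here’s a minimal sketch in Python, using the standard xml.etree module (the helper function is my own illustration), of how little it takes to check well-formedness, and why generated XML gets it right essentially for free:

    import xml.etree.ElementTree as ET

    def is_well_formed(text):
        """Return (True, None) if text parses as XML, else (False, the parse error)."""
        try:
            ET.fromstring(text)
            return True, None
        except ET.ParseError as err:
            return False, err

    print(is_well_formed("<ul><li>one</li><li>two</li></ul>"))
    # (True, None)

    ok, err = is_well_formed("<ul><li>one<li>two</ul>")
    print(ok, err)
    # False, plus a "mismatched tag" error naming the exact line and column

    # A producer that builds a tree and then serializes it can't emit
    # mismatched or unclosed tags at all:
    ul = ET.Element("ul")
    ET.SubElement(ul, "li").text = "one"
    print(ET.tostring(ul, encoding="unicode"))
    # <ul><li>one</li></ul>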
The main problem with XHTML has nothing to do with XML’s strictness. The problem is that it’s XML masquerading as HTML. HTML has different lexical rules than XML. Writing a single document that is both valid XHTML and an acceptable HTML document that will be understood by legacy browsers is very difficult, if not impossible. It’s essentially a polyglot programming exercise, where one file can be interpreted correctly according to two different sets of rules. Except that we all kidded ourselves into thinking it wasn’t, because HTML and XML both use tags.
HTML is derived from SGML, which has a dizzying array of shortcuts to minimize the markup in a document. Take a quick look at Tag Minimization from Martin Bryan’s book to see the kind of stuff SGML lets you do. Some of this is still in HTML, which is why XML’s <br/> doesn’t do what you think in an HTML document.
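Here’s a small sketch of the difference, again in Python with the standard xml.etree module showing the XML reading; the SGML reading, which is what an HTML parser actually following the SGML rules would have to apply, is spelled out in the comments:

    import xml.etree.ElementTree as ET

    # The XML reading: the trailing slash closes the tag, so <br/> is an
    # empty br element and the surrounding text is untouched.
    p = ET.fromstring("<p>one<br/>two</p>")
    print([child.tag for child in p])    # ['br']
    print(p.text, p.find("br").tail)     # one two

    # The SGML reading (HTML 4's nominal parent): under the SHORTTAG
    # shorthand, the slash in "<br/" already ends the tag, so the ">" that
    # follows is plain character data.  Read that way, "<p>one<br/>two</p>"
    # contains the text "one>two".  Browsers never implemented this
    # shorthand, which is why the mismatch went mostly unnoticed.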
Other issues include the special treatment browsers give to script content, where less-thans really are less-thans, while in XML, they have to be escaped as &lt;. A fuller run-down of the problems is in Ian Hickson’s Sending XHTML as text/html Considered Harmful.
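A rough illustration, with a made-up line of JavaScript and Python’s standard library doing the escaping:

    import xml.etree.ElementTree as ET
    from xml.sax.saxutils import escape

    js = "if (a < b && !done) { step(); }"

    # Perfectly legal inside an HTML <script> element, but not well-formed XML:
    try:
        ET.fromstring("<script>%s</script>" % js)
    except ET.ParseError as err:
        print("not well-formed:", err)

    # XHTML has to escape the content (or wrap it in a CDATA section), and a
    # legacy HTML browser would then hand the entities to the JavaScript
    # engine as literal text:
    print("<script>%s</script>" % escape(js))
    # <script>if (a &lt; b &amp;&amp; !done) { step(); }</script>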
So to my mind, the problem here is not that XML is strict, but that it is different from HTML. You can’t easily write a page which works as both. Jeff gives the example of an author publishing a page and then finding out from his horde of angry readers that the page won’t display. This is not the kind of problem that happens: well-formedness is easy to check and fix.
That said, it’s also true that being strict about well-formedness does nothing to help with checking validity, and beyond that, nothing to help with checking for correct rendition. It’s that last level of correctness that is the hobgoblin of web development: once the tag stream is correct according to some criteria, the browser must then draw a page, and that is where things really run off the rails.
Certainly invalid pages will have more rendering problems than valid pages, but validity is not enough to guarantee that the page will look correct. So XML’s strictness is easy to achieve, and also fairly useless. In the end, Jeff is right:
Even though programmers have learned to like draconian strictness, forgiveness by default is what works. It’s here to stay. We should learn to love our beautiful soup instead.
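That beautiful soup is also literally a library: Python’s Beautiful Soup package is built around exactly this forgiveness-by-default idea. A minimal sketch, with an invented bit of tag soup, of it shrugging off markup that would stop an XML parser cold:

    # The package is beautifulsoup4 today; in 2007 the import was
    # "from BeautifulSoup import BeautifulSoup".
    from bs4 import BeautifulSoup

    tag_soup = "<p>an <b>unclosed tag, a stray </i> closer, & a bare ampersand"

    # An XML parser rejects every one of those mistakes with an error;
    # Beautiful Soup builds the most plausible tree it can and lets you
    # get on with it.
    soup = BeautifulSoup(tag_soup, "html.parser")
    print(soup.find("b").get_text())
    print(soup.get_text())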
Comments
Is there another link I can use?
I guess I should think about how to provide individual entries with unique URLs...
Great response. One caveat though:
> Jeff gives the example of an author publishing a page and then finding out from his horde of angry readers that the page won't display.
Actually, the example is a bit more specific than that: most web pages these days use content from other sites in some form, so they're also banking on the fact that these external sources won't accidentally introduce some malformed markup into their own page. So if you're using draconian error handling, you better pray every bit of markup in your page, from whatever source you're getting it from, is 100% compliant, forever.
http://diveintomark.org/archives/2004/01/14/thought_experiment
In fact, no matter how many mistakes they made, they still got something back, and no scary intimidating error messages. There are plenty of professional programmers today who started battering together bad HTML in Notepad, and then taught themselves a little PHP and several years later are turning out respectable code. For the amateur, with the most primitive tools, strictness is not easy.
Why did the W3C try and move everyone up to a well-formed markup language? The author should try reading up on HTML parsing and having to write tools and browsers for all the sloppy output produced by Web authors who seem to think that closing their tags is just too much work (and yet seem to have the inclination to dive into one of the most inconsistently implemented programming languages ever produced and tell us all about it).
Reliable systems require parts that behave consistently and interact coherently, not some bag of "do what I mean" bricks where the end result needs a "quirks mode" guide because a bunch of people couldn't face seeing error messages.
It *might* have caught on, but it certainly wouldn't have been the open forum for explosive innovation that it was for anyone with a text editor.