Inventing XML Languages

Tuesday 10 January 2006

Tim Bray has two new posts: Don’t Invent XML Languages, and if you must, On XML Language Design. I have a couple of quick thoughts about them.

Tim is clearly coming at this from a document-centric (rather than data-centric) point of view, and from an interoperability rather than convenience point of view. Let me explain.

There are two broad camps of XML languages. Document-centric languages are for describing rich content, with flexible structure. They are often arbitrarily hierarchical, and can have nested namespaces for mixing applications together. Data-centric languages are for more structured information. There’s no clear distinction between the two styles, and there are middling languages that have some flavor from both camps, but these are the two ends of the spectrum.
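To make the two ends of that spectrum concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Both XML snippets and both helper names (`read_settings`, `flatten_text`) are made up for illustration; they are not taken from Tim's posts or from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: regular records, no mixed content.
DATA_XML = """
<settings>
  <option name="timeout">30</option>
  <option name="retries">3</option>
</settings>
"""

# Hypothetical document-centric fragment: text and markup are interleaved.
DOC_XML = "<p>Tim says <em>don't</em> invent <a href='#x'>XML languages</a> lightly.</p>"


def read_settings(xml_text):
    """Flatten a data-centric document into a plain dict of strings."""
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.text for opt in root.iter("option")}


def flatten_text(xml_text):
    """Recover the running text of a document-centric fragment."""
    root = ET.fromstring(xml_text)
    # itertext() walks element text and tails in document order,
    # which is what mixed content needs.
    return "".join(root.itertext())


settings = read_settings(DATA_XML)
text = flatten_text(DOC_XML)
```

The data-centric case maps straight onto a dictionary, while the document-centric case only makes sense if you keep track of text that sits between child elements. That difference is a rough test for which camp a language falls into.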

There are also, broadly, two reasons for using XML. The first is interoperability: you want a data description that can be used for many things, by many different systems. The second is convenience: everyone’s already got an XML parser, so why not use XML to describe data? It’ll be easy to parse, and people will know what they’re looking at when they encounter it.

The two dimensions are closely coupled: document-centric languages tend to focus on interoperability, and XML chosen for convenience tends to be about structured data. XML’s design center was interoperability of document-centric languages, and Tim is coming from that point of view. But many many XML applications are not. The public, highly visible ones are (XHTML and RSS for example), but the base of the iceberg is more about convenience parsing of structured data. If you are considering using XML, first understand whether yours is an Interoperability solution or a Convenience solution. If yours is Interoperability, then take Tim’s words to heart. If yours is Convenience, then don’t worry about it.

But another thing about Tim’s screed against inventing XML languages is that it’s very pessimistic:

Robin Cover [has] assembled a helpful list of known XML languages, which currently has about 600 members. [...] Looking at the list, I have a question: How many of them matter?

Have a look at the list; what do you think? I think it’s a lot less than 600. Let’s rephrase that question: How many achieved their designers’ objectives? Same answer, I think.

The conclusion is obvious; if you embark on designing a new XML language, there’s a substantial probability that your effort will not be rewarded with success.

So right there is a good reason not to embark on this kind of thing: it’s really hard, really time-consuming, and there’s an excellent chance that it won’t produce the results you were hoping for.

Tim is right about all this, of course, but his conclusion (“Don’t Invent XML Languages”) is, I think, an overreaction to the difficulty. After all, you could globally search and replace the concept “XML language” in Tim’s paragraphs with “software” or “startup” or “screenplay” or “sculpture”, and come to the conclusion that no one should undertake those endeavors.

Yes, it’s hard, yes, it’s undertaken too lightly by many who have no idea what they are getting into. “Don’t Invent XML Languages” is writerly bravado meant to grab the reader’s attention. Don’t take it too literally.

One last thing: PDF is not an XML language, and I think Tim may have implied that it was.

Comments

Dominic Cronin 4:14 AM on 11 Jan 2006

Surely the correct formulation for this kind of 'writerly bravado' would be 'XML Languages considered harmful'.

Bob Balfe 8:24 AM on 11 Jan 2006

I would not take it literally, but I would take the overall comment seriously. I have seen so many internal tools created that define entire languages and simply wonder "why?". Now it also seems that people invent new XML schemas to do something that already exists. Like a window definition or GUI layout! How many do we need?

Mark Mascolino 10:01 PM on 11 Jan 2006

I also think it matters how exposed this XML file is to the outside world. If it is just a configuration file, then you can treat it as an implementation detail and not worry too much about it. However, the more this data file is exposed to "things" outside the scope of the application, the more of Tim's sage advice applies.
