XML Schema for non-XML data?

Wednesday 17 November 2004

I am a software developer working in the opening decade of the 21st century. This means that many of the solutions I devise will involve XML. Of course, choosing XML is only part of an answer: the exact form of the XML must be chosen as well. As with many software problems, I can use an existing solution, or I can make up something new.

The latest problem at hand involves describing the schema of a relational database in XML. The proposed existing solution is XML Schema. The question: is it better to use XML Schema to describe a database, or to use a new XML dialect custom-tailored to the job?

In brief, the upside of using XML Schema is that it already exists, has been thoroughly thought-out, and may allow us to use existing tools to work with our database description. The downside of XML Schema is that it wasn’t designed to describe databases, and it is complex.

Pros of using XML Schema:

  • Things like data types, including domain restrictions, have been amply provided for in XML Schema.
  • We wouldn’t have to teach something new to other developers, they could make use of their existing understanding of XML Schema.
  • There may be existing tools to use with out Schema definitions.
  • It’s a good public relations point to say you use XML Schema.
  • I could stop spending time defending my decision not to use XML Schema and get on with my life.

Cons of using XML Schema:

  • XML Schema was only designed to describe XML documents. Anywhere a relational database doesn’t act like an XML document, I’ll have to stretch Schema to fit, or go ahead and make up something new anyway. For example, how are primary keys, indexes, or views described in XML Schema? And let’s not discuss triggers and stored procedures.
  • Because Schema is fundamentally about XML documents, even the best fitting concepts will involve a mental translation layer. “Tables” have to described as “elements”, uniqueness constraints involve XPath, and so on.
  • Although Schema is very powerful, in some ways it’s more than I need. The problem at hand really is to just describe a relational database, not a larger data model which will be stored in a database. So the semantics (and much of the syntax) are simple and very well understood.
  • Schema can be verbose, complicating the code I’ll have to write to use it.

Other thoughts:

  • If I don’t use XML Schema, I could still generate a Schema definition for the parts that make sense, and get some of the benefits of Schema anyway.

The summary is: I don’t want to use XML Schema to describe a relational database. Have I missed something? Am I making a mistake?



I probably am showing my ignorance, and I wasn't going to respond at all (but here I am), but looking at the w3c spec, my initial reaction, was that it was much too document centric, without any of the mechanisms to handle documents of differing types (i.e. unstructured data - like in lotus notes). I know this isn't what you asked, but given that the binary parts of the xml schema are fixed ("finite-length") size, it makes blobs, stored procedures and cursors hard to define. I guess this is nice at the document level, but seems to leave out a lot of the traditional relational database tools out of the specification definition.

It also seems quite involved to specify the schema when compared to say, SQL. SQL might not be perfect, but it handles a lot of what you were talking about. And since there will be relational databases with sql in them for quite some time, why not just use it to define your database?

Or am I missing the point?

Using SQL to describe the structure of the database of course makes a lot of sense (it's what it was designed for). In my case, I wanted something easier to parse than SQL, and in fact, my use of the database is simple enough that SQL is itself overkill for the job.
I've been looking for something similar for a while. In fact I was looking for a pre-defined XML Schema (or DTD, or Relax NG specification) for defining a relational database for one of my applications (http://www.halfcooked.com/code/gerald)

I found nothing and ended up producing my own form of XML without a validating schema or DTD. I figured that if I hung around long enough something would come up and I could adapt my XML readers and writers to use it. About the closest (and scariest) thing that I could find were the OMG's Meta-Object Facility (MOF) and Common Warehouse Meta-model (CWM) but they are so abstract as to be almost useless at a practical level.

On a purely superficial level when I looked at DTDs, XML Schema and Relax NG it was Relax NG that looked the simplest. YMMV of course.
I use Microsoft's .NET framework at work. Microsoft uses XSDs to describe table relationships for use with their DataSet object. You could look at documentation for this on Microsoft's website to get an idea of how they are implementing this. Drop me a line if you have any questions.
Hi Ned,

It depends on what relational database that you are going to describe in XML. Most of the major relational DBs have a predetermined XML format for describing data. They also have some kind of layer that you can use to read and write XML to relational tables as well. Non-microsoft DBs use SQL/XML to do this (http://www.sqlx.org/) and MS uses the very similar but incompatible SQLXML (http://www.sqlxml.org/).

I would use the XML schema format of whatever relational DB you are using.

And don’t hurt yourself hand-coding the schema(s). There are several tools out there for that. The best tools are made by Altova - XMLSpy and MapForce will generate the schema for you if it can connect to the database. (http://www.altova.com/features_database_mapforce.html). Not cheap, but they do have a 30 day trial, which I highly recommend.

I have a whole section of my book, the XML Programming Bible, that outlines strategies for what you’re trying to do, with specific examples for SQL Server, Oracle and DB2…..

Hope that helps…..
Brian, I don't think what you are proposing is what I (or Ned) are looking for.

I at least don't want to store XML documents in my database, I know that there are apis (such as SQL/XML) to do this and can easily look them up in product documentation. Serialising XML documents is not what I'm after.

What I want, and from reading his post Ned does as well, is a way of describing a relational database schema in an XML document. For instance, what are the attributes or child elements of a table element? How do we represent that a certain column is of the datatype 'DATE'?
OK, I thought I answered that but I'll clarify:

-There is no universal format for describing relational data. -Each vendor has their own method of describing data as XML.
-All the vendors I know of support XML Schema to do this. -There are vendor and third-party tools you can download that generate the schemas for you.

As for the XML query languages:
-You can use them to query and retrieve data using XML documents.
-You do not have to store the data using XML.
-The XML queries use an XML schema to translate queries and responses.

Hope that helps.
And as for describing data types, The Schema working group actually thought of this.....

OK, once more, then I'll leave it alone. I think we are getting confused on the multiple uses of the word 'schema'.

I do not wish to describe or interact with the data in my database in an XML form or using XML query languages.

I wish to describe the structure (database schema) of my relational database (typically tables, columns and indexes) in an XML document. I would like a formal grammar (XML schema or equivalent) to define the structure of that document.

Thanks for the pointer on data types though, looks interesting
Ok, just one more for me too :).

1) I do get what you mean. I wrote a book on this topic....

2) Each vendor has their own formal grammar for describing relational databases, which sounds like what you're looking for. There is no XML Schema grammar which will represent all types of relational databases, objects, etc. The only formal level is XML Data types. After that, each vendor made stuff up for themselves to represent rows, columns, views, joins, etc..

3) I think the best thing for you to do is to download the trial version of MapForce or XMLSpy (see the link above) and work with the "generate XML Schema" features. Once you see what these tools generate things should fall into place.
Why not combine? Your arguments against using xsd to describe a relational database seem compelling to me: a document is not a database.

But XML is flexible enough for you to invent an ML that does describe a relational database. And then you can create an XSD that validates that ML, no?

Then you solve a bunch of problems all at once, including the political one: You can truthfully say that your solution makes use of XML Schema.
Funny I'm looking for the same thing. I currently use an access database to enter the 'meta data' about my databases. I then run a python script to convert this information into python module for use within various projects with SQLDict.

I'm doing so much with xml these days that I want to move away from Access and use XML to describe my database schema.

I've been googling for days. This is the closest url I've found that also describes what you want:


Though interestingly he's got the same problem, but no xml-schema to describe RDB schemas.

I'm just now playing with Druid, trying to figure out why it screws up reverse engineering my existing firebird databases. Druid's xml output is pretty simple.

IBM's XTables project from 2002 sounded like it also had a xml schema to describe RDB schemas, and could update it on the fly. But that project has morphed into something else big and expensive.

I guess I will use Altova XML Spy (Home version) to create an xsd. What have you done in the past 4 months?

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.