SQL ordering

Tuesday 13 June 2006

I was discussing a new list display with a co-worker the other day, and the question of its ordering came up. “It’s got no ORDER BY clause,” I said, “so it will be randomly ordered.”

“No, it will be ordered by the primary key, the id,” he insisted.

“No, if you don’t specify an ordering, then you’re allowing MySQL to return the data in any order it finds convenient, and it will return the data to you as it finds it, and who knows what order that will be?”

The debate continued. My co-worker claimed that since the id is the primary key, that will be the order records are stored on disk, and that would certainly be the order in which they would be found.

I pointed out that there are many factors that contribute to determining the order of records on disk. For example, if the database strictly orders the records by their id, it has to be prepared to move records or create overflow blocks when a new id is inserted into the middle of an id range.

“Yes, but it’s got to be in id order,” he continued, and then in a fit of confidence, “I’ll bet you my car that if you select some records with an integer id and no ORDER BY clause, they will be returned in id order.”

I turned to my SQL prompt and typed a query off the top of my head:

mysql> select id from listitems where list_id = 4000;
| id    |
| 53448 |
| 53447 |
| 53449 |
| 53450 |
| 53451 |
| 53452 |
| 53453 |
| 53454 |
| 53455 |
| 53456 |
| 53457 |
| 53458 |
| 53459 |
| 53460 |
| 53461 |
| 53462 |
| 53463 |
| 53464 |
| 53465 |
19 rows in set (0.00 sec)

As you can see, the records are returned in id order, except for the first two, which are reversed. Why? Who knows?

There’s a lot of complexity in a relational database, and the implementers generally take every advantage they can. If you don’t specify an ordering, you will get your records in an arbitrary order. Often, when trying out code for the first time, they will seem to be returned in order, but that’s because your database is small. As your data grows, more randomness will appear as deletes and inserts become more jumbled.

As always, specific databases may make more guarantees. For example, I am told that Microsoft SQL Server always stores records in primary key order, and that you need to account for this in designing your schema to get maximum performance. I don’t know if this is true or not. I don’t know if it is true for all versions of SQL Server, or all combinations of table creation options.

This is one of those cases of confusing an implementation with a standard. SQL itself makes no guarantees about the ordering of records, and it makes no claims about what a primary key “means” other than it is a unique non-null index into the records of a table. But specific implementations (SQL Server, MySQL, SQLite, whatever) may make more specific guarantees about the meaning of these things.

But do yourself a favor: if you care about what order your data is returned from a SQL query, add an ORDER BY clause. If you think the data is naturally ordered that way, then the ORDER BY clause won’t add extra work, and if the data isn’t naturally ordered that way, adding the clause will set things right.

BTW: I didn’t take the car.


Does mysql have something like 'oid' in Postgresql? ALL tables in any postgres database has a column called "oid" but when I do something like "SELECT user_id, oid FROM some_table" none of the numbers are returned in order. I like it that way.
Perhaps mysql is ordered by mysql's equivalent of "oid".
"They'll come out ordered as I out them in" is a common misconception and probably a residual effect of experience with paradox and dbase style record files. In the mid 90's I was horrified to discover that one of our apps had been coded to rely on the order of records in a SQL database. It actually had code like :

//Go to the next customer, 3 records from here

I think it's more instructive to say that the order that records will be returned by default is unspecified rather than random. Random suggests, well, randomness and most of the time, as your example shows, the results follow a pattern.
Take the car. It'll keep him from shooting his mouth off in the future. You could use a little more "Tony Soprano" in your dealings with others.
Ian: you are right, "unspecified" is the accurate way to describe the request. "indeterminate" might be an even better word to describe the results. "Random" is sometimes precisely what is wanted, as in "shuffled", and that isn't what I was getting at here.

Andrew: there is some speculation that in fact, the car was undesirable enough that this entire episode was an attempt to unload it on an unsuspecting rube (namely, me)!
Just yesterday I was playing around with MySQL 5.1's partitioned tables and with 4 partitions the records would come out in an "indeterminate" order of 1,5,9,12,2,6,13,3,7,10,14,4,8,11,15 (seemingly ordered within partitions, all partitions are combined).

So yea, don't look for order where there isn't supposed to be one :)
MS Sql Server does not write records to disk in primary key order. It writes them on whatever single colum has a clustered index. By default, when you create a primary key it also makes that the clustered index, but it's not advisable to leave that column clustered in most cases. Even with a clustered index and records ordered on disk, MS gives the same warnings that you cannot rely on disk order (or the primary key) to determine the order of the result. I've seen plenty of cases to verify this.
You really should have taken the car.
Heh. I come across this misconception all the time. It's totally frustrating people use unordered queries in unit test cases, resulting in strange breakages a week later. I just googled and found that I was whinging about this two years ago - http://cardboard.nu/blog/2003_11_14/unit_testing_data_from_an.html. BTW, since I wrote that I have found that it helps some people if I describe the ordering as "arbitrary" or "undefined", rather than "random".
For MySQL, the on-disk order depends on the storage engine used. For MyISAM tables, if no rows are ever deleted, the on-disk order corresponds to the insert order. For InnoDB, they're always in primary key order (and unlike MyISAM, there must always be a primary key defined). For HEAP, they're ordered by some internal hash of the primary key. For the other storage engines, I've never had a reason to know.

And all of that is subject to change at any time and isn't documented so far as I know, so if you depend on order, you should use an ORDER BY.
Take the car just to see what kind of substitution he'll bring. :D I bet your kids would be thrilled with another Hot Wheels or R/C car! ...maybe he'd build one out of business cards and paperclips.
I think the beauty in this whole story is that Ned's co-worker bet his car, Ned bet nothing against it, and then only had to type in a SQL query to win the bet.
SQL itself makes no guarantees about the ordering of records, and it makes no claims about what a primary key "means" other than it is a unique non-null index into the records of a table.

You could have also handed your colleague an SQL book, not only is there no guarantee, even further this is one of the basic tenets of SQL theory, going back to Codd, right?


Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.