Why numbering should start at zero

Thursday 10 September 2009

My son Max is taking a computer class in high school, and described a class exercise involving a deck of cards numbered 0 to 51. I asked him if there was any discussion in class about why it wasn't 1 to 52. He said the teacher told them that if they learned C++, they'd understand why it started from zero.

I guess the teacher meant pointer arithmetic makes it obvious, but I remembered a more scholarly exposition: Edsger Dijkstra wrote a brief paper called Why Numbering Should Start at Zero. He lays out a mathematical explanation that has nothing directly to do with pointers.

I'm impressed by Dijkstra because of his methodical approach to even the seemingly most trivial detail of computing. He was willing to stop, think through, and explain why something should be done a certain way, no matter how small.

Earlier today over lunch, we discussed the amazing foresight of the AT&T engineers who laid out the North American area codes. We marvelled at the care they took with designing a solution to a problem that wouldn't hit for a decade or so. How often do we see that kind of care and attention to detail in day-to-day work?

Both examples made me stop and admire true professionals, engineers building carefully, making sure to get it right no matter how far off the consequences. Thanks, guys. We should each try to channel your spirit in our own work.


Alistair 11:58 PM on 10 Sep 2009

I was only talking about the same sort of thing with someone at work recently regarding the foresight of some of the early Google engineers to recognise issues with, say data growth & management - which led them into creating a system which would facilitate that going forward.

You're absolutely right though, some things in everyday life which are taken for granted actually took someone to really sit down and work through a problem to get it right. Others people don't even see, Google's GFS as an example and simply use the product without any care of understanding about just how incredible the GFS service is.

Giacomo 1:55 AM on 11 Sep 2009

On the other hand, "premature optimization is the root of all evil"...

Ned Batchelder 5:30 AM on 11 Sep 2009

@Giacomo: you are right, and maybe this is where some of the hard engineering is: knowing the difference between premature optimization and the important design properties to get right up front, even if you won't have to worry about it for ten years.

andrew 5:40 AM on 11 Sep 2009

good use of "pernicious"

Juho Vepsäläinen 6:37 AM on 11 Sep 2009

Nice find!

I might be inclined to write

if n in range(20, 26): # verbose
rather than
if 20 < = n < 26: # mathematical. see [1]
in Python.

Using range might give some tiny performance penalty compared to latter. Hence if it's performance you are after you might write a code generator to replace it with latter. But if you are after performance you won't be writing this kind of code in Python anyway. :)

[1] Writing < and = together (no whitespace) seems to break your preview and possibly the post text (not tested) by cutting it right there.

Doug L. 10:40 AM on 11 Sep 2009

Tangentially related: "Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration." -- Stan Kelly-Bootle

Ed Davies 11:45 AM on 11 Sep 2009

Well, pixel numbering should start at 0.5.

Jeff Kaufman 1:28 PM on 11 Sep 2009

When you actually look at the paper, the reasoning he gives for numbering starting at 0 is:

 "when dealing with a sequence of length N, the elements of which we
  wish to distinguish by subscript, the next vexing question is what
  subscript value to assign to its starting element.  Adhering to
  convenion a) yields, when starting with subscript 1, the subscript
  range 1 <= i < N+1 ; starting with 0, however, gives the
  nicer range 0 <= i < N.  So let us let our ordinals start
  at 0 : an element's ordinal (subscript) equals the number of
  elements preceding it in the sequence.  And the moral of the
  story is that we had better regard -- after all these centuries!
  -- zero as a most natural number"
What this actually comes down to, after stripping all the mathematical language away, is a claim that "0 <= i < N" is nicer than "1 <= i < N+1". I happen to agree, but I don't think this should convince anyone who already prefers to start at 1. To me the "uglyness" of starting at 0 is small (I'm used to it) while to someone else the uglyness of ending with N+1 outweighed by the simplicity of starting at 1.

Ben Karel 3:52 PM on 11 Sep 2009

As others have mentioned, Dijkstra's reasoning is valid, but I'm not sure it's sound.

One consideration he doesn't address is the significant cultural preference, if you will, towards naming indices by the positive integers. Even as a C++ programmer long past the stage of uncomfortableness with 0-based indexing, in my head I still say "the first element," "the second item," "the third thing," and so on.

There is also the issue of exactly what operations on intervals should be optimized. The interval [A, B) makes it "easy" to determine the length of the interval and the first element but not as easy to determine the maximum element of the interval. (A, B] makes it easy to determine the length and the maximum element, but not the minimum element. And [A, B] makes the min and max easy, but not the length. Finally, the half-open intervals are easier to add; another point in their favor.

Even if we grant his having established the mathematical superiority of closed/open intervals starting at 0, the real question then becomes: does the mathematical attractiveness outweigh the impedance mismatch with entrenched habits?

It would be nice and simple to categorically say "yes," but it's not clear that that's true. Off-by-one pointer errors in C due to the above-mentioned disconnect, for example, have had measurable economic impact.

Charles Monk 4:45 PM on 11 Sep 2009

I agree with Jeff Kaufmann. The argument seems to hinge on "Nicer", which is simply introduced without any justification.

Dijkstra uses the 0..n-1 for his page numbering in this note. Dogma has apparently triumphed over elegance and convention here. Wouldn't this be even more confusing if the total number of pages were also displayed? The final page would be labelled "Page 2 of 3". Now that would cause some real problems in an examination!

Kerry 3:34 PM on 13 Sep 2009

So why not use 1 <= i <= N ?

Jeff Kaufman 6:11 PM on 13 Sep 2009

@Kerry: the reasoning he gives against 'start <= i <= end' is at
the beginning of his paper. He says that 'A <= i < B' and 'A < i <= B' are better because they:

    ...have the advantage that the difference between the bounds as
    mentioned equals the length of the subsequence ... [also] two
    subsequences are adjacent [if] the upper bound of the one equals
    the lower bound of the other.

Shannon Love 8:13 AM on 20 Nov 2009

As long as we're dreaming, I would like division to be based on the number of "cuts" made to divide instead of the number of resulting pieces. Doing so would eliminate the divide by zero problem.

So, 9/2=3, 8/2=2, 6/1=3, 4/1=2, 1/1=0.5 and most importantly n/0=n.

Now, I just have to persuade everyone to overturn millennium of mathematical convention just so I don't have to watch for divide by zero exceptions.

Add a comment:

Ignore this:
not displayed and no spam.
Leave this empty:
not searched.
Name and either email or www are required.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.