Exceptions in the rainforest

Created 16 October 2003

As part of a debate about exceptions and status returns, Joel asked for an example of exception handling using a particular chunk of code. Before jumping to the code, I want to talk about rainforests for a little bit. If you haven’t read my previous article about exceptions and status returns, you might want to start there.

Rainforests

If you’ve ever studied the rainforest, you know that it is not a simple place. A simplistic model of it would be that there are lots of trees, and lots of animals, and they all live together. It’s more interesting than that: The forest is divided horizontally into layers, and each layer has its own ecosystem, with different inhabitants. To understand how the rainforest works, you have to consider the layers separately, and see how they differ from each other.

Complex software is the same way: there are different layers, and the error handling they perform is different. If we want to discuss what exception handling looks like in real code, we have to talk about the layers.

Three layers of code

In my experience, there are three layers to real code (from bottom to top, so this list might look upside-down):

Adapting the software beneath you.
Building pieces of your system.
Combining it all together.

Keep in mind, this is a simple model, and real software is fractal in most of its aspects. A 100,000-line system will have layers within layers within layers. But this three-layer model closely matches the way I’ve seen a number of real systems evolve. Let’s look at each of these layers in detail.

Adapting the software beneath you

Beneath every piece of software is more software. Your Windows application sits on top of the Win32 API, or ATL. Your PHP web site sits on top of MySQL calls, and PHP primitives. Your Java system sits on top of the JDK and the J2EE facilities. Even if you are writing a device driver, your code is sitting on top of the actual I/O operations that write bits to the disk, or whatever it is your driver does.

At the lowest layer of your system, your code deals with your particular underlying software. It makes its calls, and interprets the results. This layer is where you convert cultures, making the underlying software more the way you’d like it to be: operations become more convenient, concepts are presented more palatably to the rest of the system, ugly workarounds are hidden.
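Here is a small illustration of converting cultures, offered as a sketch rather than a prescription. The C library's std::strtol reports problems through an end pointer and errno, conventions that are easy to forget to check. A hypothetical A-layer function, ParseInt (the name is invented for this example), hides all of that and presents the rest of the system with a plain int or an exception:

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical A-layer adapter: hides strtol's end-pointer and errno
// conventions behind a function that returns an int or throws.
int ParseInt(const std::string& text)
{
    const char* begin = text.c_str();
    char* end = nullptr;

    errno = 0;  // strtol only sets errno on failure, so clear it first
    long value = std::strtol(begin, &end, 10);

    if (end == begin || *end != '\0') {
        // No digits were consumed, or there is junk after the number.
        throw std::invalid_argument("not an integer: \"" + text + "\"");
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        throw std::out_of_range("integer out of range: \"" + text + "\"");
    }
    return static_cast<int>(value);
}
```

Code above this layer never sees errno or end pointers; it gets a number or an exception.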

Building pieces of your system

The middle layer of your code is where you construct the pieces of your world. Are you writing a spreadsheet? You’ll need a cell engine, and some way to read and write data files, and connectors to databases, and charting modules. In some worlds this is called business logic.

This is where the bulk of the code will be, and where you are likely to be adding value. Few applications compete on how well they read and write the registry. The interesting technology is in the cell engines, or drawing paradigms, or database intelligence, or logical inference algorithms. This is the interesting part. The more time you can spend here productively, the better off you will be.

Combining it all together

At the top of your system is the big picture. For example: when the application starts, we need to create an empty document, initialize the database layer, and show the GUI. This is where you can see the main flow of the application. If you had to explain what your system did in detail to a knowledgeable user, this layer is the one you’d be talking about. This is the stage manager layer, coordinating the pieces and making them hang together as a cohesive whole.

How exceptions are used in the layers

At the bottom layer (Adapting), there’s a lot of throwing exceptions. Unless you are coding in Java or C#, where the system toolkits throw exceptions (in which case, I’m preaching to the choir), the layer beneath you more than likely is returning statuses to you. Each call will have its return value checked, and converted into an appropriate exception, and thrown. Sometimes, error values will be dealt with immediately. For example, this layer may implement some simple retrying mechanism for some operations, or it may decide that some error returns are really not errors.
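As a sketch of that kind of A-layer policy, assume a hypothetical C-style function, LegacyDelete, that returns status codes. The wrapper DeleteItem (also hypothetical) checks every return, retries a transient "busy" status a few times, treats "not found" as success, and throws for everything else. The stand-in API simulates failures so the example runs without a real system beneath it:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes from a lower-level, C-style API.
enum Status { kOk = 0, kNotFound = 1, kBusy = 2, kAccessDenied = 3 };

// Hypothetical stand-in for the underlying API. It reports "busy" a
// configurable number of times before it succeeds, so the sketch can run
// without a real system underneath it.
int g_busyCount = 0;
Status LegacyDelete(const std::string& name)
{
    if (name.empty()) return kAccessDenied;
    if (name == "missing") return kNotFound;
    if (g_busyCount > 0) { --g_busyCount; return kBusy; }
    return kOk;
}

// Exception type for statuses the A-layer cannot handle itself.
struct StatusError : std::runtime_error {
    Status status;
    StatusError(const std::string& what, Status s)
        : std::runtime_error(what), status(s) {}
};

// A-layer wrapper: checks every status, retries transient "busy" failures,
// treats "not found" as success (the goal was for the item to be gone),
// and converts anything else into an exception.
void DeleteItem(const std::string& name, int maxAttempts = 3)
{
    for (int attempt = 1; ; ++attempt) {
        Status s = LegacyDelete(name);
        if (s == kOk || s == kNotFound) return;              // not really an error
        if (s == kBusy && attempt < maxAttempts) continue;   // simple retry
        throw StatusError("could not delete " + name, s);
    }
}
```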

At the middle layer (Building), things are flowing pretty smoothly. Typically, there aren’t a lot of exceptions being thrown, and not a lot being caught either. This is where you often get to just think about the ideal case, and focus on the algorithms and data structures at hand. Of course, exceptions can happen, especially in the A-layer calls you make. But for the most part, you can let those exceptions fly. An upper layer will deal with them.
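A minimal B-layer sketch, assuming a spreadsheet-like column of cells stored as text. The function SumColumn is hypothetical; std::stoi stands in for the A-layer, since it already converts bad input into std::invalid_argument or std::out_of_range. Note what is missing: no status checks and no try/catch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// B-layer sketch: sum a column of spreadsheet cells stored as text.
// std::stoi plays the role of the A-layer and throws on bad input.
// This function neither checks statuses nor catches anything: failures
// propagate to whichever caller is in a position to decide what to do.
long long SumColumn(const std::vector<std::string>& cells)
{
    long long total = 0;
    for (const std::string& cell : cells) {
        total += std::stoi(cell);
    }
    return total;
}
```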

At the top layer (Combining), there’s a lot of catching exceptions happening. Couldn’t open a file? Now you have to decide what to do about it. You can alert the user, try a different file name, exit the application, whatever you as the system designer decide is the best approach.

This C-layer code can actually be quite preoccupied with exceptions. This makes sense: this is the layer where the code really knows what’s going on. If you have an A-layer function to open a file, what should it do when the file can’t be opened? How can you possibly say? This function will be used to open all sorts of files for all sorts of reasons. Maybe the C-layer caller knows that the file could be missing, and has a plan for what to do in that case, so alerting the user would be wrong. It’s the C-layer that understands the big picture, so it’s the C-layer that should be dealing with the exceptions.
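To make the point concrete, here is a sketch with one A-layer function and two C-layer callers that handle the same failure differently. Everything in it is hypothetical: FileNotFound, ReadFile, the in-memory g_disk standing in for a real file system, and AlertUser standing in for a message box.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception thrown by the A-layer when a file is missing.
struct FileNotFound : std::runtime_error {
    explicit FileNotFound(const std::string& path)
        : std::runtime_error("file not found: " + path) {}
};

// Hypothetical in-memory stand-in for the disk, so the sketch runs anywhere.
std::map<std::string, std::string> g_disk = {
    {"report.doc", "quarterly numbers"},
};

// Hypothetical stand-in for a message box: records what the user would see.
std::vector<std::string> g_alerts;
void AlertUser(const std::string& message) { g_alerts.push_back(message); }

// A-layer: used for all sorts of files, so it cannot know whether a missing
// file matters. It only reports the problem.
std::string ReadFile(const std::string& path)
{
    auto it = g_disk.find(path);
    if (it == g_disk.end()) throw FileNotFound(path);
    return it->second;
}

// C-layer caller #1: a missing settings file is expected (say, on first run),
// so quietly fall back to defaults instead of alerting the user.
std::string LoadSettings()
{
    try {
        return ReadFile("settings.cfg");
    } catch (const FileNotFound&) {
        return "defaults";
    }
}

// C-layer caller #2: the user asked for this document by name, so a missing
// file is worth telling them about.
bool OpenDocument(const std::string& path, std::string& contents)
{
    try {
        contents = ReadFile(path);
        return true;
    } catch (const FileNotFound& ex) {
        AlertUser(ex.what());
        return false;
    }
}
```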

Exceptions vs. status returns again

Now for Joel’s example. He asked that we discuss this code:

void InstallSoftware()
{
    CopyFiles();
    MakeRegistryEntries();
}

Using the three-layer model above, this is clearly C-layer code. I know Joel asked for this example because he knew that even with exceptions the code would be cluttered with error handling, just as it would be with status returns. He’s right. It’s C-layer code, so it will have to deal with unusual cases. There’s no way around that.

Others have taken up this challenge, and come up with some nice ways to deal with it cleanly, using C++ destructor semantics to ensure that operations are rolled back. To be perfectly honest, I don’t know that I would have been as clever as these writers, though they have given me some good ideas. I might have done it like this:

void InstallSoftware()
{
    try {
        CopyFiles();
        MakeRegistryEntries();
    }
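    catch (CException &) {
        // Undo whatever was done, then rethrow the original exception.
        // "throw;" rethrows the same object; "throw ex;" would throw a copy
        // and slice off any derived type such as CWin32Exception.
        RemoveFiles();
        DeleteRegistryEntries();
        throw;
    }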
}

This function either succeeds, in which case the files are copied and the registry entries are written, or it throws an exception, and the files and registry entries are cleaned up. Is this sufficient? I don’t really know, and in a real implementation I can imagine it getting much hairier than this.
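For comparison, here is a rough sketch of the destructor-based style others suggested, written as a small ScopeGuard class (a common C++ idiom, not a standard library type). The installer steps are hypothetical stand-ins that log what they do, and MakeRegistryEntries can be told to fail. Each undo action is armed only after its step succeeds, and dismissed once the whole installation is complete:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Runs a cleanup action when it goes out of scope, unless dismissed.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> undo) : undo_(std::move(undo)) {}
    ~ScopeGuard() { if (active_) undo_(); }
    void Dismiss() { active_ = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> undo_;
    bool active_ = true;
};

// Hypothetical stand-ins for the installer steps; they record what happened.
std::vector<std::string> g_log;
bool g_registryFails = false;

void CopyFiles()             { g_log.push_back("copy"); }
void RemoveFiles()           { g_log.push_back("remove"); }
void DeleteRegistryEntries() { g_log.push_back("unregister"); }
void MakeRegistryEntries()
{
    if (g_registryFails) throw std::runtime_error("registry write failed");
    g_log.push_back("register");
}

void InstallSoftware()
{
    CopyFiles();
    ScopeGuard undoCopy(RemoveFiles);    // armed only once the copy succeeded

    MakeRegistryEntries();
    ScopeGuard undoRegistry(DeleteRegistryEntries);

    // Both steps succeeded: keep the results.
    undoRegistry.Dismiss();
    undoCopy.Dismiss();
}
```

One caveat: the cleanup actions run from destructors, so they must not throw; an exception escaping a destructor during unwinding terminates the program.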

The status return folks may well be crowing that this code either doesn’t handle the problems completely, or is just as ugly as status return code. They’re missing the point. I’m not claiming that exceptions make all code prettier, or that they somehow remove the burden of thinking through what should happen when something goes wrong.

The debate over exceptions and status returns is not about whether error handling is hard to do well. We all agree on that. It’s not about whether exceptions make it magically better. They don’t, and if someone says they do, they haven’t written large systems in the real world.

The debate is about how errors should be communicated through the code. The C-layer code we’re talking about is going to be complicated no matter which technique you use to communicate errors around.

But what does the B-layer code look like?

void MakeRegistryEntries()
{
    CRegistry reg;

    reg.WriteString("ProductName", "Ned's FooBar");
    reg.WriteString("Version", "1.2b");
    reg.WriteDword("WebUpdateInterval", 7*24*60*60);
}

Here at the B-layer, we can get into the zone and just write registry entries. How would this look with status returns? Either cluttered with if statements, or hidden behind macros that simply put your code in the same “hidden function return” camp that supposedly makes exceptions evil.
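For contrast, here is a sketch of the same B-layer function written with status returns, using a hypothetical StatusRegistry whose methods return 0 on success. The in-memory map and the failOn field exist only so the example runs and can simulate a failure:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical status-returning registry wrapper (0 means success), backed by
// an in-memory map so the sketch runs anywhere. Writing to the value named in
// failOn simulates an error return.
struct StatusRegistry {
    std::map<std::string, std::string> values;
    std::string failOn;

    long WriteString(const std::string& name, const std::string& value) {
        if (name == failOn) return 5;  // arbitrary nonzero error code
        values[name] = value;
        return 0;
    }
    long WriteDword(const std::string& name, unsigned long value) {
        return WriteString(name, std::to_string(value));
    }
};

// The B-layer function rewritten with status returns: every call is checked,
// and every failure is passed back up by hand.
long MakeRegistryEntries(StatusRegistry& reg)
{
    long status = reg.WriteString("ProductName", "Ned's FooBar");
    if (status != 0) return status;

    status = reg.WriteString("Version", "1.2b");
    if (status != 0) return status;

    status = reg.WriteDword("WebUpdateInterval", 7 * 24 * 60 * 60);
    if (status != 0) return status;

    return 0;
}
```

The real work is still three calls, but each one now needs a check whose only job is to hand a number up to the caller.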

The A-layer code looks like this:

void CRegistry::WriteString(
    const char * pszValueName,
    const char * pszValue
    )
{
    ASSERT(m_hKey != NULL);

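    DWORD cbData = (DWORD)(strlen(pszValue)+1);  // include the terminating NUL

    // The Reserved argument must be 0. Call the ANSI variant explicitly,
    // since the value name and data are narrow char strings.
    LONG lRet = ::RegSetValueExA(m_hKey, pszValueName, 0, REG_SZ, (const BYTE*)pszValue, cbData);
    if (lRet != ERROR_SUCCESS) {
        // Convert the status return into an exception that carries it.
        throw CWin32Exception(lRet);
    }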
}

Here we’re adapting to the Win32 registry functions, converting their status returns into exceptions that carry the actual status return as data, so it can be used for error messages or analysis.
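The same pattern works outside Win32. As a portable sketch, the C++ standard library’s std::system_error plays a role similar to CWin32Exception: it carries a std::error_code alongside its message. The lower-level function legacy_set_value and the wrapper SetValue are hypothetical:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical C-style API that returns 0 on success or an errno-style code.
int legacy_set_value(const char* name, const char* value)
{
    if (name == nullptr || value == nullptr) return EINVAL;
    if (std::string(name).size() > 16) return ENAMETOOLONG;
    return 0;
}

// A-layer wrapper: converts the status into an exception that still carries
// the original code (as a std::error_code), plus some context for messages.
void SetValue(const char* name, const char* value)
{
    int status = legacy_set_value(name, value);
    if (status != 0) {
        throw std::system_error(status, std::generic_category(),
                                std::string("SetValue(") + (name ? name : "null") + ")");
    }
}
```

A C-layer caller can still inspect ex.code() to decide what to do, or show ex.what() to the user.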

These examples are all too brief to be real code, but they demonstrate the concepts. Broadly speaking:

A-layer generates exceptions,
B-layer can often ignore the whole issue, and
C-layer decides what to do

Exceptions are better at communicating errors

The challenge in building a large system is making sure errors get communicated around. Exceptions are a better way to do that than status returns:

Exceptions can carry richer information. If error handling is so important, why try to cram it all into a DWORD?
Exceptions let the B-layer get on with its work without being a mindless bucket brigade for status returns.
Exceptions make human error (failure to catch) visible, while error returns make human error (failure to check) invisible.
Exceptions leave the primary channel (function returns) available for the primary work.
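A small sketch of that last point, using a hypothetical in-memory settings store. With status returns, ReadDwordStatus has to hand its result back through an out-parameter, and every caller must thread the status along; with exceptions, ReadDword simply returns the value. All names here are illustrative:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical in-memory settings store shared by both styles below.
const std::map<std::string, unsigned long> g_settings = {
    {"WebUpdateInterval", 604800},  // one week, in seconds
};

// Status-return style: the return value is spent on the status, so the
// result itself has to come back through an out-parameter.
long ReadDwordStatus(const std::string& name, unsigned long* out)
{
    auto it = g_settings.find(name);
    if (it == g_settings.end()) return 2;  // arbitrary "not found" code
    *out = it->second;
    return 0;
}

// Exception style: the return value carries the result, and failures
// travel on a separate channel.
unsigned long ReadDword(const std::string& name)
{
    auto it = g_settings.find(name);
    if (it == g_settings.end()) throw std::out_of_range("no such value: " + name);
    return it->second;
}

// The difference shows up at the call sites.
long UpdateDaysStatus(unsigned long* days)
{
    unsigned long seconds = 0;
    long status = ReadDwordStatus("WebUpdateInterval", &seconds);
    if (status != 0) return status;
    *days = seconds / (24 * 60 * 60);
    return 0;
}

unsigned long UpdateDays()
{
    return ReadDword("WebUpdateInterval") / (24 * 60 * 60);
}
```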