The pitfalls of fixing error handling first

Wednesday 15 December 2004This is 20 years old. Be careful.

One of my favorite essays is Fix error handling first, which argues that if something goes wrong in your code, and then something goes wrong while handling the error, you have to fix the error handling before you fix the problem (because fixing the problem will eliminate the possibility of reproducing the error handling bug). An example of this is that if an error message is logged which doesn’t tell you all you need to know, you should beef up the error message to add more information before you do the work to avoid the error condition.

But suppose it’s impossible to tell if you’ve done everything you can do? When do you stop?

I just spent a full working day beating my head against one of OLE DB’s many brick walls trying to get an error message that I thought for sure must be in there somewhere. I added a call to a SQL Server system stored procedure (sp_rename, if you must know), and the execution of the statement produced this error message:

Multiple-step OLE DB operation generated errors.
Check each OLE DB status value, if available. No work was done.

Like a good boy, I went in search of the OLE DB status values. The difficulty was, OLE DB is very complex, and I’m using the consumer templates, which add another layer of complexity. I already had a complicated routine for digging out error information using CDBErrorInfo, GetErrorRecords, GetBasicErrorInfo, GetAllErrorInfo, GetErrorParameters, GetCustomErrorObject(IID_ISQLErrorInfo) and GetCustomErrorObject(IID_ISQLServerErrorInfo). None of them produced more information.

I tried checking the statuses of my command parameters after the fact with GetParamStatus. Nothing came out. I tried pasing an Unknown* into GetErrorRecords, but it didn’t change the behavior at all.

A co-worker who knows a hundred times more about any given Microsoft technology than I do helped me find places to look, helped my pore over the OLE DB tomes, helped write scripts to try to reproduce the scenario to see what errors ADO produced.

This last was the most tantalizing of all, because executing the same statement in ADO from VB produced different messages than using OLE DB in C++. Obviously, they were doing something different than we were. We put intentionally erroneous stored procedure calls into both programs, and the VB script would return nice messages, while the C++ code would say, “Errors occurred”. Arggh!

I wanted to give up a number of times, but the mantra of “fixing error handling first” was strong with both of us, and we pushed on. It was more than just dogma that drove us, though. We really didn’t know why the stored procedure wasn’t working. We thought those missing error messages would tell us. And we didn’t want to end up with a problem at a customer site where the only information was “errors occurred”. It might as well have said, “Shit happens”. So we really wanted to get all the information we could, and get it automatically, and have it well presented. But that “if available” in the error message was taunting me with the possibility that there was nothing more to be found.

We continued to experiment. We pursued a strenous course of talmudic reading and interpretation of the occasionally inscrutable documentation. We tried to reason out how the system must behave.

Finally, we found this Microsoft support article: “Errors Occurred” Error When Calling Stored Procedure with More than a SELECT Statement, which says (in part):

SYMPTOMS

When calling ICommand::Execute to execute a Microsoft SQL Server stored procedure using the Microsoft SQL Server OLE DB Provider, DB_E_ERRORSOCCURRED is returned and the IErrorInfo description information states “Errors Occurred.”

CAUSE

The stored procedure contains more than a simple SELECT statement. A client side cursor is required.

RESOLUTION

In order to resolve this, you will have to use client side cursors.

STATUS

This behavior is by design (although Microsoft is investigating whether a more informative error message can be returned).

Sure enough, we were using server-side cursors. Switching to client-side cursors caused our stored procedure to work, and made our intentionally-wrong statements produce the same error messages as ADO’s. That parenthetical admission that the error message was not informative finally told us for sure that there was no more information to be found. Our error handling was not at fault.

If we hadn’t found that admission, how much longer would we have tried? I don’t know. I still believe firmly in fixing error handling first. But often it comes down to trying to get information from other components, and it can be impossible to tell when you’ve bullied them into telling all they know.

The good thing that came out of the whole experience is that we gained confidence that the error handling for these messages is as good as it can get. The stored procedure is working now, and we had a few laughs along the way, so, all in all, a good day’s work.

Comments

[gravatar]
The support article was published in May 2003. Think someone at Microsoft is still "investigating whether a more informative error message can be returned"? To be fair, someone probably did look into it and decided that the code change would be too disruptive to return a more descriptive error.
[gravatar]
Yes, "Fix error handling first" is one of my guiding principles. And it's one of my favorite articles of yours. The only constant in programming is that I will continue to make mistakes. The first and most important job I have is to make sure I can recover from all the mistakes I'm about to make. Eg, the exceptions have to make their way back to the developers automatically, and they must be as informative and friendly (to developers) as possible.

Incidentally, since it's a hot topic now, that's one of the biggest problems with heavy client-side web apps (javascript, DHTML, Google Suggest, etc). There is absolutely *no* way to provide coherent global error handling using client browser code. Your users may be getting errors all over the place, but you'd never know about it. And that is, in my experience, a recipe for failure.
[gravatar]
In a tough enviroment, make it up running is the first/only thing we can do, sometimes even work around is good enough. Make it work, there is more than one way to do it.
[gravatar]
Someone just now (in 2009) forwarded me your original "Fix error handling first" essay and it will become one of my favorites too.

On this particular issue, if you were looking for how to reproduce the problem, I wonder what Google's search would have returned in 2004? Today, a search for

OLE "errors occurred"

returns

"FIX: Uninitialized String Variables, Empty String Values, Jet Provider, and Errors Occurred" (http://support.microsoft.com/kb/228935) dated 2003, complete with "Steps to Reproduce Behavior".

On the general method, you might be interested in "Read Before Running" (http://talkaboutquality.wordpress.com/2008/02/27/read-before-running), which emphasizes planning debugging before trying anything.

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.