Reap what you sow

Wednesday 30 November 2005This is 19 years old. Be careful.

A longish tale of debugging a failed server assertion, in which I suspect revelation of deep mysteries, but in the end am merely graphically reminded of what everyone knew all along.

The other day at work our server started raising failed assertions. The assertion was in the ATL SOAP code: our method handler had allowed an exception to go unhandled. Putting the IIS worker process w3wp.exe into the debugger, the exception was in our thread local abstraction:

1class CThreadLocal
2{
3public:
4    CThreadLocal()
5    {
6        m_tlsIndex = ::TlsAlloc();
7    }
8
9    ~CThreadLocal()
10    {
11        if (m_tlsIndex != TLS_OUT_OF_INDEXES) {
12            ::TlsFree(m_tlsIndex);
13        }
14    }
15
16    void* GetStorage()
17    {
18        if (m_tlsIndex == TLS_OUT_OF_INDEXES) {
19            THROW(CException, "Couldn't allocate TLS index");
20        }
21
22        void * p = ::TlsGetValue(m_tlsIndex);
23        if (p == 0) {
24            if (::GetLastError() != NO_ERROR) {
25                THROW(CWinException, "Couldn't get TLS value");
26            }
27        }
28        return p;
29    }
30
31
32    void SetStorage(void* p)
33    {
34        if (m_tlsIndex == TLS_OUT_OF_INDEXES) {
35            THROW(CException, "Couldn't allocate TLS index");
36        }
37        if (!::TlsSetValue(m_tlsIndex, p)) {
38            THROW(CWinException, "Couldn't set TLS value");
39        }
40    }
41
42private:
43    DWORD m_tlsIndex;
44};

If you don’t know about thread locals: go read about them, they’re damn handy in multi-threaded code. They let you store data that is available to all code in a single thread. Once you allocate an index with TlsAlloc (on Win32), you can store data in that slot. Each thread has its own instance of the slot, so each thread keeps its own state, and there is no need for synchronization. I think they’re easier to understand if you call them thread globals (because they are good replacements for global variables if you want to keep them scoped to a single thread), but I can see why they are called thread locals (because the data is only available to one thread).

Our CThreadLocal abstraction wraps the work of allocating and freeing the thread local index, which should be done during startup and shutdown. By creating a global CThreadLocal variable, the index is allocated once at the start of the process, and then deallocated at the end.

The exception was at line 19. Apparently the constructor hadn’t been able to allocate a thread local index. There are a limited (but large) number of thread local indexes, so you have to guard against running out, but how can you really run out? This code had been untouched for over a year, with no problems, it worked great. The new twist was that our DLL was now being dynamically loaded into the IIS worker process. It seemed like this would be one of those bugs with an interesting learning moment at the end. Were the C++ globals not being constructed properly? Not being destroyed properly? If they weren’t, what could we do about it? Was this a bug in the loader? Was there some magic DLL wizardry we weren’t intoning just right?

I had reached this point in the process just before leaving for the day (I hate that!), so I was mulling over esoteric possibilities for a while.

The next day, I was able to do some hands-on debugging. For whatever reason (maybe because of the ATL SOAP code), the worker process loads and unloads our DLL quite frequently. I added some tracing to the CThreadLocal constructor and destructor, and saw traces like this:

Loading ksCore.dll...
Allocating tls index 20
Allocating tls index 21
Allocating tls index 22
Allocating tls index 24
Unloading ksCore.dll...
Freeing tls index 24
Freeing tls index 22
Freeing tls index 21
Freeing tls index 20
Loading ksCore.dll...
Allocating tls index 20
Allocating tls index 21
Allocating tls index 22
Allocating tls index 25
Unloading ksCore.dll...
Freeing tls index 25
Freeing tls index 22
Freeing tls index 21
Freeing tls index 20
Loading ksCore.dll...
Allocating tls index 20
Allocating tls index 21
Allocating tls index 22
Allocating tls index 26
Unloading ksCore.dll...
Freeing tls index 26
Freeing tls index 22
Freeing tls index 21
Freeing tls index 20

Hmmm, it looks like the constructor and destructor are working properly, but why is that last allocated index incrementing each time? And why did the first trace skip index 23? Setting a breakpoint on TlsAlloc showed that the CThreadLocal abstraction was working great, but that we had a rogue TlsAlloc call that didn’t use the class at all. It was in its own class that allocated an index in the constructor, but had no destructor:

class CMyPrivateThreadLocal:
{
public:
    CMyPrivateThreadLocal()
    {
        m_index = ::TlsAlloc();
    }

    // Blah blah...

private:
    DWORD m_index;
};

CMyPrivateThreadLocal mythreadlocal;

So this one thread local index was leaking: each time the DLL was loaded, this global was constructed, allocating an index which was never freed. All the other thread locals were working properly because of the CThreadLocal code, but we were losing one index per load/unload cycle. Win32 says we get 1088 (wtf?) thread local indexes per process, so after 1050 or so loads, the indexes would be used up, the allocation would fail, an exception would be thrown, and the ATL code raised a failed assertion.

The code had worked fine in a standalone process, because the incorrect thread local was only being allocated once at startup. If the resource isn’t freed at process shutdown, who cares? The whole process is shutting down anyway. But in a dynamically loaded environment, a single process is starting and stopping the DLL over and over again, requiring stricter attention to freeing all resources.

Getting rid of CMyPrivateThreadLocal and replacing it with a CThreadLocal fixed the problem nicely. There was no deep mystery of the C++ world revealed after all. Just a reminder to code meticulously, and free all your resources, even if you don’t think you have to. As with all modular code, you can’t make assumptions about the environment your code will run in. The author of CMyPrivateThreadLocal assumed that his global variable would be created and destroyed only once per process. He was wrong. Especially when writing server code, all resources allocated have to be freed.

Comments

[gravatar]
Sounds like Creepy Threads.
[gravatar]
1088 = 1024 + 64; my guess would be there's an initial small (64-element) pool allocated, with a one-time overflow (1024-element) pool available.

But that's just a guess.
[gravatar]
Personally I'd throw in the constructor if you couldn't allocate an index and then never have to check that the index was valid after that. Doing so would remove 8 lines of code and add 3 and would (IMHO) make the class easier to understand and use... It probably doesn't make that much difference in such a small class but preventing 'zombie' objects is a good habit to get into; if the constructor worked then the object is valid, if not there is no object...
[gravatar]
Len, I thought someone would pick up on that. I believe it is coded that way because the constructor is used for static globals, and so an exception thrown from the constructor would not throw to a place that could handle it.

By deferring the throw to the use of the method, the exception is more visible.
[gravatar]
Fair enough.

I expect I'd probably design the user of this class, the static global, differently so as to allow this general purpose class to be designed 'better'. But, as with all these things, it's easy for someone not involved in the actual problem to nit-pick on particular details without having to deal with all of the details...

Add a comment:

Ignore this:
Leave this empty:
Name is required. Either email or web are required. Email won't be displayed and I won't spam you. Your web site won't be indexed by search engines.
Don't put anything here:
Leave this empty:
Comment text is Markdown.