For ages, we've had stack trace code in our product at work. I picked it up from a sample someplace, and it gave me a good feeling: I had a powerful diagnostic tool built into the product, and we could use it to pinpoint fleeting problems. It had only one flaw: it didn't work.

At least, it didn't work reliably. Sometimes, it would give a beautiful deep stack trace, complete with symbols and line numbers. Sometimes it would list only two functions, KiFastCallSomethingOrOther and DumpUserModeThingaMaJig.

My code used GetThreadContext to load up the program counter and frame pointer for the current thread. It looked something like this:

CONTEXT c;

memset(&c, 0, sizeof(c));
c.ContextFlags = CONTEXT_FULL;

if (!::GetThreadContext(::GetCurrentThread(), &c)) {
    return;
}

STACKFRAME s; // in/out stackframe
memset(&s, 0, sizeof(s));

// Init STACKFRAME for first call
s.AddrPC.Offset = c.Eip;
s.AddrPC.Mode = AddrModeFlat;
s.AddrFrame.Offset = c.Ebp;
s.AddrFrame.Mode = AddrModeFlat;
s.AddrStack.Offset = c.Esp;
s.AddrStack.Mode = AddrModeFlat;

// .. now use StackWalk to walk the stack ..

There seem to be lots of people out there advocating this method. But the docs for GetThreadContext say,

You cannot get a valid context for a running thread. Use the SuspendThread function to suspend the thread before calling GetThreadContext.

But that's just what I wanted: a stack trace for the current thread. It seemed like a lot of bother to spawn another thread just to suspend the current one, get a context, and restart it. And judging from my empirical data, it seemed like the docs were right: getting a context on the current thread didn't work too well.

Yesterday I dug around some more and found Visual Leak Detector at The Code Project. It included stack tracing code that doesn't use GetThreadContext. Instead it does this:

#pragma auto_inline(off)
DWORD_PTR VisualLeakDetector::getprogramcounterx86x64()
{
    DWORD_PTR programcounter;

    // Get the return address out of the current stack frame
    __asm mov eax, [ebp + 4]
    // Put the return address into the variable we'll return
    __asm mov [programcounter], eax

    return programcounter;
}
#pragma auto_inline(on)

void VisualLeakDetector::getstacktrace (CallStack *callstack)
{
    CONTEXT      context;
    STACKFRAME64 frame;
    DWORD_PTR    framepointer;
    DWORD_PTR    programcounter;

    // Get the required values for initialization of the STACKFRAME64
    // structure to be passed to StackWalk64(). Required fields are
    // AddrPC and AddrFrame.
    programcounter = getprogramcounterx86x64();
    // Get the frame pointer (aka base pointer)
    __asm mov [framepointer], BPREG

    // Initialize the STACKFRAME64 structure.
    memset(&frame, 0, sizeof(frame));
    frame.AddrPC.Offset    = programcounter;
    frame.AddrPC.Mode      = AddrModeFlat;
    frame.AddrFrame.Offset = framepointer;
    frame.AddrFrame.Mode   = AddrModeFlat;

    // .. use StackWalk to walk the stack ..

Holy moly. In the words of a colleague, "If it uses inline assembly code, it's got to be good!" I tried out the code, and it worked really well until I built a Release version, where it seemed to be worse than the old GetThreadContext code. I stepped through it, read about stack frames, and discovered that the "ebp + 4" line should really be "esp + 4". After that change, the code worked perfectly.

But while I was researching the __asm keyword, I discovered a Microsoft built-in function: _ReturnAddress. Using this, I could get rid of some of the inline assembly language, including the bit that I had to fix:

// _ReturnAddress should be prototyped before use
extern "C" void * _ReturnAddress(void);

#pragma intrinsic(_ReturnAddress)

#pragma auto_inline(off)
DWORD_PTR
GetProgramCounter()
{
    return (DWORD_PTR)_ReturnAddress();
}
#pragma auto_inline(on)

Funny thing about _ReturnAddress: everyone seems to agree that it's designed for figuring out who's calling you so you can decide whether to trust them, and everyone also agrees that's a really bad thing to try to do.
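
The same helper does not have to be tied to Microsoft's compiler. The following is a minimal sketch, assuming GCC's and Clang's `__builtin_return_address(0)` as the counterpart of `_ReturnAddress`. The `NOINLINE` macro name and the use of `<intrin.h>` in place of the hand-written prototype are assumptions of this sketch, not part of the code above.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>                      // declares _ReturnAddress
#pragma intrinsic(_ReturnAddress)
#define NOINLINE __declspec(noinline)    // hypothetical macro name
#else
#define NOINLINE __attribute__((noinline))
#endif

// Must not be inlined: once inlined, the "return address" would be that of
// the function it was inlined into, one level too far up the stack.
NOINLINE std::uintptr_t GetProgramCounter()
{
#if defined(_MSC_VER)
    return reinterpret_cast<std::uintptr_t>(_ReturnAddress());
#else
    return reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
#endif
}
```

The value returned is an address inside whichever function called `GetProgramCounter()`, which is what the AddrPC field wants for the first StackWalk call.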


Comments

andrew 11:18 AM on 6 Oct 2005

Not just inline assembly, but assembly that purports to be x86 *and* x64! What are the odds?!

You should get Bob to weigh in here...he is deep in .NET hell and would probably welcome something as mundane as reading and deciphering __asm.

Platypus 11:55 AM on 6 Oct 2005

Esp+4 probably isn't right either. Ebp+4 is what you'd want as long as you're using frame pointers, which might be omitted as an optimization. That same optimization might well rely on the stack being unwound properly so that esp points to the right place by the time you return, but that doesn't mean it will be in the right place in the middle of the function. In general, if you want to debug you should leave frame pointers enabled.

Len Holgate 12:14 PM on 6 Oct 2005

If you're on XP or later then you can use RtlCaptureContext() to give you a context structure that you can use to build your stack frame.

I'm currently using this:

#define GET_CURRENT_CONTEXT(c, contextFlags) \
do { \
memset(&c, 0, sizeof(CONTEXT)); \
c.ContextFlags = contextFlags; \
__asm call x \
__asm x: pop eax \
__asm mov c.Eip, eax \
__asm mov c.Ebp, ebp \
__asm mov c.Esp, esp \
} while(0)

on x86 if RtlCaptureContext() isn't available.

I think this snippet originated from Jochen Kalmbach's StackWalker code on CodeProject and at http://blog.kalmbachnet.de/ but I can't be sure as I was looking at a lot of sample code when I wrote my stack walking library.

Oh, and your links to GetThreadContext and SuspendThread are broken as they're currently relative to your site...
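
To make the RtlCaptureContext() suggestion concrete, here is a rough sketch of capturing the context and walking the stack in the same function, so the captured frame is still live while StackWalk64 runs. `CountStackFrames` and `kMaxFrames` are hypothetical names, and the sketch assumes SymInitialize() has already succeeded for the process and that dbghelp.lib is linked. It only covers x86 and x64.

```cpp
#if defined(_WIN32)   // Windows-only sketch; compiles to nothing elsewhere
#include <windows.h>
#include <dbghelp.h>  // StackWalk64 and friends; link with dbghelp.lib
#include <cstring>

// Hypothetical helper: counts the frames StackWalk64 can see from here.
// Assumes SymInitialize() has already been called for this process.
int CountStackFrames()
{
    CONTEXT c;
    std::memset(&c, 0, sizeof(c));
    RtlCaptureContext(&c);           // registers of the current thread, right here

    STACKFRAME64 s;                  // in/out stack frame
    std::memset(&s, 0, sizeof(s));
    s.AddrPC.Mode    = AddrModeFlat;
    s.AddrFrame.Mode = AddrModeFlat;
    s.AddrStack.Mode = AddrModeFlat;
#if defined(_M_IX86)
    const DWORD machine = IMAGE_FILE_MACHINE_I386;
    s.AddrPC.Offset    = c.Eip;
    s.AddrFrame.Offset = c.Ebp;
    s.AddrStack.Offset = c.Esp;
#elif defined(_M_X64)
    const DWORD machine = IMAGE_FILE_MACHINE_AMD64;
    s.AddrPC.Offset    = c.Rip;
    s.AddrFrame.Offset = c.Rbp;
    s.AddrStack.Offset = c.Rsp;
#else
#error "This sketch only covers x86 and x64"
#endif

    const int kMaxFrames = 64;       // arbitrary cap for the sketch
    int frames = 0;
    while (frames < kMaxFrames &&
           StackWalk64(machine, GetCurrentProcess(), GetCurrentThread(),
                       &s, &c, NULL,
                       SymFunctionTableAccess64, SymGetModuleBase64, NULL)) {
        if (s.AddrPC.Offset == 0) {
            break;                   // walked off the end of the stack
        }
        ++frames;                    // a real tracer would record s.AddrPC.Offset
    }
    return frames;
}
#endif  // _WIN32
```

Doing the capture in a separate helper that returns before the walk starts would leave the context pointing at a dead frame, which is presumably why the x86 fallback above is a macro and not a function.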

Ned Batchelder 12:30 PM on 6 Oct 2005

That's what I love about my readers! They write better stuff than I do! (And I fixed the URLs, thanks!)

Bob 4:35 PM on 8 Oct 2005

Andrew: Okay, I'll bite. The asm code that others have brought up should work fine. The important thing is to fill out the STACKFRAME arg to StackWalk64, and there are several ways to do that. One that I don't think anyone else has mentioned is to build a SEH __try / __except block that intentionally causes an exception and then retrieves the CONTEXT from the exception info. No asm code required. We used to do stuff like that in Java code when some condition occurred and we wanted to log the stack to figure out who caused it -- but didn't want the operation to fail.
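
A minimal sketch of that SEH idea, assuming the Microsoft compiler (the `__try` / `__except` keywords are MSVC extensions). `CopyContext` and `CaptureContextViaSeh` are hypothetical names, and the exception code is an arbitrary application-defined value.

```cpp
#if defined(_MSC_VER)   // __try/__except are Microsoft extensions
#include <windows.h>

// Exception filter: runs while the raising frames are still on the stack.
// Copies the CONTEXT out, then tells SEH to run the handler block.
static int CopyContext(const EXCEPTION_POINTERS* info, CONTEXT* out)
{
    *out = *info->ContextRecord;
    // A real logger would do its StackWalk64 loop here, inside the filter,
    // because the frames below the raise point are unwound after it returns.
    return EXCEPTION_EXECUTE_HANDLER;
}

// Hypothetical helper: obtains a CONTEXT for the current thread with no asm.
bool CaptureContextViaSeh(CONTEXT* out)
{
    bool captured = false;
    __try {
        // Arbitrary application-defined exception code, raised on purpose.
        RaiseException(0xE0000001, 0, 0, NULL);
    }
    __except (CopyContext(GetExceptionInformation(), out)) {
        captured = true;   // exception swallowed; the operation carries on
    }
    return captured;
}
#endif  // _MSC_VER
```

The captured program counter sits inside RaiseException, so a trace built from it starts with a frame or two of system code before reaching the function of interest.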

Dan Moulding 9:36 AM on 18 Jan 2006

Howdy folks. I know this is an old topic, but what the hey -- I wrote the inline assembly at issue so I can't resist commenting. The original code was actually like this:

#if defined(_M_IX86) || defined(_M_X64)
#pragma auto_inline(off)
DWORD_PTR getprogramcounterx86x64 ()
{
DWORD_PTR programcounter;

__asm mov AXREG, [BPREG + SIZEOFPTR] // Get the return address out of the current stack frame
__asm mov [programcounter], AXREG // Put the return address into the variable we'll return

return programcounter;
}
#pragma auto_inline(on)
#endif // defined(_M_IX86) || defined(_M_X64)

Not only does it purport to work for both x86 and x64, but it actually does. When _M_IX86 is defined, AXREG and BPREG expand to "eax" and "ebp", respectively. When _M_X64 is defined, they expand to "rax" and "rbp". SIZEOFPTR is set to 32 bits or 64 bits, as needed, as well. But I digress.

Platypus is correct about the frame pointer omission optimization. If ebp is used as the frame pointer, then [ebp+4] is the correct way to obtain the offset for where the return address has been pushed onto the stack. However, if frame pointer omission optimization is used (it is by default in Visual C++ in release builds) then ebp isn't used as the frame pointer and thus [ebp+4] is no longer a valid way to get the return address. I think the reason [esp+4] ends up working for this function is that there is always only one 32-bit variable pushed onto the stack, so [esp+4] will always yield the same result that [ebp+4] would if ebp contains a valid frame pointer.
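
The two layouts can be modelled with a toy stack. This is only an illustrative sketch: the prologues are assumed (a framed build doing push ebp / mov ebp, esp / push ecx, and an FPO build doing just push ecx to make room for the one local), and the addresses and names such as `ToyStack` are made up.

```cpp
#include <cstdint>
#include <vector>

const std::uint32_t kReturnAddress = 0x00401234;  // made-up code address
const std::uint32_t kCallerEbp     = 96;          // made-up caller frame pointer

// Toy model of a 32-bit x86 stack. Addresses are byte offsets into `mem`,
// and like the real stack it grows toward lower addresses.
struct ToyStack {
    std::vector<std::uint32_t> mem;
    std::uint32_t esp;
    std::uint32_t ebp;

    ToyStack() : mem(32, 0), esp(64), ebp(kCallerEbp) {}

    void push(std::uint32_t value) { esp -= 4; mem[esp / 4] = value; }
    std::uint32_t load(std::uint32_t address) const { return mem[address / 4]; }
};

// Assumed framed prologue: call; push ebp; mov ebp, esp; push ecx
ToyStack EnterWithFramePointer()
{
    ToyStack s;
    s.push(kReturnAddress);  // pushed by the call instruction
    s.push(s.ebp);           // push ebp (saves the caller's frame pointer)
    s.ebp = s.esp;           // mov ebp, esp
    s.push(0);               // push ecx: room for the one 32-bit local
    return s;
}

// Assumed FPO prologue: call; push ecx. ebp is never touched.
ToyStack EnterWithFpo()
{
    ToyStack s;
    s.push(kReturnAddress);  // pushed by the call instruction
    s.push(0);               // push ecx: room for the one 32-bit local
    return s;
}
```

In the framed layout, `load(ebp + 4)` is the return address and `load(esp + 4)` is the saved caller ebp. In the FPO layout, `load(esp + 4)` is the return address, while `ebp + 4` points somewhere in the caller's frame. That matches what was observed: [ebp+4] works in Debug builds, [esp+4] in Release builds, and neither expression is right for both.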

Thank you Ned for pointing out the _ReturnAddress intrinsic. If I had known about it before, I would have just used it instead of spending a lot of time learning how x86 stack frames are laid out. I just might use that instead next time around. Ya learn somethin' new every day.

Ciao,

-- Dan
