Thursday 6 October 2005 — This is exactly 19 years old. Be careful.
For ages, we’ve had stack trace code in our product at work. I picked it up from a sample someplace, and it gave me a good feeling: I had a powerful diagnostic tool built into the product, and we could use it to pinpoint fleeting problems. It had only one flaw: it didn’t work.
At least, it didn’t work reliably. Sometimes, it would give a beautiful deep stack trace, complete with symbols and line numbers. Sometimes it would list only two functions, KiFastCallSomethingOrOther and DumpUserModeThingaMaJig.
My code used GetThreadContext to load up the program counter and frame pointer for the current thread. It looked something like this:
CONTEXT c;
memset(&c, 0, sizeof(c));
c.ContextFlags = CONTEXT_FULL;
if (!::GetThreadContext(::GetCurrentThread(), &c)) {
    return;
}
STACKFRAME s; // in/out stackframe
memset(&s, 0, sizeof(s));
// Init STACKFRAME for first call
s.AddrPC.Offset = c.Eip;
s.AddrPC.Mode = AddrModeFlat;
s.AddrFrame.Offset = c.Ebp;
s.AddrFrame.Mode = AddrModeFlat;
s.AddrStack.Offset = c.Esp;
s.AddrStack.Mode = AddrModeFlat;
// .. now use StackWalk to walk the stack ..
There seem to be lots of people out there advocating this method. But the docs for GetThreadContext say,
You cannot get a valid context for a running thread. Use the SuspendThread function to suspend the thread before calling GetThreadContext.
But that’s just what I wanted: a stack trace for the current thread. It seemed like a lot of bother to spawn another thread just to suspend the current one, get a context, and restart it. And judging from my empirical data, it seemed like the docs were right: getting a context on the current thread didn’t work too well.
Yesterday I dug around some more and found Visual Leak Detector at The Code Project. It included stack tracing code that doesn’t use GetThreadContext. Instead it does this:
#pragma auto_inline(off)
DWORD_PTR VisualLeakDetector::getprogramcounterx86x64()
{
    DWORD_PTR programcounter;
    // Get the return address out of the current stack frame
    __asm mov eax, [ebp + 4]
    // Put the return address into the variable we'll return
    __asm mov [programcounter], eax
    return programcounter;
}
#pragma auto_inline(on)
void VisualLeakDetector::getstacktrace (CallStack *callstack)
{
    CONTEXT context;
    STACKFRAME64 frame;
    DWORD_PTR framepointer;
    DWORD_PTR programcounter;
    // Get the required values for initialization of the STACKFRAME64
    // structure to be passed to StackWalk64(). Required fields are
    // AddrPC and AddrFrame.
    programcounter = getprogramcounterx86x64();
    // Get the frame pointer (aka base pointer)
    __asm mov [framepointer], BPREG
    // Initialize the STACKFRAME64 structure.
    memset(&frame, 0, sizeof(frame));
    frame.AddrPC.Offset = programcounter;
    frame.AddrPC.Mode = AddrModeFlat;
    frame.AddrFrame.Offset = framepointer;
    frame.AddrFrame.Mode = AddrModeFlat;
    // .. use StackWalk to walk the stack ..
Holy moly. In the words of a colleague, “If it uses inline assembly code, it’s got to be good!” I tried out the code, and it worked really well until I built a Release version, where it seemed to be worse than the old GetThreadContext code. I stepped through it, read about stack frames, and discovered that the “ebp + 4” line should really be “esp + 4”. After that change, the code worked perfectly.
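To make the “ebp + 4” business concrete, here is a minimal sketch of the frame layout that a frame-pointer walk relies on. It is a simulation only, not the Visual Leak Detector code and not what StackWalk64 does internally: the stack is a plain vector of 32-bit slots, and the name WalkFrameChain, the slot indexing, and the "saved ebp of 0 ends the chain" convention are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated 32-bit stack: one element per 4-byte slot, so slot index = address / 4.
// When ebp is used as the frame pointer, each frame looks like this:
//   [ebp + 0]  caller's saved ebp (link to the next frame up)
//   [ebp + 4]  return address into the caller
// WalkFrameChain is a hypothetical helper, not part of any real API.
std::vector<std::uint32_t> WalkFrameChain(const std::vector<std::uint32_t>& stack,
                                          std::uint32_t ebp)
{
    assert(stack.size() >= 2 && "need room for at least one frame");
    std::vector<std::uint32_t> returnAddresses;

    // In this simulation, a saved ebp of 0 marks the outermost frame.
    while (ebp != 0) {
        const std::size_t slot = ebp / 4;
        // A misaligned or out-of-range ebp means it is not a frame pointer
        // at all, which is the situation when the frame pointer is omitted.
        if (ebp % 4 != 0 || slot + 1 >= stack.size()) {
            break;
        }
        returnAddresses.push_back(stack[slot + 1]);  // [ebp + 4]
        const std::uint32_t next = stack[slot];      // [ebp + 0]
        // Callers live at higher addresses; anything else is a corrupt chain.
        if (next != 0 && next <= ebp) {
            break;
        }
        ebp = next;
    }
    return returnAddresses;
}
```

With a valid chain, the walk returns one return address per frame. Hand it a value that isn't really a frame pointer and it stops early or returns nothing, which is roughly what a two-function stack trace looks like from the outside.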
But while I was researching the __asm keyword, I discovered a Microsoft built-in function: _ReturnAddress. Using this, I could get rid of some of the inline assembly language, including the bit that I had to fix:
// _ReturnAddress should be prototyped before use
extern "C" void * _ReturnAddress(void);
#pragma intrinsic(_ReturnAddress)
#pragma auto_inline(off)
DWORD_PTR
GetProgramCounter()
{
    return (DWORD_PTR)_ReturnAddress();
}
#pragma auto_inline(on)
Funny thing about _ReturnAddress: everyone seems to agree that it’s designed for figuring out who’s calling you so you can decide whether to trust them, and everyone also agrees that’s a really bad thing to try to do.
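For anyone who wants to try the same trick outside Visual C++, here is a rough sketch of a portable version. It assumes that GCC and Clang's __builtin_return_address(0) is an acceptable stand-in for _ReturnAddress; the NOINLINE and RETURN_ADDRESS macros are names invented for this example, and the function must stay out-of-line or it will report its caller's caller instead.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
  #include <intrin.h>
  #pragma intrinsic(_ReturnAddress)
  #define NOINLINE __declspec(noinline)
  #define RETURN_ADDRESS() _ReturnAddress()
#else
  // Assumption: GCC or Clang, which provide this builtin.
  #define NOINLINE __attribute__((noinline))
  #define RETURN_ADDRESS() __builtin_return_address(0)
#endif

// Returns the address of the instruction that follows the call to this
// function, which is a program counter value inside the caller.
NOINLINE std::uintptr_t GetProgramCounter()
{
    void* returnAddress = RETURN_ADDRESS();
    assert(returnAddress != nullptr);  // a called function always has one
    return reinterpret_cast<std::uintptr_t>(returnAddress);
}
```

The result is only good for seeding a stack walk or for looking up a symbol. As noted above, using it to decide whether to trust the caller is a bad idea.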
Comments
You should get Bob to weigh in here...he is deep in .NET hell and would probably welcome something as mundane as reading and deciphering __asm.
I'm currently using this:
#define GET_CURRENT_CONTEXT(c, contextFlags) \
    do { \
        memset(&c, 0, sizeof(CONTEXT)); \
        c.ContextFlags = contextFlags; \
        __asm call x \
        __asm x: pop eax \
        __asm mov c.Eip, eax \
        __asm mov c.Ebp, ebp \
        __asm mov c.Esp, esp \
    } while(0)
on x86 if RtlCaptureContext() isn't available.
I think this snippet originated from Jochen Kalmbach's StackWalker code on CodeProject and at http://blog.kalmbachnet.de/ but I can't be sure as I was looking at a lot of sample code when I wrote my stack walking library.
Oh, and your links to GetThreadContext and SuspendThread are broken as they're currently relative to your site...
http://www.codeguru.com/Cpp/W-P/system/threading/article.php/c10317
#if defined(_M_IX86) || defined(_M_X64)
#pragma auto_inline(off)
DWORD_PTR getprogramcounterx86x64 ()
{
    DWORD_PTR programcounter;
    __asm mov AXREG, [BPREG + SIZEOFPTR] // Get the return address out of the current stack frame
    __asm mov [programcounter], AXREG // Put the return address into the variable we'll return
    return programcounter;
}
#pragma auto_inline(on)
#endif // defined(_M_IX86) || defined(_M_X64)
Not only does it purport to work for both x86 and x64, but it actually does. When _M_IX86 is defined, AXREG and BPREG expand to "eax" and "ebp", respectively. When _M_X64 is defined, they expand to "rax" and "rbp". SIZEOFPTR is set to 32 bits or 64 bits, as needed, as well. But I digress.
Platypus is correct about the frame pointer omission optimization. If ebp is used as the frame pointer, then [ebp+4] is the correct way to find where the return address has been pushed onto the stack. However, if frame pointer omission is used (it is by default in Visual C++ release builds), then ebp isn't used as the frame pointer, and [ebp+4] is no longer a valid way to get the return address. I think [esp+4] ends up working for this function because there is always exactly one 32-bit variable pushed onto the stack, so [esp+4] always yields the same result that [ebp+4] would if ebp contained a valid frame pointer.
Thank you Ned for pointing out the _ReturnAddress intrinsic. If I had known about it before, I would have just used it instead of spending a lot of time learning how x86 stack frames are laid out. I just might use that instead next time around. Ya learn somethin' new every day.
Ciao,
-- Dan