RSS

Tools

Postmortem Debugging

Source Code Accompanies This Article. Download It Now.


Stefan is a software developer in Berlin specializing in cross-platform development, user interfaces, and reengineering. He can be contacted at StefanWoe@gmail.com.


Debugging applications locally is one of the most difficult tasks of software engineering. Debugging applications remotely is even more problematic. So what do you do when your application crashes at a remote client's site where you don't have a debugger? To address this problem, I developed a straightforward technique that I call "postmortem debugging" to debug program faults independently of where and when they occurred. Key to this technique is the "resymbolizing" of a stack unwind that has been stored to a log file. The technique consists of these steps:

  1. In the client application, install a handler that's called if the operating system determines a program fault or if an assert fails. This handler walks through the stack and writes the return addresses of all stack frames to a log file. This log file can then be sent via e-mail from the client to the developer.
  2. At the developer's site, invoke the MapAddr.exe utility program that reads the log file and a map file created when building the client's application. MapAddr.exe then writes a copy of the log file where it adds function names for all addresses by looking them up in the map file.

The result is a snapshot of the call stack (including function names) at the point where the program failed at the client's site; see Figure 1.

****************************************************
*** A Program Fault occurred:
*** Error code C0000005: ACCESS VIOLATION
****************************************************
***   Address: 004011B2
***     Flags: 00000000
****************************************************
*** CallStack:
****************************************************

  Fault Occurred At $ADDRESS:004011B2
                   ======== 004011B2 == belongs to ?FunctionC@@YAHHH@Z        
                          +       66 bytes
         with 64 00 00 00 10 00 00 00 2C FF 12 00 02 00 00 00 00 F0 FD 7F 

***  0 called from $ADDRESS:00401223
                   ======== 00401223 == belongs to ?FunctionB@@YAHH@Z         

                          +       67 bytes
         with 64 00 00 00 80 FF 12 00 02 00 00 00 00 F0 FD 7F CC CC CC CC 

***  1 called from $ADDRESS:0040128F
                   ======== 0040128F == belongs to ?FunctionA@@YAHXZ          
                          +       63 bytes
         with 00 00 00 00 02 00 00 00 00 F0 FD 7F CC CC CC CC CC CC CC CC 

***  2 called from $ADDRESS:00401302
                   ======== 00401302 == belongs to _main                      
                          +       66 bytes
         with 01 00 00 00 F0 2C 32 00 38 2D 32 00 00 00 00 00 02 00 00 00 

***  3 called from $ADDRESS:0040D04C
                   ======== 0040D04C == belongs to _mainCRTStartup            
                          +      252 bytes
         with 00 00 00 00 02 00 00 00 00 F0 FD 7F F0 0C 23 B2 C8 FF 12 00

Figure 1: Snapshot of a call stack.

To use this approach, all you do is:

  • Add a small source file to your application.
  • Call a function of this source file at program start that installs the handler using an appropriate OS-function.
  • Create and archive map files for all versions of your program. (Every linker can create a map file that contains the addresses of all functions and so on. Although usually disabled, it is an option you must activate. With Visual C++ 6.0, this is done in the "general" category from the Linker tab of the Projects settings.)

There is no need to add a library or install something additional at the client's machine. Everything you need is in the stack of the client machine and in the map file you archived.

Implementation and Sample Code

To illustrate, I present a small program (FaultApp.exe) with three functions calling each other; see Listing One. The last function, FunctionC, dereferences a bad pointer, thereby raising an access violation. (Also included are the sources for the fault handler and MapAddr.cpp.)

int FunctionC(int x, int y)
{
    char *p = (char *)0xBADC0DE; // Ummm!!, a dirty bad thing
    // Create a access violation
    char c = *p;
    return 0;
}
int FunctionB(int x)
{
    return FunctionC(x, 16);
}
int FunctionA()
{
    return FunctionB(100);
}
int main(int argc, char *argv[])
{
    InstallWin32FaultHandler();
    FunctionA();
    return 0;
}
Listing One: FaultApp, with three functions calling each other.

Although the technique is platform independent, the sample implementation is Windows specific, as it depends on the Win32 SetUnhandledExceptionFilter function. The sample implementation is also specific to Visual C++ 6.0 (although porting to other compilers only consists of the task of parsing its map files). Be aware that the installed handler will not be called if you launch FaultApp.exe using the debugger. In that case, Visual C++ halts debugging. You must launch FaultApp.exe without the debugger to get the installed handler called.

For x86-based UNIX/Posix systems, I provide an equivalent implementation using GCC 3.3 that installs a signal handler using the sigaction function. Interestingly, although the Visual C++ Runtime Library implements the Posix interface (including signal), it does so in a different way than UNIX systems—Windows calls the signal handler after the exit of main, where the stack and registers have already left the state of the fault. MapAddr.cpp also compiles under Linux using GCC 3.3 and can read GCC map files.

Installing the Handler

The Win32 API contains a function SetUnhandledExceptionFilter that makes it possible to install a handler that is called from Windows synchronously when a program fault occurs in your application. Here, "synchronously" means that the handler function is called as if it were a subroutine in your code at the point where the fault occurred. The call stack thereby contains the fault's address as a return address of the installed fault handler. The fault most typically is an access violation or a divide-by-zero problem. The handler is a function that you must provide. Its only parameter is a pointer to a structure holding information about the fault—the type, the code address, and so on. The return code of your handler determines if Windows should continue or abort the program. In my main application, I leave this choice to the client. But in most cases, the resumption of program execution results in more and more faults.

The file Win32FaultHandler.cpp contains the handler Win32FaultHandler (see Listing Two) and a function InstallFaultHandler() that installs this handler (Listing Three).

LONG Win32FaultHandler(struct _EXCEPTION_POINTERS *  ExInfo)

{   char  *FaultTx = "";
    switch(ExInfo->ExceptionRecord->ExceptionCode)
    {
      case EXCEPTION_ACCESS_VIOLATION      : 
                                      FaultTx = "ACCESS VIOLATION"         ; break;
      case EXCEPTION_DATATYPE_MISALIGNMENT : 
                                       FaultTx = "DATATYPE MISALIGNMENT"    ; break;
      case EXCEPTION_FLT_DIVIDE_BY_ZERO    : 
                                      FaultTx = "FLT DIVIDE BY ZERO"       ; break;
        ...
        default: FaultTx = "(unknown)";           break;
    }
    sgLogFile = fopen("Win32Fault.log", "w");
    int    wsFault    = ExInfo->ExceptionRecord->ExceptionCode;
    PVOID  CodeAdress = ExInfo->ExceptionRecord->ExceptionAddress;
    sgLogFile = fopen("Win32Fault.log", "w");
    if(sgLogFile != NULL)
    {
       fprintf(sgLogFile, "****************************************************\n");
       fprintf(sgLogFile, "*** A Program Fault occurred:\n");
       fprintf(sgLogFile, "*** Error code %08X: %s\n", wsFault, FaultTx);
       fprintf(sgLogFile, "****************************************************\n");
       fprintf(sgLogFile, "***   Address: %08X\n", (int)CodeAdress);
       fprintf(sgLogFile, "***     Flags: %08X\n", 
                                     ExInfo->ExceptionRecord->ExceptionFlags);
       LogStackFrames(CodeAddress, (char *)ExInfo->ContextRecord->Ebp);
       fclose(sgLogFile);
    }
    /*if(want to continue)
    {
       ExInfo->ContextRecord->Eip++;
       return EXCEPTION_CONTINUE_EXECUTION;
    }
    */ 
    return EXCEPTION_EXECUTE_HANDLER;
}
Listing Two: Win32FaultHandler.
void InstallFaultHandler()
{
    SetUnhandledExceptionFilter((LPTOP_LEVEL_EXCEPTION_FILTER) Win32FaultHandler);
}
Listing Three: InstallFaultHandler().

The function Win32FaultHandler() opens a log file, stores the type and code address of the fault, and then calls a function that writes the stack frames to this log file. It returns with the code EXCEPTION_EXECUTE_HANDLER, which handles control the standard handler of Windows (which terminates the application).

I've commented out a section that increments the instruction pointer (in the structure passed to the handler) and then returns with EXCEPTION_CONTINUE_EXECUTION. In the sample FaultApp program, this resumes normal program execution.

Saving the Stack to the Log File

The task of writing all return addresses from the stack to the log file is done by the function LogStackFrames (Listing Four). Each stack frame's return address is prefixed with the tag "$ADDRESS:". MapAddr.exe just looks for these entries and interprets them as program addresses.

void   LogStackFrames(PVOID FaultAddress, char *eNextBP)
{      char *p, *pBP;                                     
       unsigned i, x, BpPassed;
       static int  CurrentlyInTheStackDump = 0;
   ...
       BpPassed = (eNextBP != NULL);
       if(! eNextBP)
       {
          _asm mov     eNextBP, eBp   
       }
       else 
           fprintf(sgLogFile, "\n  Fault Occurred At $ADDRESS:%08LX\n", 
                                   (int)FaultAddress);
                             // prevent infinite loops
       for(i = 0; eNextBP && i < 100; i++)
       {      
           pBP = eNextBP;           // keep current BasePointer
           eNextBP = *(char **)pBP; // dereference next BP 
           p = pBP + 8; 
           // Write 20 Bytes of potential arguments
           fprintf(sgLogFile, "         with ");        
           for(x = 0; p < eNextBP && x < 20; p++, x++)
               fprintf(sgLogFile, "%02X ", *(unsigned char *)p);
           fprintf(sgLogFile, "\n\n");          
                  if(i == 1 && ! BpPassed) 
               fprintf(sgLogFile, "*************************************\n"
                                 "         Fault Occurred Here:\n");
        // Write the backjump address
        fprintf(sgLogFile, "*** %2d called from $ADDRESS:%08LX\n", 
                                  i, *(char **)(pBP + 4));
        if(*(char **)(pBP + 4) == NULL)
            break; 
       }
    ...

}
Listing Four: LogStackFrames.

LogStackFrames also prints the first 20 bytes of that area of the stack where arguments to the functions are stored. The determination of whether these are really arguments or local variables would need further inspection by the assembler code—or a look at the source code. According to the language specification, C and C++ functions put their arguments on the stack from right to left, whereby the address following the return address always holds the first argument of the function, if any. Interpreting these values requires a little bit of thinking, but usually this "manual" approach is sufficient.

Although Windows offers a function StackWalk(...) to walk the stack, for simplicity, this is done using simple pointer dereferencing. If LogStackFrames is called from an assert (where BP is unknown), one line of inline x86 assembler is used to get the value of the BP register.

MapAddr.exe

The program MapAddr.exe is called from the command line with two arguments—a log file path and a map file path. It first reads the map file and builds a sorted list of all addresses in it. It then creates a copy of the log file (with the suffix ".out") where, for each stack address found by the tag "$ADDRESS:", it inserts a line with the appropriate function name. It also prints the offset of the return address from the function start (extracting the actual line within the function from the address would be a more difficult task—although not impossible). The result is the log file including function names, as in Figure 1. The function names are mangled with their arguments, as all sources for this article are C++. Therefore, the function

int FunctionC(int x, int y)

becomes

?FunctionC@@YAHHH@Z

in the map file of Visual C++ 6.0. If you use the Visual C++ source browser, you can use the goto definition command to jump to the function body while the cursor is positioned on this mangled name.

Analyzing Client Program Faults

By adding one function call to an application, you can now analyze a problem postmortem that occurred on any machine anywhere else. You now know in which function the bug occurred and from where it was called. This certainly is an improvement, but it may only be one piece of an overall strategy for analyzing program faults on client machines. There are a number of considerations:

  • Asserts. To find hidden bugs early, use as many asserts as possible and leave them in the code in the shipping release versions. In the release version, an assert should enable postmortem debugging by calling the function LogStackFrames with both parameters NULL (which will determine BP by an assembler call). Clearly, an assert should always give a message as loudly as it can. "Loud" in this context may mean a modal dialog—it can also mean creating an e-mail or any other kind of signal in the case of a server application.
  • Sometimes asserts that fail may not be errors at all, leading to bug reports that are created solely by an assert. However, that problem is outweighed by the advantage of finding serious bugs quickly and early. Some people argue that there are performance reasons for leaving asserts out in release versions. In most cases, this argument is moot. Performance problems almost always have only two solutions—a profiler and/or better design.
  • Log files. Finding all the ways to the problem source often requires further information. Therefore, an application should always create a log file on startup and fill it with information about a program's basic conditions: the program and exact OS version, amount of memory, loaded documents, and so on. During execution, all commands should be logged together with their parameters. When a fault occurs, the current state (free memory, global application states, and the like) is of interest together with the stack frames. All this will lead to more success in locating the problem source.
  • Although writing log files is good in general, it can sometimes create problems. I saw a server application writing gigabytes of log files over the course of a few days until the disk was full. Log file writing requires control mechanisms (such as ring logging) that prevent such problems. Besides that, it's always helpful if the number of log statements written can be modified by users.
  • Automation. Using MapAddr.exe in a professional context, of course, implies that all map files of all versions must be archived and accessible for postmortem problem analysis. Also, in a professional context, you may write batch scripts (or extend MapAddr.exe), so that the program version is extracted from the log file and the appropriate map file is selected automatically.

Conclusion

No application is error free. The postmortem analysis approach I present here helps you identify and resolve hidden and unreproduceable errors with little impact on the program itself. The technique also reduces the problem of sometimes unclear and guess-based user reports. Using postmortem debugging together with logging and asserts, you can clearly see what happened.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video