Dr. Dobb's | Debugging with NTSD and Application Verifier

Did you know that Windows ships with built-in debugging tools--the Microsoft NT Symbolic Debugger and the Application Verifier?

Many Windows developers are unaware that Windows ships with its own built-in debugger—the Microsoft NT Symbolic Debugger (ntsd). In this article, I describe how to use ntsd to debug a few straightforward problems. I also describe the Microsoft Application Verifier (AppVerif) tool and present some examples that illustrate both a strength and limitation of AppVerif when finding buffer overruns on the heap.

The ntsd command-line debugger is not as pretty as Visual Studio's integrated debugger. Despite this (or perhaps because of this), ntsd.exe and its cousins are arguably the debuggers of choice for developers at Microsoft who build the core of the Windows operating system

Although ntsd has historically shipped in-box with Windows NT right up through Windows XP, Microsoft is continually improving it. Consequently, I recommend downloading the most recent version of the Microsoft Debugging Tools for Windows package (www.microsoft.com/whdc/devtools/debugging), which includes ntsd, the Windows Debugger WinDbg, the Kernel Debugger KD, an SDK for writing debugger extensions, and the debugger.chm help file.

Example

Here's an example that illustrates how to use ntsd to debug a typical application crash. I use the word "crash" to refer to any situation where the application was terminated abnormally by the operating system. The most common cause of application crashes is where the application attempts to read from or write to an invalid memory location. This is called an "access violation" (AV). For example, the application may attempt to dereference a NULL pointer. Example 1 is designed to do just this.

void main(void)
{
    char *p = 0;
    *p = 123;
}

Example 1: C program designed to dereference a NULL pointer.

When I run this program (t1.exe), Windows terminates it at the point of the access violation and issues the expected message that there was a problem with my program. To debug this, I run ntsd.exe from the command line, passing the name of my application as an argument; for example, ntsd.exe -g t1.exe. With the -g option, ntsd.exe loads and immediately runs the application. Without the -g option, ntsd.exe loads the application, then immediately breaks before the application runs, requiring the g command to let the application continue. When the AV occurs, ntsd.exe breaks in and presents me with a debugger command window like Example 2.

Microsoft (R) Windows Debugger  Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: t1.exe
Symbol search path is: SRV*c:\Files\websymbols*http://msdl.microsoft.com/downloa
d/symbols
Executable search path is:
ModLoad: 00400000 0040f000   t1.exe
ModLoad: 7c900000 7c9b0000   ntdll.dll
ModLoad: 7c800000 7c8f4000   C:\WINDOWS\system32\kernel32.dll
(ce4.ddc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=7ffdb000 ecx=00320758 edx=00320000 esi=7c9118f1  	edi=00011970
eip=0040101e esp=0012ff7c ebp=0012ff80 iopl=0     nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000        efl=00010206
*** WARNING: Unable to verify checksum for t1.exe
t1!main+0xe:
0040101e c6007b    mov     byte ptr [eax],7Bh     ds:0023:00000000=??
0:000>

Example 2: Contents of an ntsd command window at the point where the application t1.exe has caused an AV.

The initial debugger output reveals useful information. After displaying the command line used to start the application, it shows the search path for finding symbol files (PDBs). The symbol search path is important and, in this case, I had specified it beforehand by setting the _NT_SYMBOL_PATH system variable before running ntsd.exe. The symbol path shown here is SRV*c:\Files\websymbols*http://msdl.microsoft.com/download/symbols, which identifies a symbol server URL and the location where downloaded symbol files may be cached. That is, when ntsd needs to load symbols for a binary, it connects to the symbol server at the specified URL, and requests PDBs based on unique characteristics of the binary such as the name of the binary and its timestamp. The URL here is for the Microsoft public symbol server, which provides public Windows symbols for many versions of Windows. Of course, I also need to make sure that ntsd can find the symbols for my application. If I wish to add, say, the location "C:\Files" to the symbol path, then I can use the debugger command .sympath+ C:\Files. The command .symfix+ adds the URL for the Microsoft public symbol server if is not on the path. After changing the symbol search path, I always use the .reload command to force the debugger to reload all symbols.

When the debugger broke in, it displayed the state of the CPU registers, and the instruction that was being executed. The line t1!main() indicates the name of the executing module and the C function that was being executed, while mov byte ptr [eax], 7Bh indicates the faulting assembler instruction. The eax register currently contains a value of zero, so the program triggered an access violation by attempting to write into memory address zero.

So far, the debugger has provided me with useful information about what went wrong before I have even issued any commands. Example 3 shows a few debugger commands typically used after a break. The first command, the Display Stack Backtrace command k, shows the call stack for the current thread. The Toggle Source Line Support command .lines causes the debugger to switch between showing line numbers and not showing them in the output of future commands. Issuing the k command again gives a stack, but this time it references the relevant source file and line number for the code for which it has private symbols. The call stack shows that the AV occurred near line 4 in source file t1.c. The Display Local Variables command dv shows the value of variable p as 0x00000000. The last command in Example 3 is the Display Type command dt p, which shows the address in which the local variable p is stored, and that p is of type char *. If a variable references a C struct data type or a C++ class, then the dt command also attempts to display the names and values for each field.

0:000> k
ChildEBP RetAddr
0012ff80 004010de t1!main+0xe
0012ffc0 7c816fd7 t1!mainCRTStartup+0xb4
0012fff0 00000000 kernel32!BaseProcessStart+0x23
0:000> .lines
Line number information will be loaded
0:000> k
ChildEBP RetAddr
0012ff80 004010de t1!main+0xe [t1.c @ 4]
0012ffc0 7c816fd7 t1!mainCRTStartup+0xb4
0012fff0 00000000 kernel32!BaseProcessStart+0x23
0:000> dv
              p = 0x00000000 ""
0:000> dt p
Local var @ 0x12ff7c Type char*
(null)

Example 3: Results of several basic ntsd commands.

Arguably the most useful feature of the ntsd debugger is the !analyze debugger extension. In general, this is one of the first commands I issue after a crash because it often does a lot of the initial debugging work for me. Example 4 (available at www.ddj.com/code/) shows the results of using !analyze -v where the -v option triggers verbose output. This command extension analyzes, summarizes, and displays the cause of the break. In this case, the debugger has access to full symbolic debugging information; therefore, !analyze also shows a snippet of the original source code, with an arrow pointing at the line where the break occurred.

Application Verifier

Another freely available (and useful) Microsoft debugging tool is Application Verifier (www.microsoft.com/downloads). AppVerif effectively provides a verification layer between a running application and the operating system. This layer performs sanity checks on the data passed via Windows API calls, and also how the returned data is used. Common issues highlighted by using AppVerif include heap overruns, double freeing of heap allocations, memory leaks, using uninitialized critical sections, and double initialization of critical sections. Another useful feature is Low Resource Simulation, which simulates an application environment with low resources, such as when the system is under stress. Typically, this helps spot places in code where memory allocations are incorrectly assumed to always succeed.

Consider an example demonstrating the heap-checking feature of AppVerif. The C program in Example 5(a) allocates a 16-byte array and then fills the array using a for loop. However, the for loop contains a design error that results in a 1-byte buffer overrun. Despite this, the program compiles and runs successfully. However, when I configure AppVerif to verify this program and rerun it under the debugger, the debugger breaks in when it detects the AV caused by the buffer overrun. Once in the debugger, the stack command k gets the stack trace; see Example 5(b).

(a)
// t2.c
void main(void)
{
    int i = 0;
    char *p = 0;
    p = (char *)malloc(16);
    for (i=0; i<=16; i++)
    {
        // Buffer overrun when i=16
        p[i] = (char)('a' + i);
    }
}

(b)
0:000> k
ChildEBP RetAddr
0012ff80 004011d6 t2!main+0x45 [t2.c @ 11]
0012ffc0 7c816fd7 t2!mainCRTStartup+0xb4
0012fff0 00000000 kernel32!BaseProcessStart+0x2

Example 5: (a) C program designed to cause a 1-byte buffer overrun; (b) Stack trace when (a) causes a buffer overrun.

To see the values of the local variables at the moment of the AV, I use the dv command, as in Example 6. This shows that the value of the index variable i is 16, and that the value of variable p is stored in memory address 0x0209eff0 and points to an array containing at least 16 characters. In this case, the value of p is displayed as a quoted string; however, this does not necessarily indicate that p points to a NULL-terminated string. One way to view the memory referenced by p is to issue the Display Memory as double-word values command dd. Example 6 also shows the results of the dd 0x0209eff0 command, which implies that the memory after the 16 bytes in the buffer corresponds to invalid memory (indicated by "?" characters). It also shows that the address immediately after these 16 bytes is 0x0209f000, which is the start of a new 4k block. To confirm whether memory is invalid, I can use the !address command as in Example 7. The results of !address 0x0209eff0 shows that the buffer referenced by p is PAGE_READWRITE memory, which is valid. The results of the command !address 0x0209f000 shows that the memory immediately following the 16-byte buffer is PAGE_NOACCESS memory. Writing into this PAGE_NOACCESS memory resulted in the AV.

0:000> dv
              i = 16
              p = 0x0209eff0 "abcdefghijklmnop"
0:000> dd 0x0209eff0
0209eff0  64636261 68676665 6c6b6a69 706f6e6d
0209f000  ???????? ???????? ???????? ????????
0209f010  ???????? ???????? ???????? ????????
0209f020  ???????? ???????? ???????? ????????

Example 6: Results of the dv and dd commands after overrunning the buffer p.

0:000> !address 0x0209eff0
    02040000 : 0209e000 - 00001000
       Type     00020000 MEM_PRIVATE
       Protect  00000004 PAGE_READWRITE
       State    00001000 MEM_COMMIT
       Usage    RegionUsagePageHeap
       Handle   02041000
0:000> !address 0x0209f000
    02040000 : 0209f000 - 000a1000
       Type     00020000 MEM_PRIVATE
       Protect  00000001 PAGE_NOACCESS
       State    00001000 MEM_COMMIT
       Usage    RegionUsagePageHeap
       Handle   02041000

Example 7: Results of the !address command contrasting memory that is PAGE_READWRITE with memory that is PAGE_NOACCESS.

When the code called malloc to allocate 16 bytes from the heap, AppVerif intercepted the call and allocated an entire valid block followed by an entire invalid block, and (here's the clever part) returned a pointer to only the last 16 bytes of the valid block. The result is that not only did our code attempt to step beyond the allocated buffer, but it also attempted to step into invalid memory. This attempt to access invalid memory triggered an exception that caused the debugger to break in. That is, AppVerif did not by itself find the buffer overrun. It instead set up the buffer so that Windows would trigger an exception when the buffer overrun occurred.

Application Verifier Limitations

I designed the program in Example 5(a) to specifically use a 16-byte buffer to be sure that the buffer overrun was detected. That is, an apparent limitation of AppVerif revealed through experimentation is that when it intercepts an allocation, it returns a buffer with a start address that is 16-byte aligned. This means that if the code were to request only a 10-byte buffer, then AppVerif would not return a pointer to the last 10 bytes of the valid block. Instead, it would return a pointer to the last 16 bytes of the valid block. The remaining 6 bytes in the block are still valid memory and, unfortunately, the application can happily overrun the 10-byte buffer by up to 6 bytes despite being verified by AppVerif.

So, to automatically detect 1-byte overruns using AppVerif, I need to write wrappers around the malloc and free functions. Example 8 shows examples of suitable wrapper functions called AvrfAlloc and AvrfFree. AvrfAlloc ensures that it always asks malloc for a buffer with a size that is a multiple of 16 bytes. However, it then moves the pointer so that it is exactly the requested number of bytes before the end of the buffer. It also uses a single byte just before the start of the returned buffer to store the offset in bytes from the previous 16-byte alignment so that AvrfFree can later move the buffer pointer back before freeing it. AvrfAlloc effectively pushes the end of the returned buffer to be right before the next 16-byte boundary. Now when the application is being verified by AppVerif, the 1-byte overrun will always be into an inaccessible page of memory.

void *AvrfAlloc(size_t cb)
{
   char *pch = NULL;
   char delta = 0;

   cb++;
   delta = cb & 16;
   if (0 != delta)
   {
      delta = 16 - delta;
      cb += delta;
   }
   pch = malloc(cb);
   if (NULL != pch)
   {
      pch += delta;
      *pch++ = delta + 1;
   }
   return(pch);
}
void AvrfFree(void *pv)
{
   char offset = *((char*)pv - 1);
   free((char*)pv - offset);
}

Example 8: Wrapper functions for malloc and free that compensate for the 16-byte alignment imposed by AppVerif.

Another coding method that helps spot buffer overruns regardless of the presence of AppVerif is to write code that simulates what AppVerif is doing here, but without its 16-byte buffer alignment restriction. Listing One (available electronically; see "Resource Center," page 5) contains the source code for two functions called MyAlloc and MyFree that do just this, and can replace malloc and free. The MyAlloc function calls the Windows Memory Management function VirtualAlloc to allocate enough pages to hold the requested buffer, plus an additional page called the guard page (one page is typically 4096 bytes). This means that each call to MyAlloc allocates at least two pages. It then calls VirtualProtect to mark the guard page as inaccessible. Finally, it returns a pointer to an address located exactly at the originally requested number of bytes before the beginning of the guard page. The corresponding MyFree function simply rounds the pointer value down to the nearest page boundary before passing it to VirtualFree. As with AppVerif, this uses a lot of extra memory, and can noticeably slow down the application.

Debugging Customer Crashes

Debugging crashes is fairly straightforward when you have physical access to the machine. However, if the crash occurs at customer sites, then you often cannot get access to the customer's machine. In these cases, I ask the customer to configure the built-in Dr. Watson for Windows tool to automatically generate a crash dump file when the access violation occurs. The customer then sends me the crash dump file so that I can load it into ntsd using the command ntsd -z user.dmp.

The Dr. Watson tool works okay, but in some cases, a customer's machine may not generate the required Dr. Watson crash dumps. As a work around, it is worthwhile to prepare a short note for the customer describing how to obtain a crash dump using ntsd.exe (or perhaps just point them to this article!). In the case where an application is crashing and the customer knows how to reproduce the crash (in Microsoft parlance, "the customer has a repro"), they can obtain a crash dump by running ntsd.exe from the command line, passing the application command-line as an argument (ntsd -g MyApp.exe, for instance). The customer then performs their repro and, when the crash occurs, the debugger breaks in. A crash dump can be generated using a Create Dump File command similar to .dump /m C:\Files\user.dmp. Similarly, to get a crash dump for a hung application, the customer can attach the ntsd debugger to the running application by specifying the process identifier for the application (for example, ntsd.exe -p <pid>). The debugger breaks into the application, and the customer can obtain a crash dump as previously described.

Conclusion

I've given here an overview of the Microsoft NT Symbolic Debugger (ntsd) and Microsoft Application Verifier, along with some useful code samples. For further information about the truly enormous array of ntsd debugger commands, I recommend reading the debugger.chm help file that ships with the Microsoft Debugging Tools for Windows package.