Monitoring NT Debug Services

By Jose Flores, February 01, 2000


Many programmers are aware of Win32’s support for debugging via documented functions, such as DebugActiveProcess(), WaitForDebugEvent(), etc. But NT also offers another, lesser-known set of native debugging services. Through this second interface, NT provides a wealth of diagnostic information about the system’s doings and offers developers a method for providing similar diagnostic information at all times.

Unfortunately, you can’t normally view information from the native debugging services without a kernel debugger, and that usually requires you to either debug over a serial connection from a second machine (the traditional Microsoft solution) or purchase a third-party debugger like SoftICE. Because both of these options are overkill for developers who just want to view this diagnostic information in real time, this article presents a method for hooking NT’s debugger services to gather this information and display it in an easy-to-use program called DbgTrap. As an added bonus, this article demonstrates the techniques required to hook an arbitrary interrupt vector under NT.

Overview of the Native Debug Service

Although they probably don’t realize it, most developers have been indirect clients of a subset of NT’s native debug services through Win32’s familiar OutputDebugString(). Likewise, DbgPrint() is used regularly by device-driver developers. OutputDebugString() is really just a wrapper around the user-mode version of DbgPrint() found in ntdll.dll. The purpose of wrapping DbgPrint() in OutputDebugString() is twofold: first, in true Win32 spirit, it abstracts away NT-specific services so that the same binary can run on NT and 9x; second, it lets applications link against either a Unicode or an ASCII version of OutputDebugString(), while DbgPrint() provides only an ASCII interface. Internally, NT uses DbgPrint() extensively to provide tracing information on everything from internal errors translating ntstatus values to current reference counts on DLLs.

NT provides three other services in addition to the trace service. Of these, only DbgPrompt() is not an informational service. DbgPrompt() is used by Microsoft’s system debugger to request input from the user. The other two services provide notification on system-image loads and unloads. DbgLoadImageSymbols() notifies the kernel debugger that a system image (usually, but not necessarily, a driver) is being mapped into system space. Its counterpart, DbgUnLoadImageSymbols(), notifies the kernel debugger that an image is being unloaded from the system.

These two notification services are extremely important because of when they are called. System-image load notifications are issued within MmLoadSystemImage() (which maps the image into system space), but before the system calls DriverEntry() for the driver being loaded. This gives you a chance to hook and possibly modify driver entry points at runtime.

Native Debug Service Internals

You invoke NT’s native debug services much like NT’s native I/O services. All debug services eventually resolve down into the kernel, and parameters to individual service routines are passed via the current CPU’s registers. Like native I/O services, debugging service requests are issued to the kernel via a system trap. While the regular NT services use an INT 2E to transfer control to the service routine, the debugging service uses INT 2D. As you would expect, NT sets the protection attribute for the interrupt descriptor to allow both kernel- and user-mode code to issue the INT 2D.

With the debug service-trap primitive in place, all of the above mentioned services (including DbgPrint()) eventually resolve down into a call to an internal NT routine named DebugService(). Before issuing this call, each service pushes a unique service code onto the stack representing the function to be carried out by the kernel. DebugService() then transfers parameters passed on the stack into the EAX, ECX, and EDX registers. This is done because when a service request originates in user mode, a stack transition occurs when control is transferred to kernel mode. This saves the debug service handler the trouble of digging back to find the user-mode stack and the parameters that would lie there. Figure 1 illustrates a typical service request originating in user-mode code.

Hooking Debug Services

With the normal Win32 debugging API, you can attach a process and receive debugging notifications, such as when the process loads a DLL or calls OutputDebugString(). However, this approach only gives you information about the processes that you explicitly attach; you would have to attach every running process to get system-wide information. Also, this interface does not inform you when system images are loaded. I want access to all the notifications that are available.

Other traditional techniques include patching the import table of modules that link against ntdll.dll’s DbgPrint(), or forcing a patch into the routines in question that executes a JMP instruction transferring control to the monitoring routine. Patching the import table is unacceptable because you would miss all information that originates in ntdll.dll or ntoskrnl.exe, including the system-image load notifications. Both techniques are cumbersome in that they need to actively watch for new modules and/or processes to patch. While the JMP patch method would at least make it possible to monitor all available debugging services, it would essentially require writing a disassembler to keep the JMP patch from landing in the middle of an instruction.

Because of the severe limitations and complications of these techniques, hooking the debugging service interrupt vector seems like a very attractive solution. By hooking the vector directly, I have only one entry point to worry about, and I can monitor all services in both user and kernel mode with minimal effort. Additionally, I avoid a serious headache on Windows 2000 (W2K). W2K implements something called write protection. W2K marks all code sections for system images with read-only and executable protection attributes. If you try to patch code, it will blow up immediately. You could work around this by turning off all write protection system-wide by setting
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \Memory Management
            \EnforceWriteProtection
to 0. This defeats one facet of system security, but more importantly if you are a device-driver developer, not having this feature available can hinder your debugging effort for your own drivers. Hooking INT 2D has none of these problems, but does produce a platform-specific tool. With Compaq’s recent decision (at the time of this article’s writing) to dump support for the Alpha on NT, having x86-specific NT code for a development tool is practically a non-issue.

DbgTrap Implementation Overview and Initialization

Hooking the interrupt vector requires kernel-mode code. Therefore the majority of my DbgTrap application is implemented as a kernel-mode driver. The GUI is implemented primarily by the MFC app wizard and displays the events captured by the driver sequentially. The GUI is also responsible for controlling the driver’s behavior and managing options. The application and the driver communicate via five IOCTLs, accessible via Win32’s DeviceIoControl(). These IOCTLs control whether the driver should hook, unhook, capture user-mode events, capture kernel events, or reset the event buffer.
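As a rough illustration, the five control codes might be defined along the following lines. The names, function numbers, and device type below are hypothetical (the article does not list them); only the bit layout is taken from the DDK’s CTL_CODE macro, which is re-created here as a constexpr function so the sketch compiles without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as the DDK's CTL_CODE macro:
// device type in bits 16-31, access in 14-15, function in 2-13, method in 0-1.
constexpr std::uint32_t MakeIoctl(std::uint32_t deviceType, std::uint32_t function,
                                  std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr std::uint32_t kFileDeviceUnknown = 0x22;  // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kMethodBuffered    = 0;     // METHOD_BUFFERED
constexpr std::uint32_t kFileAnyAccess     = 0;     // FILE_ANY_ACCESS

// Hypothetical names and function numbers (0x800 and up is the range
// reserved for vendor-defined functions); not taken from the DbgTrap source.
enum DbgTrapIoctl : std::uint32_t {
    kIoctlHook          = MakeIoctl(kFileDeviceUnknown, 0x800, kMethodBuffered, kFileAnyAccess),
    kIoctlUnhook        = MakeIoctl(kFileDeviceUnknown, 0x801, kMethodBuffered, kFileAnyAccess),
    kIoctlCaptureUser   = MakeIoctl(kFileDeviceUnknown, 0x802, kMethodBuffered, kFileAnyAccess),
    kIoctlCaptureKernel = MakeIoctl(kFileDeviceUnknown, 0x803, kMethodBuffered, kFileAnyAccess),
    kIoctlResetBuffer   = MakeIoctl(kFileDeviceUnknown, 0x804, kMethodBuffered, kFileAnyAccess),
};

// Recovers the function number from a control code, as a driver's
// dispatch routine might do before switching on it.
constexpr std::uint32_t IoctlFunction(std::uint32_t code) {
    return (code >> 2) & 0xFFF;
}
```

The application would pass one of these values as the control-code argument of DeviceIoControl(), and the driver would switch on the same value in its dispatch routine.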

The kernel-mode driver handles the event buffer. The driver source resides mainly in two files. entrypoi.cpp (Listing 1) provides standard driver initialization and cleanup, as well as event-buffer allocation and management. debugser.cpp (Listing 2) contains the code that actually hooks and unhooks the debugging service vector, as well as code to insert events into the event buffer. Because debugging services are invoked relatively infrequently even on the busiest system, the event buffer is implemented as a circular queue.

In addition to all the standard driver initialization and object creation, the DbgTrap driver contains code to allocate the space for the event buffer and its associated header. The header serves to provide versioning information to the GUI, but most importantly contains the current index into the event buffer, as well as the total size of the event buffer. Both the buffer’s header and the buffer itself are mapped into user space when the GUI sends the hook IOCTL. Figure 2 describes the complete buffer layout.
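To make the header-plus-circular-queue idea concrete, here is a minimal user-mode sketch. The field names, the fixed-size record, and the EventBuffer class are all assumptions made for illustration; the real layout is the one in Figure 2 and Listing 1, which may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header: version for the GUI, the next slot to be written,
// and the capacity of the buffer (expressed here in slots).
struct EventBufferHeader {
    std::uint32_t version;
    std::uint32_t currentIndex;
    std::uint32_t totalSlots;
};

// Hypothetical fixed-size event record.
struct EventRecord {
    std::uint32_t sequence;
    char          text[60];
};

class EventBuffer {
public:
    explicit EventBuffer(std::uint32_t slots)
        : header_{1, 0, slots}, slots_(slots) {
        assert(slots > 0);
    }

    // Writes into the current slot, then advances the index, wrapping to
    // slot 0 at the end so that the oldest events are overwritten first.
    void Insert(const char* text) {
        EventRecord& rec = slots_[header_.currentIndex];
        rec.sequence = nextSequence_++;
        std::strncpy(rec.text, text, sizeof(rec.text) - 1);
        rec.text[sizeof(rec.text) - 1] = '\0';
        header_.currentIndex = (header_.currentIndex + 1) % header_.totalSlots;
    }

    const EventBufferHeader& Header() const { return header_; }
    const EventRecord& Slot(std::uint32_t i) const { return slots_[i]; }

private:
    EventBufferHeader        header_;
    std::vector<EventRecord> slots_;
    std::uint32_t            nextSequence_ = 0;
};
```

With three slots, a fourth insert lands back in slot 0 and leaves currentIndex at 1. A reader that shares the header can therefore tell where the writer is without any further communication.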

Application Access to the Event Buffer

Once the hook request is received, the driver completes initialization in two final stages. First it creates an MDL describing the pages that the event buffer is contained in and maps these into the application’s user-mode space by calling MmMapLockedPages(). At this point, both the application and the driver can access the buffer simultaneously. However, before the newly created virtual address is returned to the user, the driver needs to store away the application’s virtual address of the buffer on a per-process basis in order to be able to unmap the buffer later from the correct process. Failure to unmap memory from user space before the process terminates will result in the dreaded blue screen of death, with a stop code of PROCESS_HAS_LOCKED_PAGES. The driver uses a small trick to accomplish this and avoid blue-screening.

Because IOCTLs are actually targeted at device objects and not drivers, NT passes a pointer to a DEVICE_OBJECT structure to a driver’s IRP_MJ_DEVICE_CONTROL dispatch routine to differentiate between potentially multiple device objects per driver. Additionally, each DEVICE_OBJECT opened via a call to CreateFile() has a FILE_OBJECT associated with it. A pointer to this file object is also passed inside the IRP describing the request. The file object structure contains several unused fields, two of which are specifically available for device-driver developers to use as they please: FsContext and FsContext2. The DbgTrap driver exploits this and stores a pointer to the application’s view of the buffer here. As a result, when an IRP_MJ_CLOSE request arrives, all the driver has to do is traverse the device and file objects to determine whether the handle that is about to go away has a mapped view that must be unmapped before the close can be allowed to proceed.

Hooking an ISR under NT

The final stage of initialization is the hooking of INT 2D. The majority of the code to hook the INT 2D vector is found in idt.h (Listing 3). This file contains the definition of a structure and the implementation of several associated convenience routines. The structure has a dual purpose. The first is to act as a placeholder for a standard x86 Interrupt Descriptor Table entry, whose format is shown in Figure 3. The important fields are the high and low offset entries, which tell the processor where to transfer control when this interrupt vector fires. The second purpose of the structure is to manage hooking of a particular IDT entry. This is accomplished simply by saving the original fields of the IDT entry and replacing them with new values that cause control to be transferred to a custom routine when the interrupt occurs. It’s very important to ensure that no interrupts fire in the middle of modifying the IDT entry. To that end, the hooking code first raises the IRQL to the highest level and then disables interrupts.
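The save-and-replace step can be sketched in portable C++ as follows. This is not the code from idt.h: the type and function names are hypothetical, and the sketch operates on an ordinary in-memory structure rather than the live IDT. A real driver would locate the table with the SIDT instruction and make the change with interrupts disabled, as described above.

```cpp
#include <cassert>
#include <cstdint>

// Layout of a 32-bit x86 interrupt/trap gate descriptor (8 bytes).
#pragma pack(push, 1)
struct IdtEntry {
    std::uint16_t offsetLow;   // handler address, bits 0-15
    std::uint16_t selector;    // code segment selector
    std::uint8_t  reserved;    // unused, zero
    std::uint8_t  typeAttr;    // present bit, DPL, gate type
    std::uint16_t offsetHigh;  // handler address, bits 16-31
};
#pragma pack(pop)
static_assert(sizeof(IdtEntry) == 8, "a gate descriptor is 8 bytes");

// Hypothetical bookkeeping for one hooked vector.
struct IdtHook {
    IdtEntry*     entry     = nullptr;
    std::uint16_t savedLow  = 0;
    std::uint16_t savedHigh = 0;
    bool          hooked    = false;
};

std::uint32_t HandlerAddress(const IdtEntry& e) {
    return (static_cast<std::uint32_t>(e.offsetHigh) << 16) | e.offsetLow;
}

// Saves the original offset fields, then points the entry at newHandler.
// The selector and attribute byte are deliberately left alone.
void Hook(IdtHook& hook, IdtEntry& entry, std::uint32_t newHandler) {
    assert(!hook.hooked);
    hook.entry     = &entry;
    hook.savedLow  = entry.offsetLow;
    hook.savedHigh = entry.offsetHigh;
    entry.offsetLow  = static_cast<std::uint16_t>(newHandler & 0xFFFF);
    entry.offsetHigh = static_cast<std::uint16_t>(newHandler >> 16);
    hook.hooked = true;
}

// Restores the offsets saved by Hook().
void Unhook(IdtHook& hook) {
    assert(hook.hooked);
    hook.entry->offsetLow  = hook.savedLow;
    hook.entry->offsetHigh = hook.savedHigh;
    hook.hooked = false;
}
```

The saved offsets serve double duty: they are needed to unhook, and they give the replacement handler the address of the original handler to chain to.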

Because of NT’s support for up to 32 processors, the driver has to execute this hooking routine on every processor in the system. While the prototyped exported variable KeNumberProcessors reveals how many CPUs are in the system, there is no documented way to force immediate, synchronous execution of a block of code on a CPU other than the current processor. To let the hooking code execute in a reasonably timely fashion on all processors without synchronization nightmares, the driver uses an undocumented function to set the currently executing thread’s affinity mask. KeSetAffinityThread() forces an immediate context switch if the current processor does not fall in the newly set affinity mask and does not return to the caller until the thread is rescheduled on a processor conforming to the new affinity mask. KeSetAffinityThread() takes two parameters: the first is a pointer to the thread’s KTHREAD structure (a PKTHREAD), and the second is an affinity mask for that thread. For every processor in the system, I first set the current thread’s affinity to a single processor and then call the hooking code.
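The per-processor loop has roughly the following shape. In this sketch the two callbacks are stand-ins: setAffinity plays the part of the KeSetAffinityThread() call and work plays the part of the hooking routine. Restoring the original mask at the end is my own assumption about sensible cleanup; the article does not describe it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using AffinityMask = std::uint32_t;  // one bit per processor, up to 32

// Pins the calling thread to each processor in turn and runs `work` there.
// `setAffinity` is a hypothetical stand-in for the affinity call, which is
// assumed not to return until the thread is running on an allowed CPU.
void RunOnEveryProcessor(unsigned processorCount,
                         AffinityMask originalMask,
                         const std::function<void(AffinityMask)>& setAffinity,
                         const std::function<void(unsigned)>& work) {
    assert(processorCount >= 1 && processorCount <= 32);
    for (unsigned cpu = 0; cpu < processorCount; ++cpu) {
        setAffinity(AffinityMask{1} << cpu);  // exactly one bit set
        work(cpu);                            // now executing on `cpu`
    }
    setAffinity(originalMask);                // assumed cleanup step
}
```

Because each mask has a single bit set, the scheduler has no choice about where the thread runs, which is what makes the loop behave as a serialized visit to every CPU.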

Handling the Interrupt

Handling of the debug service interrupt is the meat of the DbgTrap project and is accomplished in DTDebuggerTrap(). There are five parts to handling the interrupt: preserving the current processor state precisely, determining if the code should attempt to handle the service request at all, setting up the expected standard NT environment, logging the service request, and chaining to the original INT 2D handler.

To avoid the potentially disastrous injection of random pushes and pops from compiler-generated function prolog and epilog code, the DTDebuggerTrap() handler is declared with __declspec(naked). This instructs the compiler not to set up a stack frame, not to save any registers, and not to generate a return instruction, which makes the handler responsible for saving every register it modifies. Ultimately, this lets the code chain to the original handler with exactly the same context as when control was originally transferred to DTDebuggerTrap() by the interrupt.

Preserving the current processor state and deciding whether to handle the interrupt requires manipulating the processor’s selectors and flags registers, which contain enough information to decide whether NT is executing in kernel or user mode. DbgTrap exploits this fact by using these values to determine whether the interrupt originated in kernel space, user space, or in the context of ntvdm (a DOS box). When kernel-mode code, such as this driver, executes and calls NT API functions, it expects a standard environment described by these selectors to be set up. Table 1 shows the values of each selector that NT normally expects. Once this standard NT environment is set up, the parameters passed to the debug service interrupt inside registers are pushed on the stack, and control is transferred to LoggerDispatch().
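The origin test can be expressed compactly. The sketch below uses the conventional x86 checks (the VM bit in EFLAGS for virtual-8086 mode, and the requested privilege level in the low two bits of CS); the article does not spell out the exact comparison DTDebuggerTrap() performs, so treat the function and its names as an assumption rather than a transcription of Listing 2.

```cpp
#include <cassert>
#include <cstdint>

enum class TrapOrigin { Kernel, User, V86 };

constexpr std::uint32_t kEflagsVm = 1u << 17;  // EFLAGS.VM: virtual-8086 mode
constexpr std::uint16_t kRplMask  = 0x3;       // privilege level bits of a selector

// Classifies the interrupted context from the CS and EFLAGS values that
// the processor pushed on the stack when the interrupt was taken.
constexpr TrapOrigin ClassifyOrigin(std::uint16_t cs, std::uint32_t eflags) {
    // In V86 mode CS holds a real-mode segment value, so its low bits say
    // nothing about privilege; the VM flag has to be checked first.
    return (eflags & kEflagsVm)   ? TrapOrigin::V86
         : (cs & kRplMask) == 0   ? TrapOrigin::Kernel
                                  : TrapOrigin::User;
}
```

On x86 NT the kernel code selector is commonly 0x08 and the user code selector 0x1B; those values are quoted from general knowledge of the platform, not from Table 1.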

LoggerDispatch() uses three helper functions to add new events to the buffer. LogEvent() logs general information, such as the time the event occurred, the process name, and process ID that the event occurred in. In contrast, LogDbgPrint() and LogLoadImageSymbols() log specific information depending on whether or not the event originated as a print or an image (un)load notification, respectively. Access to the buffer is serialized by calling KeAcquireSpinLockRaiseToSynch() (in the form of the macro LOCK_BUFFER). Because the interrupt may have been issued at a high IRQL, the standard spin-lock acquisition via KeAcquireSpinLock() is unacceptable here because KeAcquireSpinLock() implicitly sets the IRQL to DISPATCH_LEVEL, regardless of whether the call originated at a higher or lower IRQL.
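The division of labor between the three helpers might look something like this user-mode sketch. The service-code values, the LoggedEvent type, and the decision to leave prompt requests unlogged are assumptions made for illustration. Only the helper names come from the article, and their real signatures in Listing 2 will differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed service codes; the article says only that each service has a
// unique code.
enum class DebugServiceCode : std::uint32_t {
    Print = 1, Prompt = 2, LoadImageSymbols = 3, UnloadImageSymbols = 4
};

struct LoggedEvent {
    std::uint32_t processId;
    std::string   kind;
    std::string   detail;
};

// Stand-in for LogEvent(): fills in the fields common to every event.
LoggedEvent LogEvent(std::uint32_t pid, const char* kind) {
    return {pid, kind, ""};
}

// Stand-ins for the two specific loggers.
void LogDbgPrint(LoggedEvent& ev, const std::string& text) { ev.detail = text; }
void LogLoadImageSymbols(LoggedEvent& ev, const std::string& image) { ev.detail = image; }

// Returns true if the request was logged. Every request, logged or not,
// would still be chained to the original handler afterwards.
bool LoggerDispatch(std::vector<LoggedEvent>& log, DebugServiceCode code,
                    std::uint32_t pid, const std::string& arg) {
    switch (code) {
    case DebugServiceCode::Print: {
        LoggedEvent ev = LogEvent(pid, "print");
        LogDbgPrint(ev, arg);
        log.push_back(ev);
        return true;
    }
    case DebugServiceCode::LoadImageSymbols:
    case DebugServiceCode::UnloadImageSymbols: {
        const bool load = (code == DebugServiceCode::LoadImageSymbols);
        LoggedEvent ev = LogEvent(pid, load ? "load" : "unload");
        LogLoadImageSymbols(ev, arg);
        log.push_back(ev);
        return true;
    }
    default:
        return false;  // e.g. Prompt: passed through without logging (assumed)
    }
}
```

In the driver, the body of each case would run with the buffer lock held, which is where the choice of spin-lock primitive discussed above comes into play.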

The Role of the Application

The DbgTrap application (complete source code is in this month’s code archive) plays a mostly passive role, apart from installing, starting, and stopping the driver. The CDriver class provides wrapper methods to register and start the driver with the Service Control Manager and then communicate with the driver through standard Win32 calls. Once the driver is started, the application sits idle, waiting for a timer with a one-second period to expire, and then polls the event buffer to see if any new events have occurred since the last poll. Event data is displayed in a standard listview control. Toggling the trapping options or resetting the event buffer issues a DeviceIoControl() call describing the request to the DbgTrap driver.
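The polling step amounts to comparing the index remembered from the previous timer tick with the current index in the shared header. A minimal sketch, with a hypothetical function name and under the stated assumption that fewer events than the buffer holds arrive between two polls:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the slot indices written since the previous poll, oldest first.
// Assumes fewer than `totalSlots` events arrived in the interval; a reader
// that must detect overruns would need a sequence number or wrap counter
// in the header as well.
std::vector<std::uint32_t> NewSlotsSinceLastPoll(std::uint32_t lastIndex,
                                                 std::uint32_t currentIndex,
                                                 std::uint32_t totalSlots) {
    assert(totalSlots > 0);
    assert(lastIndex < totalSlots && currentIndex < totalSlots);
    std::vector<std::uint32_t> slots;
    for (std::uint32_t i = lastIndex; i != currentIndex; i = (i + 1) % totalSlots) {
        slots.push_back(i);  // walk forward, wrapping at the end of the buffer
    }
    return slots;
}
```

After displaying the returned slots, the application would remember currentIndex as the new lastIndex for the next tick.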

Getting More Diagnostic Information from NT

By default, NT produces a reasonable amount of real-time information available to DbgTrap, such as symbol loads, DLL collisions, and errors during error translation. But it’s possible to get NT to display more information. For the most part, this is controlled by a single global flag in the kernel with the uncreative name NtGlobalFlag. Users can control this flag by setting certain bits in the
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \GlobalFlag
registry key. The gflags utility provided in the NT Resource Kit allows convenient control of these values via the user interface shown in Figure 4.

Three bits are of particular interest. The “Show Loader Snaps” option forces NT to spit out extremely verbose information on process creation, resolving image dependencies, DLL reference counts, and much more. Setting these registry keys normally requires a reboot to take effect, and they are system granular, affecting all processes when set. However, by using a bit of trickery, you can achieve almost the same result on a per-process basis by loading symbols for ntdll.dll and, under a debugger, setting the ShowSnaps variable to a non-zero value for whatever process you’re interested in. The “Enable Loading of Kernel Debug Symbols” option sends a DbgLoadImageSymbols() notification the first time any user-mode image is loaded, in addition to the usual notification for kernel-mode drivers. Finally, the “buffer DbgPrint” option defers the output of DbgPrint() strings.
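For readers who prefer to edit the GlobalFlag value by hand, the three options correspond to individual bits in the DWORD. The values below are the ones commonly documented for gflags (abbreviated there as sls, ksl, and ddp); they are reproduced from memory, so treat them as assumptions and verify them against your own DDK headers or gflags before relying on them. The enumerator and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit values for the three options discussed above.
enum GlobalFlagBits : std::uint32_t {
    kShowLoaderSnaps    = 0x00000002,  // "Show Loader Snaps"
    kEnableKdSymbolLoad = 0x00040000,  // "Enable Loading of Kernel Debug Symbols"
    kBufferDbgPrint     = 0x08000000,  // "Buffer DbgPrint" output
};

// Returns `current` with the requested bits turned on.
constexpr std::uint32_t SetFlags(std::uint32_t current, std::uint32_t bits) {
    return current | bits;
}

// Reports whether every bit in `bits` is set in `value`.
constexpr bool IsSet(std::uint32_t value, std::uint32_t bits) {
    return (value & bits) == bits;
}
```

Under these assumptions, enabling loader snaps and kernel debug symbol loading together yields 0x00040002, which is the value you would write to the GlobalFlag registry entry.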

Conclusion

Up until now DbgTrap has run in a pass-through mode, allowing all events to pass unmodified to the original debug service handler. Further application of DbgTrap could have the driver eat up print requests based on a certain string pattern. More than one software vendor (you know who you are) has knowingly or unknowingly released modules to customers that clutter their system debuggers or debug service viewers. Thus, a relatively easy modification of the DbgTrap driver could make your debugging life that much easier.

When Jose Flores is not defending his air hockey championship title, he develops kernel tools for NuMega Technologies. You can contact him directly via www.joseflores.com.
