Dr. Dobb's | Monitoring NT Debug Services

Monitoring NT Debug Services

February 01, 2000
URL:http://www.drdobbs.com/monitoring-nt-debug-services/184416239


Many programmers are aware of Win32’s support for debugging via documented functions, such as DebugActiveProcess(), WaitForDebugEvent(), etc. But NT also offers another, lesser-known set of native debugging services. Through this second interface, NT provides a wealth of diagnostic information about the system’s doings and offers developers a method for providing similar diagnostic information at all times.

Unfortunately, you can’t normally view information from the native debugging services without a kernel debugger, and that usually requires you to either debug over a serial connection from a second machine (the traditional Microsoft solution) or purchase a third-party debugger like SoftICE. Because both of these options are overkill for developers who just want to view this diagnostic information in real time, this article presents a method for hooking NT’s debugger services to gather this information and display it in an easy-to-use program called DbgTrap. As an added bonus, this article demonstrates the techniques required to hook an arbitrary interrupt vector under NT.

Overview of the Native Debug Service

Although most probably don’t realize it, the majority of developers have been indirect clients of a subset of NT’s native debug services through Win32’s familiar OutputDebugString(). Likewise, device-driver developers use DbgPrint() regularly. OutputDebugString() is really just a wrapper around the user-mode version of DbgPrint() found in ntdll.dll. The wrapper serves two purposes: first, in true Win32 spirit, it abstracts away NT-specific services so that the same binary can run on NT and 9x; second, it lets applications link against either a Unicode or an ASCII version of OutputDebugString(), while DbgPrint() provides only an ASCII interface. Internally, NT uses DbgPrint() extensively to provide tracing information on everything from internal errors translating NTSTATUS values to current reference counts on DLLs.

NT provides three other services in addition to the trace service. Of these, only DbgPrompt() is not an informational service. DbgPrompt() is used by Microsoft’s system debugger to request input from the user. The other two services provide notification on system-image loads and unloads. DbgLoadImageSymbols() notifies the kernel debugger that a system image (usually, but not necessarily, a driver) is being mapped into system space. Its counterpart, DbgUnLoadImageSymbols(), notifies the kernel debugger that an image is being unloaded from the system.

These two notification services are extremely important because of when they are called. System-image load notifications are issued within MmLoadSystemImage() (which maps the image into system space), but before the system calls DriverEntry() for the driver being loaded. This gives you a chance to hook and possibly modify driver entry points at runtime.

Native Debug Service Internals

You invoke NT’s native debug services much like NT’s native I/O services. All debug services eventually resolve down into the kernel, and parameters to individual service routines are passed via the current CPU’s registers. Like native I/O services, debugging service requests are issued to the kernel via a system trap. While the regular NT services use an INT 2E to transfer control to the service routine, the debugging service uses INT 2D. As you would expect, NT sets the protection attribute for the interrupt descriptor to allow both kernel- and user-mode code to issue the INT 2D.

With the debug service-trap primitive in place, all of the above mentioned services (including DbgPrint()) eventually resolve down into a call to an internal NT routine named DebugService(). Before issuing this call, each service pushes a unique service code onto the stack representing the function to be carried out by the kernel. DebugService() then transfers parameters passed on the stack into the EAX, ECX, and EDX registers. This is done because when a service request originates in user mode, a stack transition occurs when control is transferred to kernel mode. This saves the debug service handler the trouble of digging back to find the user-mode stack and the parameters that would lie there. Figure 1 illustrates a typical service request originating in user-mode code.
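The register hand-off described above can be modeled in ordinary user-mode C++. The following is only a sketch: the service-code values, the choice of EAX for the service code, and every name in it (PackDebugService, DescribeService, TrapRegisters) are assumptions made for illustration, not taken from NT's sources or from DbgTrap.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Service codes. The numeric values are assumptions for this sketch.
enum DebugServiceCode : std::uint32_t {
    kServicePrint         = 1,  // DbgPrint()
    kServicePrompt        = 2,  // DbgPrompt()
    kServiceLoadSymbols   = 3,  // DbgLoadImageSymbols()
    kServiceUnloadSymbols = 4   // DbgUnLoadImageSymbols()
};

// Stand-in for the register state the INT 2D handler receives.
struct TrapRegisters {
    std::uint32_t eax;  // service code (assumed register assignment)
    std::uint32_t ecx;  // first argument
    std::uint32_t edx;  // second argument
};

// Models DebugService(): move the stack parameters into registers so the
// handler never has to look at the caller's (possibly user-mode) stack.
inline TrapRegisters PackDebugService(std::uint32_t code,
                                      std::uint32_t arg1,
                                      std::uint32_t arg2) {
    assert(code != 0);  // this sketch reserves 0 as "no service"
    return TrapRegisters{code, arg1, arg2};
}

// Models the first step of a handler: decide what was requested.
inline std::string DescribeService(const TrapRegisters& regs) {
    switch (regs.eax) {
        case kServicePrint:         return "print";
        case kServicePrompt:        return "prompt";
        case kServiceLoadSymbols:   return "load symbols";
        case kServiceUnloadSymbols: return "unload symbols";
        default:                    return "unknown";
    }
}
```

A handler built this way needs nothing but the three registers to identify and log a request, which is why the stack transition from user mode costs it nothing.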

Hooking Debug Services

With the normal Win32 debugging API, you can attach to a process and receive debugging notifications, such as when the process loads a DLL or calls OutputDebugString(). However, this approach only gives you information about the processes that you explicitly attach to; you would have to attach to every running process to get system-wide information. Also, this interface does not inform you when system images are loaded. I want access to all the notifications that are available.

Other traditional techniques include patching the import table for modules that link against ntdll.dll’s DbgPrint() or forcing a patch into the routines in question, which executes a JMP instruction transferring control to the monitoring routine. Patching the import table is unacceptable because you would miss all information that originates in ntdll.dll or ntoskrnl.exe, including the system image load notifications. Both techniques are cumbersome in that they need to actively watch for new modules and/or processes to patch. While the JMP patch method would at least make it possible to monitor all debugging services available, it would require essentially writing a disassembler in order to avoid having the JMP patch end up in the middle of an instruction.

Because of the severe limitations and complications of these techniques, hooking the debugging service interrupt vector seems like a very attractive solution. By hooking the vector directly, I have only one entry point to worry about, and I can monitor all services in both user and kernel mode with minimal effort. Additionally, I avoid a serious headache on Windows 2000 (W2K). W2K implements something called write protection. W2K marks all code sections for system images with read-only and executable protection attributes. If you try to patch code, it will blow up immediately. You could work around this by turning off all write protection system-wide by setting
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \Memory Management
            \EnforceWriteProtection
to 0. This defeats one facet of system security, but more importantly if you are a device-driver developer, not having this feature available can hinder your debugging effort for your own drivers. Hooking INT 2D has none of these problems, but does produce a platform-specific tool. With Compaq’s recent decision (at the time of this article’s writing) to dump support for the Alpha on NT, having x86-specific NT code for a development tool is practically a non-issue.

DbgTrap Implementation Overview and Initialization

Hooking the interrupt vector requires kernel-mode code. Therefore the majority of my DbgTrap application is implemented as a kernel-mode driver. The GUI is implemented primarily by the MFC app wizard and displays the events captured by the driver sequentially. The GUI is also responsible for controlling the driver’s behavior and managing options. The application and the driver communicate via five IOCTLs, accessible via Win32’s DeviceIoControl(). These IOCTLs control whether the driver should hook, unhook, capture user-mode events, capture kernel events, or reset the event buffer.
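As a rough illustration of how five such control codes could be laid out, the sketch below reproduces the bit layout of the DDK's CTL_CODE macro in portable C++. The function numbers (0x800 and up) and the constant names are hypothetical; the article does not give DbgTrap's actual values. METHOD_NEITHER is a guess based on Listing 1, which reads Type3InputBuffer and UserBuffer.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kFileDeviceUnknown = 0x22;  // FILE_DEVICE_UNKNOWN
const std::uint32_t kMethodNeither     = 3;     // METHOD_NEITHER
const std::uint32_t kFileAnyAccess     = 0;     // FILE_ANY_ACCESS

// Same field layout as CTL_CODE(DeviceType, Function, Method, Access).
inline std::uint32_t MakeIoctl(std::uint32_t deviceType, std::uint32_t function,
                               std::uint32_t method, std::uint32_t access) {
    assert(deviceType <= 0xFFFF && function <= 0xFFF &&
           method <= 3 && access <= 3);
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Hypothetical function numbers, one per request the GUI can make.
enum DbgTrapFunction : std::uint32_t {
    kFnHook      = 0x800,
    kFnUnhook    = 0x801,
    kFnReset     = 0x802,
    kFnLogUser   = 0x803,
    kFnLogKernel = 0x804
};

inline std::uint32_t DbgTrapIoctl(DbgTrapFunction fn) {
    return MakeIoctl(kFileDeviceUnknown, fn, kMethodNeither, kFileAnyAccess);
}

// Recovers the transfer method from a control code (the low two bits).
inline std::uint32_t IoctlMethod(std::uint32_t code) { return code & 3; }
```

With these assumptions the hook request encodes as 0x00222003, and both the driver and the GUI would share the same header so the codes cannot drift apart.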

The kernel-mode driver handles the event buffer. The driver source resides mainly in two files. entrypoi.cpp (Listing 1) provides standard driver initialization and cleanup, as well as event-buffer allocation and management. debugser.cpp (Listing 2) contains the code that actually hooks and unhooks the debugging service vector, as well as code to insert events into the event buffer. Because debugging services are invoked relatively infrequently even on the busiest system, the event buffer is implemented as a circular queue.

In addition to all the standard driver initialization and object creation, the DbgTrap driver contains code to allocate the space for the event buffer and its associated header. The header serves to provide versioning information to the GUI, but most importantly contains the current index into the event buffer, as well as the total size of the event buffer. Both the buffer’s header and the buffer itself are mapped into user space when the GUI sends the hook IOCTL. Figure 2 describes the complete buffer layout.
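A header followed by a fixed array of events, used as a circular queue, might look like the following user-mode model. This is a sketch under assumptions: the field names, the 64-byte text field, and the choice to let the index count every event ever logged (with the slot taken modulo the capacity) are illustrative and may not match the real DBGTRAP_HEADER and DBGTRAP_EVENT.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct Event {
    std::uint32_t processId;
    char          text[64];
};

struct Header {
    std::uint32_t size;        // size of this header, for versioning
    std::uint32_t version;
    std::uint32_t numEntries;  // capacity of the ring
    std::uint32_t index;       // total number of events logged so far
};

class EventRing {
public:
    explicit EventRing(std::uint32_t capacity) : events_(capacity) {
        assert(capacity > 0);
        header_.size       = sizeof(Header);
        header_.version    = 1;
        header_.numEntries = capacity;
        header_.index      = 0;
    }

    // Overwrites the oldest slot once the ring is full.
    void Log(std::uint32_t processId, const char* text) {
        Event& slot = events_[header_.index % header_.numEntries];
        slot.processId = processId;
        std::strncpy(slot.text, text, sizeof(slot.text) - 1);
        slot.text[sizeof(slot.text) - 1] = '\0';  // always terminate
        ++header_.index;
    }

    const Header& header() const { return header_; }
    const Event&  at(std::uint32_t slot) const {
        return events_[slot % header_.numEntries];
    }

private:
    Header             header_;
    std::vector<Event> events_;
};
```

Because the writer never blocks and never allocates, logging stays cheap enough to run inside an interrupt handler; the price is that a slow reader can lose the oldest events.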

Application Access to the Event Buffer

Once the hook request is received, the driver completes initialization in two final stages. First it creates an MDL describing the pages that the event buffer is contained in and maps these into the application’s user-mode space by calling MmMapLockedPages(). At this point, both the application and the driver can access the buffer simultaneously. However, before the newly created virtual address is returned to the user, the driver needs to store away the application’s virtual address of the buffer on a per-process basis in order to be able to unmap the buffer later from the correct process. Failure to unmap memory from user space before the process terminates will result in the dreaded blue screen of death, with a stop code of PROCESS_HAS_LOCKED_PAGES. The driver uses a small trick to accomplish this and avoid blue-screening.
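Mapping works in whole pages, so the pointer handed back to the application is the mapped page base combined with the buffer's offset inside its first page, and the unmap call needs the page base again. The arithmetic can be sketched in portable C++ as follows; the 4KB page size and 32-bit addresses match x86 NT, while the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kPageSize = 4096;  // x86 page size

// Start of the page containing the address.
inline std::uint32_t PageBase(std::uint32_t va) {
    return va & ~(kPageSize - 1);
}

// Offset of the address within its page.
inline std::uint32_t ByteOffset(std::uint32_t va) {
    return va & (kPageSize - 1);
}

// Number of pages touched by a buffer of `length` bytes starting at `va`.
inline std::uint32_t PagesSpanned(std::uint32_t va, std::uint32_t length) {
    return (ByteOffset(va) + length + kPageSize - 1) / kPageSize;
}

// The address the application should use: the page-aligned user-mode
// mapping plus the buffer's offset within its first page.
inline std::uint32_t UserPointer(std::uint32_t mappedBase,
                                 std::uint32_t kernelVa) {
    assert(ByteOffset(mappedBase) == 0);  // mappings are page aligned
    return mappedBase | ByteOffset(kernelVa);
}
```

For example, a kernel buffer at 0x81234A40 mapped at user address 0x00450000 is visible to the application at 0x00450A40, and masking that pointer with PageBase() recovers the address to unmap.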

Because IOCTLs are actually targeted at device objects and not drivers, NT passes a pointer to a DEVICE_OBJECT structure to a driver’s dispatch routine to differentiate between potentially multiple device objects per driver. Additionally, each DEVICE_OBJECT opened via a call to CreateFile() has a FILE_OBJECT associated with it. A pointer to this file object is also passed inside the IRP describing the request. The file object structure contains several unused fields. Two of these, FsContext and FsContext2, are specifically available for device-driver developers to use as they please. The DbgTrap driver exploits this and stores a pointer to the application’s view of the buffer in FsContext. As a result, when an IRP_MJ_CLOSE request arrives, all the driver has to do is follow the IRP to its file object to determine whether the handle that’s about to go away has a mapped view that must be unmapped before the close can be allowed to proceed.
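The per-handle bookkeeping can be modeled without any kernel headers. In this sketch FakeFileObject stands in for FILE_OBJECT, and OnCreate, OnHook, and OnClose are hypothetical names for the work done on create, on the hook IOCTL, and on close; the real driver would call MmUnmapLockedPages() where the comment indicates.

```cpp
#include <cassert>

// Stand-in for the two FILE_OBJECT fields a driver may use freely.
struct FakeFileObject {
    void* FsContext;
    void* FsContext2;
};

// On create: a fresh handle has no mapped view yet.
inline void OnCreate(FakeFileObject& file) {
    file.FsContext  = nullptr;
    file.FsContext2 = nullptr;
}

// On hook: remember the user-mode address of the view for this handle.
inline void OnHook(FakeFileObject& file, void* userView) {
    assert(file.FsContext == nullptr);  // at most one view per handle
    file.FsContext = userView;
}

// On close: returns the view that had to be unmapped, or nullptr if this
// handle never mapped one. The real driver would unmap the pages here.
inline void* OnClose(FakeFileObject& file) {
    void* view = file.FsContext;
    if (view != nullptr) {
        // MmUnmapLockedPages(...) would go here.
        file.FsContext = nullptr;
    }
    return view;
}
```

Keeping the pointer in the file object means the driver needs no lookup table keyed by process or handle: whatever handle is closing carries its own cleanup information.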

Hooking an ISR under NT

The final stage of initialization is the hooking of INT 2D. The majority of the code to hook the INT 2D vector is found in idt.h (Listing 3). This file contains the definition of a structure and the implementation of several associated convenience routines. This structure has a dual purpose. The first purpose is to act as a placeholder for a standard x86 Interrupt Descriptor Table entry, whose format is shown in Figure 3. The important fields are the high and low offset entries. These tell the processor where to transfer control when this interrupt vector fires. The second purpose of the structure is to manage hooking of a particular IDT entry. This is accomplished simply by saving the original fields of the IDT entry and replacing them with new values, causing control to be transferred to a custom routine when the interrupt fires. It’s very important to take measures to ensure that no interrupts fire in the middle of modifying the IDT entry. To ensure this, the hooking code first raises the IRQL to the highest level and then disables interrupts.
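The save-and-replace idea can be shown on an in-memory copy of a descriptor. The sketch below uses the architectural layout of a 32-bit x86 gate descriptor; the type and function names are invented for this example and are not the ones in idt.h. It operates on an ordinary variable, so the interrupt-disabling step that real IDT patching requires is only marked by a comment.

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)
struct IdtEntry {
    std::uint16_t offsetLow;   // handler address, bits 0..15
    std::uint16_t selector;    // code segment selector
    std::uint8_t  reserved;    // must be zero
    std::uint8_t  typeAttr;    // P (1 bit) | DPL (2 bits) | 0 | gate type (4 bits)
    std::uint16_t offsetHigh;  // handler address, bits 16..31
};
#pragma pack(pop)
static_assert(sizeof(IdtEntry) == 8, "an x86 gate descriptor is 8 bytes");

inline std::uint32_t GetHandler(const IdtEntry& e) {
    return (static_cast<std::uint32_t>(e.offsetHigh) << 16) | e.offsetLow;
}

inline void SetHandler(IdtEntry& e, std::uint32_t address) {
    e.offsetLow  = static_cast<std::uint16_t>(address & 0xFFFF);
    e.offsetHigh = static_cast<std::uint16_t>(address >> 16);
}

// Privilege level required to issue this interrupt with an INT instruction.
inline unsigned GateDpl(const IdtEntry& e) { return (e.typeAttr >> 5) & 3; }

struct IdtHook {
    IdtEntry saved;
    bool     installed = false;

    // Real code must run this with interrupts disabled on the target CPU.
    void Install(IdtEntry& live, std::uint32_t newHandler) {
        assert(!installed);
        saved = live;                  // keep the original for chaining/unhook
        SetHandler(live, newHandler);  // selector and attributes stay as-is
        installed = true;
    }

    void Remove(IdtEntry& live) {
        assert(installed);
        live = saved;
        installed = false;
    }
};
```

Only the two offset halves change during a hook, so the gate keeps its original selector and privilege level, and user-mode callers can still issue the interrupt.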

Because of NT’s support for up to 32 processors, the driver has to execute this hooking routine on every processor in the system. While the prototyped exported variable KeNumberProcessors reveals how many CPUs are in the system, there is no documented way to force immediate, synchronous execution of a block of code on a CPU other than the current processor. To let the hooking code execute in a reasonably timely fashion on all processors without synchronization nightmares, the driver uses an undocumented function to set the currently executing thread’s affinity mask. KeSetAffinityThread() forces an immediate context switch if the current processor does not fall in the newly set affinity mask and does not return to the caller until the thread is rescheduled on a processor conforming to the new affinity mask. KeSetAffinityThread() takes two parameters: the first is a PKTHREAD (a pointer to the thread’s KTHREAD structure), and the second is an affinity mask for that thread. For every processor in the system, I first set the current thread’s affinity to a single processor and then call the hooking code.
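The loop itself is simple once the affinity call is abstracted away. In this sketch the two callbacks stand in for KeSetAffinityThread() on the current thread and for the IDT-patching routine; HookAllProcessors and the final step that restores the original mask are assumptions of this example, not something the article describes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

typedef std::uint32_t AffinityMask;  // one bit per processor, up to 32

// Runs `hookCurrentCpu` once on each processor by pinning the calling
// thread to one CPU at a time.
inline void HookAllProcessors(
        unsigned processorCount,
        AffinityMask originalAffinity,
        const std::function<void(AffinityMask)>& setAffinity,
        const std::function<void()>& hookCurrentCpu) {
    assert(processorCount >= 1 && processorCount <= 32);
    for (unsigned cpu = 0; cpu < processorCount; ++cpu) {
        // Returns only once the thread is running on the requested CPU.
        setAffinity(static_cast<AffinityMask>(1) << cpu);
        hookCurrentCpu();
    }
    // Assumption: hand the thread back with the affinity it started with.
    setAffinity(originalAffinity);
}
```

Because the affinity call does not return until the thread is running on the requested processor, no cross-processor signaling or extra locking is needed between iterations.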

Handling the Interrupt

Handling of the debug service interrupt is the meat of the DbgTrap project and is accomplished in DTDebuggerTrap(). There are five parts to handling the interrupt: preserving the current processor state precisely, determining if the code should attempt to handle the service request at all, setting up the expected standard NT environment, logging the service request, and chaining to the original INT 2D handler.

To avoid the potentially disastrous injection of random pushes and pops from compiler-generated function prolog and epilog code, the DTDebuggerTrap() handler is declared with __declspec(naked). This instructs the compiler not to set up a stack frame, not to save any registers, and not to generate a return instruction. This puts the responsibility of saving all modified registers on the handler. Ultimately, this lets the code chain to the original handler with the exact same context as when control was originally transferred to DTDebuggerTrap() by the interrupt.

Preserving the current processor state and deciding whether to handle the interrupt requires manipulating the processor’s selectors and flags registers, which contain enough information to decide whether NT is executing in kernel or user mode. DbgTrap exploits this fact by using these values to determine whether the interrupt originated in kernel space, user space, or in the context of ntvdm (a DOS box). When kernel-mode code, such as this driver, executes and calls NT API functions, it expects a standard environment described by these selectors to be set up. Table 1 shows the values of each selector that NT normally expects. Once this standard NT environment is set up, the parameters passed to the debug service interrupt inside registers are pushed on the stack, and control is transferred to LoggerDispatch().
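The origin test comes down to two architectural facts: the low two bits of the saved CS selector hold the privilege level of the interrupted code, and bit 17 of EFLAGS (VM) is set when the interrupted code was running in virtual-8086 mode. A minimal model, with names invented for this example, might read:

```cpp
#include <cassert>
#include <cstdint>

enum class TrapOrigin { Kernel, User, Vdm };

const std::uint32_t kEflagsVm = 1u << 17;  // EFLAGS.VM, virtual-8086 mode
const std::uint16_t kRplMask  = 3;         // low two selector bits

// Classifies the interrupted code from the CS and EFLAGS values that the
// processor pushed when the interrupt fired.
inline TrapOrigin ClassifyTrap(std::uint16_t savedCs,
                               std::uint32_t savedEflags) {
    // Test VM first: in V86 mode CS holds a real-mode segment value, so
    // its low bits say nothing about privilege.
    if (savedEflags & kEflagsVm) return TrapOrigin::Vdm;
    return (savedCs & kRplMask) == 0 ? TrapOrigin::Kernel : TrapOrigin::User;
}

inline const char* OriginName(TrapOrigin origin) {
    switch (origin) {
        case TrapOrigin::Kernel: return "kernel";
        case TrapOrigin::User:   return "user";
        case TrapOrigin::Vdm:    return "ntvdm";
    }
    assert(false && "unreachable");
    return "?";
}
```

This sketch only covers the V86 case for ntvdm; whatever additional checks the real handler performs are not shown in the article text.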

LoggerDispatch() uses three helper functions to add new events to the buffer. LogEvent() logs general information, such as the time the event occurred, the process name, and process ID that the event occurred in. In contrast, LogDbgPrint() and LogLoadImageSymbols() log specific information depending on whether or not the event originated as a print or an image (un)load notification, respectively. Access to the buffer is serialized by calling KeAcquireSpinLockRaiseToSynch() (in the form of the macro LOCK_BUFFER). Because the interrupt may have been issued at a high IRQL, the standard spin-lock acquisition via KeAcquireSpinLock() is unacceptable here because KeAcquireSpinLock() implicitly sets the IRQL to DISPATCH_LEVEL, regardless of whether the call originated at a higher or lower IRQL.

The Role of the Application

The DbgTrap application (complete source code is in this month’s code archive) plays a passive role for the most part, with the exception of being responsible for initially installing, starting, and stopping the driver upon its invocation. The CDriver class provides wrapper methods to register and start the driver with the Service Control Manager and then communicate with the driver through standard Win32 calls. Once the driver is started, the application sits idle, waiting for a timer with a one-second period to expire, and then polls the event buffer to see if any new events have occurred since the last poll. Event data is displayed in a standard listview control. Toggling the trapping options or resetting the event buffer issues a DeviceIoControl() describing the request to the DbgTrap driver.
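Each timer tick has to work out which slots of the circular buffer hold events that have not been displayed yet. The helper below sketches one way to do that, assuming the shared index counts every event ever logged and that an event's slot is that count modulo the buffer capacity; SlotsToRead and its parameters are hypothetical names, and the real GUI may track its position differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the ring slots holding events that arrived after `lastSeen`,
// oldest first. If the writer lapped the reader, only the newest
// `capacity` events still exist, so older ones are skipped.
inline std::vector<std::uint32_t> SlotsToRead(std::uint32_t lastSeen,
                                              std::uint32_t current,
                                              std::uint32_t capacity) {
    assert(capacity > 0);
    assert(current >= lastSeen);
    std::uint32_t pending = current - lastSeen;
    if (pending > capacity) {
        lastSeen = current - capacity;  // the rest was overwritten
        pending  = capacity;
    }
    std::vector<std::uint32_t> slots;
    slots.reserve(pending);
    for (std::uint32_t i = 0; i < pending; ++i)
        slots.push_back((lastSeen + i) % capacity);
    return slots;
}
```

After displaying the returned slots, the poller would store the current index as its new lastSeen value; an empty result means nothing happened during the last second.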

Getting More Diagnostic Information from NT

By default, NT produces a reasonable amount of real-time information available to DbgTrap, such as symbol loads, DLL collisions, and errors during error translation. But it’s possible to get NT to display more information. For the most part, this is controlled by a single global flag in the kernel with the uncreative name NtGlobalFlag. Users can control this flag by setting certain bits in the
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \GlobalFlag
registry value. The gflags utility provided in the NT Resource Kit allows convenient control of these values via the user interface shown in Figure 4.

Three bits are of particular interest. The “Show Loader Snaps” option forces NT to spit out extremely verbose information on process creation, resolving image dependencies, DLL reference counts, and much more. Setting these registry keys normally requires a reboot to take effect, and they are system granular, affecting all processes when set. However, by using a bit of trickery, you can achieve almost the same result on a per-process basis by loading symbols for ntdll.dll and, under a debugger, setting the ShowSnaps variable to a non-zero value for whatever process you’re interested in. The “Enable Loading of Kernel Debug Symbols” option sends a DbgLoadImageSymbols() notification the first time any user-mode image is loaded, in addition to the usual notification for kernel-mode drivers. Finally, the “buffer DbgPrint” option defers the output of DbgPrint() strings.
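For readers who would rather compute the GlobalFlag value by hand than use gflags, the sketch below shows the bit manipulation. The three bit values are the ones I recall from the gflags documentation and should be treated as assumptions to verify against your own copy; the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit values; check them against the gflags documentation.
const std::uint32_t kShowLoaderSnaps  = 0x00000002;  // "Show Loader Snaps"
const std::uint32_t kKernelSymbolLoad = 0x00040000;  // "Enable Loading of Kernel Debug Symbols"
const std::uint32_t kBufferDbgPrint   = 0x08000000;  // "buffer DbgPrint"

inline bool IsSingleBit(std::uint32_t value) {
    return value != 0 && (value & (value - 1)) == 0;
}

// Returns `globalFlag` with one option switched on or off.
inline std::uint32_t WithFlag(std::uint32_t globalFlag,
                              std::uint32_t bit, bool enable) {
    assert(IsSingleBit(bit));
    return enable ? (globalFlag | bit) : (globalFlag & ~bit);
}

inline bool HasFlag(std::uint32_t globalFlag, std::uint32_t bit) {
    assert(IsSingleBit(bit));
    return (globalFlag & bit) != 0;
}
```

Under these assumptions, turning on loader snaps and kernel symbol loading from an empty flag word yields 0x00040002, which is the DWORD that would be written to the registry.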

Conclusion

Up until now DbgTrap has run in a pass-through mode, allowing all events to pass unmodified to the original debug service handler. Further application of DbgTrap could have the driver eat up print requests based on a certain string pattern. More than one software vendor (you know who you are) has knowingly or unknowingly released modules to customers that clutter their system debuggers or debug service viewers. Thus, a relatively easy modification of the DbgTrap driver could make your debugging life that much easier.
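A filter of that kind could start out as a plain substring test. The following is a user-mode sketch with a hypothetical ShouldSuppress() helper; a real driver would need a string routine that is safe at raised IRQL and would have to validate the string pointer before touching it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true when `message` contains any of the non-empty patterns,
// meaning the print request should be swallowed instead of being passed
// on to the original handler.
inline bool ShouldSuppress(const char* message,
                           const char* const* patterns,
                           std::size_t patternCount) {
    assert(message != nullptr);
    for (std::size_t i = 0; i < patternCount; ++i) {
        const char* pattern = patterns[i];
        if (pattern == nullptr || pattern[0] == '\0')
            continue;  // an empty pattern would match everything
        if (std::strstr(message, pattern) != nullptr)
            return true;
    }
    return false;
}
```

The handler would call this just before chaining: on a match it returns from the interrupt directly, and otherwise it jumps to the saved INT 2D handler as before.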

When Jose Flores is not defending his air hockey championship title, he develops kernel tools for NuMega Technologies. You can contact him directly via www.joseflores.com.

Figure 1: Typical debug service request

Figure 2: Event buffer layout (shared with user)

Figure 3: Format of x86 IDT entry

Figure 4: Global Flags allows control

Listing 1: entrypoi.cpp — Driver entry points and buffer management

#pragma warning( disable : 4201 4514 4060)
#include "DbgTrapProcs.h"

#define NT_DEVICE_NAME      L"\\Device\\DbgTrap"
#define DOS_DEVICE_NAME     L"\\DosDevices\\DbgTrap"

// DRIVER_STATE is used by the Cleanup function to determine the work
// to be done. As the driver completes its initialization we
// increment its state. If an error occurs at any time,
// Cleanup undoes everything we've done up to that point.
typedef enum DRIVER_STATE
{
    STATE_INITIAL,          // We own no resources
    STATE_HASBUFFER,        // We've allocated our event buffer
    STATE_HASDEVICES,       // We've created a device object
    STATE_HASLINK,          // We've created a symbolic link
    STATE_MONITORING        // We're hooked into the debugger service
};

DRIVER_STATE    state;  // See DRIVER_STATE discussion above
MDL             mdl;    // MDL describing the GUI's view of eb
extern "C" NTSTATUS DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath  );

// Moves our cheezy state machine into the next state
inline DRIVER_STATE NextState()
{
    //_asm    lock inc state
    InterlockedIncrement( (PLONG)&state );
    return state;
}

// Moves our cheezy state machine into the previous state
inline DRIVER_STATE PreviousState()
{
    //_asm    lock dec state
    InterlockedDecrement( (PLONG)&state );
    return state;
}

// Prepares our driver to be unloaded. Can be called from anywhere,
// not just the unload routine. Any fatal errors will call this before
// spiraling downward to doom.
void Cleanup(PDRIVER_OBJECT DriverObject)
{
    UNICODE_STRING uniWin32NameString;
   
    DTPrint( ("Cleanup called!!" ) );
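    // Each case deliberately falls through to the next so that
    // everything acquired up to the current state is released.
    switch ( state )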
    {
        case STATE_MONITORING:
            InstallDebugServiceHook( FALSE );
        case STATE_HASLINK:
            // Delete the link from our device name to a
            // name in the Win32 namespace. 
            RtlInitUnicodeString( &uniWin32NameString,
                                   DOS_DEVICE_NAME );
            IoDeleteSymbolicLink( &uniWin32NameString );
        case STATE_HASDEVICES:
            // Finally delete our device object
            IoDeleteDevice( DriverObject->DeviceObject );
        case STATE_HASBUFFER:
            // Free up our trapper resources
            BringDownTrapper();
        case STATE_INITIAL:
            state = STATE_INITIAL;
            break;
    }
}

// generically checks basic things about potential buffer
NTSTATUS ValidateBuffer(PVOID pBuffer,          // potential buffer
        DWORD length,           // length this buffer claims to be
        DWORD requiredLength    // length this buffer *must* be
        )
{
    if ( pBuffer == NULL )
        return STATUS_INVALID_PARAMETER;
    if ( length < requiredLength )
        return STATUS_INFO_LENGTH_MISMATCH;
    return STATUS_SUCCESS;
}

// We keep a pointer to the ring 3 buffer for each open file handle 
// to our device that has been given access to our event buffer.
// This function sets that buffer value into the file object that
// originated the current IRP
void SetGuiView(IN PIRP     pIrp, IN PVOID    value)
{
    PFILE_OBJECT pFile = pIrp->Tail.Overlay.OriginalFileObject;
    pFile->FsContext = (PVOID)value;
}

// We keep a pointer to the ring 3 buffer for each open file handle
// to our device that has been given access to our event buffer.
// This function plucks that buffer value out of the file object
// indirectly, via the current IRP.
PVOID GetGuiView(IN PIRP pIrp)
{
    PFILE_OBJECT pFile = pIrp->Tail.Overlay.OriginalFileObject;
    return pFile->FsContext;
}

PMDL        pMdl;   // MDL describing the shared event buffer

// We graciously share our event buffer with the GUI. It's up to
// the GUI to behave properly with the event buffer. Since our 
// device is exclusive & I am writing both the GUI & the driver,
// there is no reason to think I'd write the driver any better
// than I'd write the GUI...so there's no problem ;-)
NTSTATUS MapGuiView(IN OUT  PVOID   outputBuffer,
    IN      DWORD   outputBufferLength,  IN      PIRP    pIrp)
{
    NTSTATUS    status = STATUS_UNSUCCESSFUL;
    // Anything to do?
    if ( GetGuiView(pIrp) != NULL)
        return STATUS_SUCCESS;
    // Do we have good parameters?
    status = ValidateBuffer( outputBuffer,
                outputBufferLength, sizeof(DWORD) );
    if ( status != STATUS_SUCCESS )
        return status;
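    __try
    {        
        DWORD size = MmSizeOfMdl(pHeader,bufferSize);
        pMdl = (PMDL)ExAllocatePool( NonPagedPool, size );
        // Bail out rather than build an MDL in memory we don't own
        if ( pMdl == NULL )
            return STATUS_INSUFFICIENT_RESOURCES;
        pMdl = MmCreateMdl( pMdl, pHeader, bufferSize);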
        if ( (pMdl->MdlFlags & (MDL_PAGES_LOCKED               |
                                MDL_SOURCE_IS_NONPAGED_POOL    |
                                MDL_MAPPED_TO_SYSTEM_VA        |
                                MDL_PARTIAL) ) == 0)
            MmBuildMdlForNonPagedPool(pMdl);
        PVOID ptr = MmMapLockedPages(pMdl, UserMode);
        if ( ptr == NULL )
            *(PDBGTRAP_HEADER*)outputBuffer = NULL;
        else
            *(PDBGTRAP_HEADER*)outputBuffer =
             (PDBGTRAP_HEADER)(ULONG(ptr)|MmGetMdlByteOffset(pMdl));
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        return STATUS_ACCESS_VIOLATION;
    }
    return status;
}

// Does nothing if there's no buffer view associated with the current
// file object. Otherwise, we get rid of the user mode view of our
// event buffer.
void UnmapGuiView(PIRP   pIrp)
{
    // Pull the pointer out from the FileObject
    PVOID   pGuiView = GetGuiView( pIrp );
    if ( pGuiView == NULL )
        return;
    // Now unmap em
    MmUnmapLockedPages((PVOID)((ULONG)pGuiView&~(PAGE_SIZE-1)),pMdl);
    // do some stuff..mmunmaplockedpages()
    SetGuiView( pIrp, NULL );
}

// Handler for all IRP_MJ_CLOSE
NTSTATUS DTClose(IN PDEVICE_OBJECT pDO, IN PIRP pIrp)
{
    // Make sure that we unmap the ring 3 view of our buffer if 
    // necessary
    UnmapGuiView( pIrp );
    // Just complete the IRP normally
    pIrp->IoStatus.Status = STATUS_SUCCESS; 
    IoCompleteRequest(pIrp, IO_NO_INCREMENT ); 
    return STATUS_SUCCESS; 
}

// Handler for IRP_MJ_DEVICE_CONTROL & IRP_MJ_CREATE
NTSTATUS DTDispatch(IN PDEVICE_OBJECT pDO,IN PIRP pIrp)
{
    PVOID           inputBuffer         = NULL;
    PVOID           outputBuffer        = NULL;
    DWORD           inputBufferLength   = 0;
    DWORD           outputBufferLength  = 0;
    NTSTATUS        status              = STATUS_NOT_IMPLEMENTED;
    PIO_STACK_LOCATION pIrpStack=IoGetCurrentIrpStackLocation(pIrp);

    // Get the pointers to the input/output buffers and their lengths
    inputBuffer        = pIrpStack->Parameters.DeviceIoControl
                        .Type3InputBuffer;
    outputBuffer       = pIrp->UserBuffer;
    inputBufferLength  = pIrpStack->Parameters.DeviceIoControl
                        .InputBufferLength;
    outputBufferLength = pIrpStack->Parameters.DeviceIoControl
                        .OutputBufferLength;
    pIrp->IoStatus.Information = 0; 
    // Dispatch based on major fcn code. 
    switch (pIrpStack->MajorFunction) 
    { 
        case IRP_MJ_CREATE: 
            SetGuiView( pIrp, NULL );
            status = STATUS_SUCCESS; 
            break; 
        case IRP_MJ_DEVICE_CONTROL: 
            //  Dispatch on IOCTL 
            switch (pIrpStack->Parameters.DeviceIoControl
                    .IoControlCode)
            { 
                case IOCTL_HOOK:
                    if ( InstallDebugServiceHook( TRUE ) )
                    {
                        NextState();
                        status = MapGuiView(  outputBuffer,
                              outputBufferLength, pIrp);
                        if ( status == STATUS_SUCCESS )
                            SetGuiView( pIrp, (PVOID)*
                                (PDWORD)outputBuffer );
                    }                        
                    break;
                case IOCTL_UNHOOK:
                    UnmapGuiView( pIrp );
                    if ( InstallDebugServiceHook( FALSE ) )
                        PreviousState();
                    status = STATUS_SUCCESS;
                    break;
                case IOCTL_RESET:
                    ResetBuffer();
                    status = STATUS_SUCCESS;
                    break;
                case IOCTL_LOGUSER:
                    status = ValidateBuffer( inputBuffer, 
                             inputBufferLength, 1);
                    if ( status == STATUS_SUCCESS )
                        bLogUser = *(PBOOL)inputBuffer;
                    break;
                case IOCTL_LOGKERNEL:
                    status = ValidateBuffer( inputBuffer, 
                             inputBufferLength, 1);
                    if ( status == STATUS_SUCCESS )
                        bLogKernel = *(PBOOL)inputBuffer;
                    break;
                default:
                    // ?? Where did this come from ??
                    status = STATUS_NOT_IMPLEMENTED; 
                    break;
            }
            break; 
        default:
            // ?? Where did this come from ??
            status = STATUS_NOT_IMPLEMENTED; 
            pIrp->IoStatus.Information = 0; 
            break;
    } 

    // We're done with I/O request.  Record the status of the
    // I/O action. 
    pIrp->IoStatus.Status = status; 
    // Don't boost priority when returning since this took
    // little time. 
    IoCompleteRequest(pIrp, IO_NO_INCREMENT ); 
    return status; 
} 

VOID DTUnload(IN PDRIVER_OBJECT DriverObject)
{    
    DTPrint( ("Unloading!!" ) );
    Cleanup( DriverObject );
}

#pragma code_seg("INIT")
NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath)
{
    PDEVICE_OBJECT deviceObject = NULL;
    NTSTATUS status;
    UNICODE_STRING uniNtNameString;
    UNICODE_STRING uniWin32NameString;
    
    DTPrint( ("DriverEntry called - Debug Build!") );  
    // Initialize globals
    state = STATE_INITIAL;
    // Initialize the buffer
    status = InitTrapper();
    if ( !NT_SUCCESS(status) ) 
    {
        DTPrint( ("Couldn't initialize debug trap subsystem") );
        Cleanup( DriverObject );
        return status;
    }
    NextState();
    // Create the device object
    RtlInitUnicodeString( &uniNtNameString, NT_DEVICE_NAME );
    status = IoCreateDevice(DriverObject,
                 0,                     // We don't use a
                                        // device extension
                 &uniNtNameString, FILE_DEVICE_UNKNOWN,
                 0,                     // No standard device
                                        // characteristics
                 TRUE,                  // This IS an exclusive
                                        // device
                 &deviceObject
                 );
    if ( !NT_SUCCESS(status) )
    {
        DTPrint( ("Couldn't create the device") );
        Cleanup( DriverObject );
        return status;
    }
    NextState();
    // Set up our dispatch fncs
    DriverObject->MajorFunction[IRP_MJ_CREATE]  = DTDispatch;
    DriverObject->MajorFunction[IRP_MJ_CLOSE]   = DTClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DTDispatch;
    DriverObject->DriverUnload = DTUnload;
    // Create a symbolic link for the GUI
    RtlInitUnicodeString( &uniWin32NameString, DOS_DEVICE_NAME );
    status = IoCreateSymbolicLink( 
                            &uniWin32NameString, &
                            uniNtNameString 
                            );
    if (!NT_SUCCESS(status))
    {
        DTPrint( ("Couldn't create the symbolic link") );
        Cleanup( DriverObject );
        return status;
    }
    NextState();        
    return status;
}
#pragma code_seg()
//End of File

Listing 2: debugser.cpp — Interrupt-handling code

#include "DbgTrapProcs.h"
#include "idt.h"

typedef struct DBG_PRINT_PARAMS
{
    DWORD   s;
    char*   pString;    // string to DbgPrint
    DWORD   u;
    PVOID   caller;     // Return address of who invoked DbgPrint()
} DBG_PRINT_PARAMS, *PDBG_PRINT_PARAMS;


DWORD   debugServiceVector = 0x2D;    // Interrupt vector used by NT
BOOL    bWeHooked = FALSE; // Are we hooked yet..with some soul
NT_IDT  oldIDTE; // Original IDT entry for debug service we replace
PVOID   oldISR;   // Original ISR for Debugger service
BOOL    bLogUser    = TRUE;
BOOL    bLogKernel  = TRUE;
PDBGTRAP_HEADER pHeader;    // Header for giving info to the GUI
PDBGTRAP_EVENT  peb;        // Our event buffer
DWORD   maxBufferSize = 256*1024;// Max Size of our event buffer (eb)
DWORD   bufferSize;         // Actual # of bytes allocated for eb
DWORD   numEntries;         // Max entries in the event buffer
KSPIN_LOCK      bl;         // Buffer Lock
PDWORD   pindex;            // Pointer to current index in buffer
DWORD    imgNameOfs;        // Offset in KPEB for image name

// NOTE: these must be used in the same scope
#define LOCK_BUFFER KIRQL __oldirql__ = \
    KeAcquireSpinLockRaiseToSynch( &bl );
#define UNLOCK_BUFFER   KeReleaseSpinLock( & bl, __oldirql__ );

void ResetBuffer()
{
    LOCK_BUFFER
    memset( peb, 0, bufferSize );
    *pindex = 0;
    UNLOCK_BUFFER 
}

NTSTATUS InitTrapper()
{
    // Allocate a buffer that holds exactly N events, plus the header
    numEntries = (maxBufferSize-sizeof(DBGTRAP_HEADER))
                 /sizeof(DBGTRAP_EVENT);
    bufferSize = numEntries*sizeof(DBGTRAP_EVENT)
                 +sizeof(DBGTRAP_HEADER);
    pHeader = (PDBGTRAP_HEADER)
              ExAllocatePool(NonPagedPool, bufferSize);
    if ( pHeader == NULL )
        return STATUS_INSUFFICIENT_RESOURCES;
    // Initialize the header
    pHeader->size       = sizeof(DBGTRAP_HEADER);
    pHeader->version    = DT_VERSION;
    pHeader->sig        = 'JOSE';  // Shameless, I know...
    pHeader->numEntries = numEntries;
    pHeader->index      = 0;
    // Initialize the buffer & counter
    peb     = &pHeader->eb[0];
    pindex  = &pHeader->index;
    // Initialize the buffer lock
    KeInitializeSpinLock( &bl );
    // Set vars correctly
    DWORD mv;
    PsGetVersion( &mv, NULL, NULL, NULL );
    if ( mv == 5)
         imgNameOfs = 0x1fc;
    else
       imgNameOfs = 0x1DC;

    ResetBuffer();
    return STATUS_SUCCESS;
}

// Free up our trapper resources
void BringDownTrapper()
{
    // Since there is no safe way to synchronize freeing the buffer
    // with buffer logging, our debug service interrupt hook *must*
    // have been removed prior to calling here.
    ExFreePool( pHeader );
}

// Logs parameters specific to a DbgPrint() call
void LogDbgPrint(PDBGTRAP_EVENT pEvent, PDBG_PRINT_PARAMS param)
{
   pEvent->callingAddr  = param->caller;

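   // Truncate to fit, and terminate right after the copied text so a
   // reused slot never shows the tail of an older, longer string.
   DWORD l = strlen( param->pString );
   l = (l>=MAX_STRING_LENGTH) ? MAX_STRING_LENGTH-1 : l;
   
   memcpy( pEvent->string, param->pString, l );
   pEvent->string[ l ] = 0;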
}

// Logs parameters specific to Dbg(Un)LoadImageSymbols()
void LogLoadImageSymbols(PDBGTRAP_EVENT    pEvent,
      PANSI_STRING      name,
      PVOID*            baseAddr  )
{
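   char n[256];
   char buff[sizeof(n)+32];   // room for the fixed text and the address

   pEvent->callingAddr = baseAddr[5];
   // Clamp the copy so an over-long image name cannot overflow n[]
   DWORD nameLen = name->Length;
   if ( nameLen > sizeof(n)-1 )
       nameLen = sizeof(n)-1;
   memcpy( n, name->Buffer, nameLen );
   n[nameLen] = '\0';
   DWORD l = sprintf( buff, "Driver %s @ 0x%x", n, *baseAddr );
   l = (l>=MAX_STRING_LENGTH) ? MAX_STRING_LENGTH-1 : l;
   memcpy( pEvent->string, buff, l );
   // Terminate right after the copied text (see LogDbgPrint)
   pEvent->string[ l ] = 0;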
}

// Logs general parameters common to all debug services
void LogEvent(PDBGTRAP_EVENT pEvent, NtDebuggerService service)
{
    PEPROCESS pep = 0;
    pEvent->count = *pindex;
    KeQuerySystemTime( &pEvent->time );
    pEvent->pid = PsGetCurrentProcessId();
    pep = PsGetCurrentProcess();
    strcpy( pEvent->procName, &((CHAR*)pep)[imgNameOfs] );
    pEvent->action = service;
}

void LoggerDispatch(NtDebuggerService service, PVOID paramsCX,
      PVOID paramsDX)
{
   LOCK_BUFFER
   // see if we're going to overrun  the buffer
   if ( *pindex >= numEntries )
      *pindex = 0;
   PDBGTRAP_EVENT pCurrentEvent = &peb[*pindex];
   switch (service)
   {
      case DS_PRINT:
         LogDbgPrint( pCurrentEvent, (PDBG_PRINT_PARAMS)paramsCX );
           break;
      case DS_PROMPT:
           break;
      case DS_LOADSYMBOLS:
      case DS_UNLOADSYMBOLS:
         LogLoadImageSymbols( pCurrentEvent, 
                (PANSI_STRING)paramsCX, (PVOID*)paramsDX);
           break;
   }
   LogEvent( pCurrentEvent, service );
   ++*pindex;
   UNLOCK_BUFFER
}

void _declspec(naked) DTDebuggerTrap()
{
    static NtDebuggerService   service;    // Which debugger service
                                           // is requested
    static DWORD               paramsCX;   // Parameters passed to
                                           // the service
    static DWORD               paramsDX;
    _asm
    {  
        // Check & see if there's anything to do.
        // On entry: [esp]=EIP, [esp+4]=CS, [esp+8]=EFLAGS
        test  DWORD PTR [esp+8], X86_VM // VDM doesn't have Dbg services
        jnz   DT_CHAIN
        test  BYTE PTR [esp+4], X86_USER
        jz    DT_CAMEFROMKERNEL
        // Only the low word of the saved CS slot is the selector
        cmp   WORD PTR [esp+4], NT_UCS  // Don't log if user selector != 1b
        jnz   DT_CHAIN
        cmp   DWORD PTR ss:[bLogUser], 0 // Don't log if GUI says not to
        jz    DT_CHAIN
        jmp   DT_LOG           // Passed all tests, log user event
DT_CAMEFROMKERNEL:
        cmp   DWORD PTR ss:[bLogKernel], 0
        jz    DT_CHAIN
        // First rule of tinkering: 
        //      "Save all the pieces"
        // So let's Set up a semi-standard environment
DT_LOG:
        pushad
        push    ds
        push    es
        push    gs
        push    fs
        mov     bx, NT_DS
        mov     ds, bx
        mov     es, bx
        mov     gs, bx
        mov     bx, NT_FS
        mov     fs, bx
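        // Arguments go right to left: (service, paramsCX, paramsDX).
        // Nothing pops them after the call, so this appears to rely on
        // LoggerDispatch being __stdcall (callee cleans the stack).
        push   edx
        push   ecx
        push   eax
        call   LoggerDispatch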
        // Restore everything
        pop     fs
        pop     gs
        pop     es
        pop     ds
        popad
        // Chain to original ISR
DT_CHAIN:
        jmp     cs:[oldISR]
    }
}

// This guy actually installs the hook on the debug service vector.
// Returns TRUE if we actually did anything, FALSE if not. 
// NOTE: This routine (like others in this driver) is NOT thread 
// safe. Our whole subsystem currently expects a single
// controller and a single reader (but can have multiple writers (CPUs)). 
// This routine should therefore only be called inside a dispatch 
// routine from the GUI, or in the context of the system's worker 
// thread via DriverEntry, or Unload.
BOOL InstallDebugServiceHook(BOOL bHook)
{
    PNT_IDT pidtBase = NULL;

    // Anything to do?
    if ( bWeHooked == bHook )
        return FALSE;        
    // Loop  over all processors & hook the debug service vector
    for ( char cpu=0; cpu<*KeNumberProcessors; ++cpu )
    {
       PKTHREAD pThread = KeGetCurrentThread();
       KeSetAffinityThread( pThread, 1<<cpu );
       // Get this processor's IDT base address
       pidtBase = GetIDTBase();
       if ( bHook )
           pidtBase[ debugServiceVector ].Hook( DTDebuggerTrap, &oldISR );
       else
           pidtBase[ debugServiceVector ].Hook( oldISR );   
    }

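    // Set the thread's affinity back to run on all processors.
    // Shifting a 32-bit 1 by 32 is undefined, so treat 32 CPUs explicitly.
    PKTHREAD pThread = KeGetCurrentThread();
    KAFFINITY allCpus = ( *KeNumberProcessors >= 32 )
                        ? 0xFFFFFFFF
                        : ( (1<<*KeNumberProcessors)-1 );
    KeSetAffinityThread( pThread, allCpus );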
    // No turning back now...all or nothing
    bWeHooked = !bWeHooked;  

    return TRUE;
}
//End of File

Listing 3: idt.h — Code to hook interrupt

#ifndef _IDT_H_
#define _IDT_H_

#define    DISABLE_INTS    KIRQL __Dioldirql__; \
                        KeRaiseIrql( HIGH_LEVEL, &__Dioldirql__ ); \
                        _asm    { pushfd    }     \
                        _asm    { cli    }     
                        
#define    ENABLE_INTS        {_asm    popfd    }     \
                        KeLowerIrql(__Dioldirql__);

// NT uses the following values for its 32-bit flat selectors
#define NT_CS   (0x8)
#define NT_UCS  (0x1B)
#define NT_DS   (0x23)
#define NT_FS   (0x30)

// x86 specific constants used to TEST where an interrupt originated
#define  X86_VM   (0x20000) // EFLAGS.VM: V8086 mode
#define  X86_USER (0x1)     // Low RPL bit of the saved CS;
                            // set => user mode

#pragma pack( push, PREIDT )
typedef struct NT_IDT
{
    WORD    wLoOfs;        // Low Word of the ISR's offset
    WORD    wSelector;     // ISR's selector....should be 8 under NT
    WORD    wFlags;        // Flags...should almost always be 8E00 
                           // for 32-bit code on NT.
    WORD    wHiOfs;        // Hi Word of ISR's offset

    // This hooks an IDT Entry to point to a new offset.
    // It leaves the current protection settings as they
    // are. Additionally, it returns the flat offset to the
    // old ISR handler.  To set completely new protection 
    // flags use Set() below.

    void Hook( PVOID newOfs, PVOID pOldOfs=NULL )
    {        
        DISABLE_INTS

        if ( pOldOfs != NULL )
            *(PDWORD)pOldOfs =    (wHiOfs<<16) + wLoOfs ;

        wLoOfs    =    WORD(newOfs) ;
        wHiOfs    =    WORD(((DWORD)newOfs)>>16) ;

        ENABLE_INTS
    }

    // This hooks an IDT Entry to a completely new interrupt gate.
    // Unlike NT_IDT::Hook(), this explicitly sets the gate's
    // protection, both at the gate level & implicitly with the
    // code selector passed.
    void Set( PVOID ofs, WORD protection=0x8E00, WORD sel=NT_CS) 
    {
        DISABLE_INTS

        wLoOfs        = WORD(ofs) ;
        wSelector     = sel;
        wFlags        = protection;
        wHiOfs        = WORD(((DWORD)ofs)>>16)  ;

        ENABLE_INTS
    }

    NT_IDT operator=(NT_IDT *r_idt)
    {
        DISABLE_INTS

        wLoOfs        = r_idt->wLoOfs;
        wSelector    = r_idt->wSelector;
        wFlags        = r_idt->wFlags; 
        wHiOfs        = r_idt->wHiOfs;

        ENABLE_INTS

        return *this;
    }

    NT_IDT operator=(NT_IDT r_idt)
    {
        DISABLE_INTS

        wLoOfs        = r_idt.wLoOfs;
        wSelector    = r_idt.wSelector;
        wFlags        = r_idt.wFlags; 
        wHiOfs        = r_idt.wHiOfs;

        ENABLE_INTS

        return *this;
    }

    BOOL IsValid() const
    {
        return ( (wHiOfs) && (wSelector==NT_CS) );
    }

} NT_IDT, *PNT_IDT;
#pragma pack( pop, PREIDT )


#pragma warning( disable : 4035 )    // Turn off no return
                                     // value warning

// Gets the IDT base for the current processor
inline    PNT_IDT    __fastcall GetIDTBase()
{
    __asm {  mov eax, _PCR KPCR.IDT  }    
}
      
#pragma warning( default : 4035 )

#endif
//End of File

Table 1: Selector values expected by NT

Monitoring NT Debug Services

Jose Flores

Many programmers are aware of Win32’s support for debugging via documented functions, such as DebugActiveProcess(), WaitForDebugEvent(), etc. But NT also offers another, lesser-known set of native debugging services. Through this second interface, NT provides a wealth of diagnostic information about the system’s doings and offers developers a method for providing similar diagnostic information at all times.

Unfortunately, you can’t normally view information from the native debugging services without a kernel debugger, and that usually requires you to either debug over a serial connection from a second machine (the traditional Microsoft solution) or purchase a third-party debugger like SoftICE. Because both of these options are overkill for developers who just want to view this diagnostic information in real-time, this article presents a method for hooking NT’s debugger services in order to gather this information while displaying it in an easy-to-use program called DbgTrap. As an added bonus, this article demonstrates the necessary techniques required to allow you to hook an arbitrary interrupt vector under NT.

Overview of the Native Debug Service

Although most probably don’t realize it, the majority of developers have been indirect clients of a subset of NT’s native debug services through Win32’s familiar OutputDebugString(). Likewise, DbgPrint() is used regularly by device-driver developers. OutputDebugString() is really just a wrapper around the user-mode version of DbgPrint() found in ntdll.dll. The purpose of having a wrapper around DbgPrint() in the form of OutputDebugString() is twofold: first, in true Win32 spirit, it allows an abstraction from NT-specific services that allow the same binary to run on NT and 9x, and secondly, it lets applications link against either a Unicode or ASCII version of OutputDebugString(), while DbgPrint() provides only an ASCII interface. Internally, NT uses DbgPrint() extensively to provide tracing information on everything from internal errors translating ntstatus values to current reference counts on DLLs.

NT provides three other services in addition to the trace service. Of these, only DbgPrompt() is not an informational service. DbgPrompt() is used by Microsoft’s system debugger to request input from the user. The other two services provide notification on system-image loads and unloads. DbgLoadImageSymbols() notifies the kernel debugger that a system image (usually, but not necessarily, a driver) is being mapped into system space. Its counterpart, DbgUnLoadImageSymbols(), notifies the kernel debugger that an image is being unloaded from the system.

These two notification services are extremely important because of when they are called. System-image load notifications are issued within MmLoadSystemImage() (which maps the image into system space), but before the system calls DriverEntry() for the driver being loaded. This gives you a chance to hook and possibly modify driver entry points at runtime.

Native Debug Service Internals

You invoke NT’s native debug services much like NT’s native I/O services. All debug services eventually resolve down into the kernel, and parameters to individual service routines are passed via the current CPU’s registers. Like native I/O services, debugging service requests are issued to the kernel via a system trap. While the regular NT services use an INT 2E to transfer control to the service routine, the debugging service uses INT 2D. As you would expect, NT sets the protection attribute for the interrupt descriptor to allow both kernel- and user-mode code to issue the INT 2D.

With the debug service-trap primitive in place, all of the above mentioned services (including DbgPrint()) eventually resolve down into a call to an internal NT routine named DebugService(). Before issuing this call, each service pushes a unique service code onto the stack representing the function to be carried out by the kernel. DebugService() then transfers parameters passed on the stack into the EAX, ECX, and EDX registers. This is done because when a service request originates in user mode, a stack transition occurs when control is transferred to kernel mode. This saves the debug service handler the trouble of digging back to find the user-mode stack and the parameters that would lie there. Figure 1 illustrates a typical service request originating in user-mode code.
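The register convention described above can be modeled in a few lines of user-mode C++. This is only a sketch: the numeric service codes and the names `Regs`, `packRequest`, and `describeService` are assumptions made for illustration, since the article does not list the actual values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical service codes; the real values are not given in the article.
enum DebugServiceCode : uint32_t {
    DS_PRINT         = 1,
    DS_PROMPT        = 2,
    DS_LOADSYMBOLS   = 3,
    DS_UNLOADSYMBOLS = 4
};

// Simplified model of the three registers that carry a request.
struct Regs {
    uint32_t eax;  // service code
    uint32_t ecx;  // first parameter
    uint32_t edx;  // second parameter
};

// Models what DebugService() does before issuing the trap: move the
// values that were passed on the stack into registers.
inline Regs packRequest(DebugServiceCode code, uint32_t p1, uint32_t p2) {
    assert(code >= DS_PRINT && code <= DS_UNLOADSYMBOLS);
    return Regs{static_cast<uint32_t>(code), p1, p2};
}

// Models the handler side: decide what was requested from EAX alone.
inline std::string describeService(const Regs& r) {
    switch (r.eax) {
        case DS_PRINT:         return "print";
        case DS_PROMPT:        return "prompt";
        case DS_LOADSYMBOLS:   return "load symbols";
        case DS_UNLOADSYMBOLS: return "unload symbols";
        default:               return "unknown";
    }
}
```

Because everything the handler needs arrives in registers, it never has to locate the caller's stack, which is the point the paragraph above makes.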

Hooking Debug Services

With the normal Win32 debugging API, you can attach to a process and receive debugging notifications, such as when the process loads a DLL or calls OutputDebugString(). However, this approach only gives you information about the processes that you explicitly attach to; you would have to attach to every running process to get system-wide information. Also, this interface does not inform you when system images are loaded. I want access to all the notifications that are available.

Other traditional techniques include patching the import table for modules that link against ntdll.dll’s DbgPrint() or forcing a patch into the routines in question, which executes a JMP instruction transferring control to the monitoring routine. Patching the import table is unacceptable because you would miss all information that originates in ntdll.dll or ntoskrnl.exe, including the system image load notifications. Both techniques are cumbersome in that they need to actively watch for new modules and/or processes to patch. While the JMP patch method would at least make it possible to monitor all debugging services available, it would require essentially writing a disassembler in order to avoid having the JMP patch end up in the middle of an instruction.

Because of the severe limitations and complications of these techniques, hooking the debugging service interrupt vector seems like a very attractive solution. By hooking the vector directly, I have only one entry point to worry about, and I can monitor all services in both user and kernel mode with minimal effort. Additionally, I avoid a serious headache on Windows 2000 (W2K). W2K implements something called write protection. W2K marks all code sections for system images with read-only and executable protection attributes. If you try to patch code, it will blow up immediately. You could work around this by turning off all write protection system-wide by setting
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \Memory Management
            \EnforceWriteProtection
to 0. This defeats one facet of system security, but more importantly if you are a device-driver developer, not having this feature available can hinder your debugging effort for your own drivers. Hooking INT 2D has none of these problems, but does produce a platform-specific tool. With Compaq’s recent decision (at the time of this article’s writing) to dump support for the Alpha on NT, having x86-specific NT code for a development tool is practically a non-issue.

DbgTrap Implementation Overview and Initialization

Hooking the interrupt vector requires kernel-mode code. Therefore the majority of my DbgTrap application is implemented as a kernel-mode driver. The GUI is implemented primarily by the MFC app wizard and displays the events captured by the driver sequentially. The GUI is also responsible for controlling the driver’s behavior and managing options. The application and the driver communicate via five IOCTLs, accessible via Win32’s DeviceIoControl(). These IOCTLs control whether the driver should hook, unhook, capture user-mode events, capture kernel events, or reset the event buffer.
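For readers who have not defined custom IOCTLs before, the sketch below shows how five control codes like the ones described above could be laid out. It reproduces the bit layout of the DDK's CTL_CODE macro in portable C++; the `IOCTL_DT_*` names and the function numbers are hypothetical and are not taken from the DbgTrap source.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as the DDK's CTL_CODE macro:
// DeviceType[31:16] | Access[15:14] | Function[13:2] | Method[1:0]
constexpr uint32_t makeIoctl(uint32_t deviceType, uint32_t function,
                             uint32_t method, uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr uint32_t kFileDeviceUnknown = 0x22;  // FILE_DEVICE_UNKNOWN
constexpr uint32_t kMethodBuffered    = 0;     // METHOD_BUFFERED
constexpr uint32_t kFileAnyAccess     = 0;     // FILE_ANY_ACCESS

// Hypothetical names and function numbers for the five requests.
// Function numbers from 0x800 upward are the range left for custom codes.
constexpr uint32_t IOCTL_DT_HOOK =
    makeIoctl(kFileDeviceUnknown, 0x800, kMethodBuffered, kFileAnyAccess);
constexpr uint32_t IOCTL_DT_UNHOOK =
    makeIoctl(kFileDeviceUnknown, 0x801, kMethodBuffered, kFileAnyAccess);
constexpr uint32_t IOCTL_DT_LOG_USER =
    makeIoctl(kFileDeviceUnknown, 0x802, kMethodBuffered, kFileAnyAccess);
constexpr uint32_t IOCTL_DT_LOG_KERNEL =
    makeIoctl(kFileDeviceUnknown, 0x803, kMethodBuffered, kFileAnyAccess);
constexpr uint32_t IOCTL_DT_RESET =
    makeIoctl(kFileDeviceUnknown, 0x804, kMethodBuffered, kFileAnyAccess);

// Extracts the function number again, as a driver's dispatch code might.
constexpr uint32_t functionOf(uint32_t ioctl) {
    return (ioctl >> 2) & 0xFFF;
}
```

The application would pass one of these values as the control code argument of DeviceIoControl(), and the driver would switch on the same value in its dispatch routine.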

The kernel-mode driver handles the event buffer. The driver source resides mainly in two files. entrypoi.cpp (Listing 1) provides standard driver initialization and cleanup, as well as event-buffer allocation and management. debugser.cpp (Listing 2) contains the code that actually hooks and unhooks the debugging service vector, as well as code to insert events into the event buffer. Because debugging services are invoked relatively infrequently even on the busiest system, the event buffer is implemented as a circular queue.

In addition to all the standard driver initialization and object creation, the DbgTrap driver contains code to allocate the space for the event buffer and its associated header. The header serves to provide versioning information to the GUI, but most importantly contains the current index into the event buffer, as well as the total size of the event buffer. Both the buffer’s header and the buffer itself are mapped into user space when the GUI sends the hook IOCTL. Figure 2 describes the complete buffer layout.
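The header-plus-events layout and the wrapping index can be sketched in user-mode C++ as follows. The field names and the 64-byte text field are simplifications assumed for illustration; the real DBGTRAP_HEADER and DBGTRAP_EVENT definitions are in the code archive, not in the listings shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-ins for DBGTRAP_EVENT and DBGTRAP_HEADER.
struct Event  { uint32_t count; char text[64]; };
struct Header { uint32_t size, version, numEntries, index; };

// How many whole events fit in maxBytes once the header is accounted for
// (the same arithmetic InitTrapper() performs).
inline uint32_t entriesFor(std::size_t maxBytes) {
    assert(maxBytes >= sizeof(Header));
    return static_cast<uint32_t>((maxBytes - sizeof(Header)) / sizeof(Event));
}

struct EventRing {
    Header             hdr;
    std::vector<Event> events;

    explicit EventRing(uint32_t n)
        : hdr{static_cast<uint32_t>(sizeof(Header)), 1, n, 0}, events(n) {
        assert(n > 0);
    }

    // Stores one message and returns the slot it was written to.
    // The index wraps to 0 lazily, just before the next write.
    uint32_t push(const char* msg) {
        if (hdr.index >= hdr.numEntries)
            hdr.index = 0;
        Event& e = events[hdr.index];
        e.count = hdr.index;
        std::strncpy(e.text, msg, sizeof(e.text) - 1);
        e.text[sizeof(e.text) - 1] = '\0';
        return hdr.index++;
    }
};
```

Keeping the index inside the header is what lets a reader that only sees the mapped memory find the most recent event without any further calls into the driver.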

Application Access to the Event Buffer

Once the hook request is received, the driver completes initialization in two final stages. First it creates an MDL describing the pages that the event buffer is contained in and maps these into the application’s user-mode space by calling MmMapLockedPages(). At this point, both the application and the driver can access the buffer simultaneously. However, before the newly created virtual address is returned to the user, the driver needs to store away the application’s virtual address of the buffer on a per-process basis in order to be able to unmap the buffer later from the correct process. Failure to unmap memory from user space before the process terminates will result in the dreaded blue screen of death, with a stop code of PROCESS_HAS_LOCKED_PAGES. The driver uses a small trick to accomplish this and avoid blue-screening.

Because IOCTLs are actually targeted at device objects and not drivers, NT passes a pointer to a DEVICE_OBJECT structure to a driver’s IRP_MJ_DEVICE_CONTROL dispatch routine to differentiate between potentially multiple device objects per driver. Additionally, each DEVICE_OBJECT opened via a call to CreateFile() has a FILE_OBJECT associated with it. A pointer to this file object is also passed inside the IRP describing the request. The file object structure contains several unused fields. Two of these, FsContext and FsContext2, are specifically available for device-driver developers to use as additional storage space. The DbgTrap driver exploits this and stores a pointer to the application’s view of the buffer here. As a result, when an IRP_MJ_CLOSE request arrives, all the driver has to do is traverse the device and file objects to determine if the handle to the device that’s about to go away has a mapped view that must be unmapped before the close can be allowed to proceed.
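The bookkeeping idea, remembering the mapped view in the per-handle file object and releasing it on close, can be simulated without any kernel headers. In the sketch below `FileObject` and `Mapper` are simplified, hypothetical stand-ins; a real driver would call MmMapLockedPages() and its unmap counterpart where `map()` and `unmap()` appear.

```cpp
#include <cassert>

// Stand-in for the kernel FILE_OBJECT; only the field used here.
struct FileObject {
    void* FsContext = nullptr;   // driver-private storage, one per open handle
};

// Hypothetical stand-in for the map/unmap calls; it only counts live views.
struct Mapper {
    int liveMappings = 0;
    void* map(void* kernelBuffer) { ++liveMappings; return kernelBuffer; }
    void  unmap(void* userView)   { assert(userView != nullptr); --liveMappings; }
};

// Hook request: map the buffer once per handle and remember the view.
inline void onHook(FileObject& fo, Mapper& m, void* buffer) {
    if (fo.FsContext == nullptr)
        fo.FsContext = m.map(buffer);
}

// Close request: unmap only if this particular handle owns a view.
inline void onClose(FileObject& fo, Mapper& m) {
    if (fo.FsContext != nullptr) {
        m.unmap(fo.FsContext);
        fo.FsContext = nullptr;
    }
}
```

Because the view pointer travels with the handle, a close on a handle that never issued the hook request is a no-op, and a handle that did is always cleaned up exactly once.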

Hooking an ISR under NT

The final stage of initialization is the hooking of INT 2D. The majority of the code to hook the INT 2D vector is found in idt.h (Listing 3). This file contains the definition of a structure and implementation of several associated convenience routines. This structure has a dual purpose. The first purpose is to act as a placeholder for a standard x86 Interrupt Descriptor Table entry, whose format is shown in Figure 3. The important fields are the high and low offset entries. These tell the processor where to transfer control when this interrupt vector is fired. The second purpose of the structure is to manage hooking of a particular IDT entry. This is accomplished simply by saving the original fields of the IDT entry and replacing them with new values, causing control to be transferred to a custom routine when the interrupt occurs. It’s very important to take measures to ensure that no interrupts fire in the middle of modifying the IDT entry. To ensure this, the hooking code first raises the IRQL to the highest level and then disables interrupts.
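The split of a 32-bit handler address across the two offset words is easy to get wrong, so here is a small portable sketch of just that arithmetic. `IdtGate`, `handlerOf`, and `retarget` are illustrative names, and the sketch deliberately leaves out the IRQL raise and interrupt disabling that the real Hook() method performs.

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)
// Layout of a 32-bit x86 interrupt gate, mirroring NT_IDT in Listing 3.
struct IdtGate {
    uint16_t loOfs;     // low word of the handler's offset
    uint16_t selector;  // code selector of the handler
    uint16_t flags;     // present bit, DPL, and gate type
    uint16_t hiOfs;     // high word of the handler's offset
};
#pragma pack(pop)
static_assert(sizeof(IdtGate) == 8, "an x86 IDT gate is 8 bytes");

// Reassembles the flat 32-bit handler address from the two halves.
inline uint32_t handlerOf(const IdtGate& g) {
    return (static_cast<uint32_t>(g.hiOfs) << 16) | g.loOfs;
}

// Points the gate at newHandler, leaving selector and flags untouched,
// and returns the previous handler so the caller can chain to it.
inline uint32_t retarget(IdtGate& g, uint32_t newHandler) {
    uint32_t old = handlerOf(g);
    g.loOfs = static_cast<uint16_t>(newHandler & 0xFFFF);
    g.hiOfs = static_cast<uint16_t>(newHandler >> 16);
    return old;
}
```

Unhooking is the same operation with the saved address, which is why the driver only needs to keep one pointer per vector.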

Because of NT’s support for up to 32 processors, the driver has to execute this hooking routine on every processor in the system. While the exported variable KeNumberProcessors reveals how many CPUs are in the system, there is no documented way to force immediate, synchronous execution of a block of code on a CPU other than the current processor. To let the hooking code execute in a reasonably timely fashion on all processors without synchronization nightmares, the driver uses an undocumented function to set the currently executing thread’s affinity mask. KeSetAffinityThread() forces an immediate context switch if the current processor does not fall in the newly set affinity mask and does not return to the caller until the thread is rescheduled on a processor conforming to the new affinity mask. KeSetAffinityThread() takes two parameters: the first is a pointer to the thread’s KTHREAD structure (a PKTHREAD), and the second is an affinity mask for that thread. For every processor in the system, I first set the current thread’s affinity to a single processor and then call the hooking code.
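The mask arithmetic behind this loop can be checked in isolation. The sketch below is user-mode C++ with hypothetical helper names; it computes the single-CPU masks visited in turn and the all-CPU mask used to restore the thread afterward, taking care with the 32-processor case, where a plain `1 << 32` would be undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One mask per processor, each with exactly one bit set.
inline std::vector<uint32_t> perCpuMasks(unsigned cpus) {
    assert(cpus >= 1 && cpus <= 32);
    std::vector<uint32_t> masks;
    for (unsigned cpu = 0; cpu < cpus; ++cpu)
        masks.push_back(uint32_t{1} << cpu);
    return masks;
}

// Mask with the low `cpus` bits set. The shift is skipped for 32 CPUs
// because shifting a 32-bit value by 32 is undefined behavior.
inline uint32_t allCpusMask(unsigned cpus) {
    assert(cpus >= 1 && cpus <= 32);
    return cpus == 32 ? 0xFFFFFFFFu : (uint32_t{1} << cpus) - 1;
}
```

In the driver, each value from `perCpuMasks()` would be handed to KeSetAffinityThread() before touching that processor's IDT, and `allCpusMask()` would be applied once the loop finishes.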

Handling the Interrupt

Handling of the debug service interrupt is the meat of the DbgTrap project and is accomplished in DTDebuggerTrap(). There are five parts to handling the interrupt: preserving the current processor state precisely, determining if the code should attempt to handle the service request at all, setting up the expected standard NT environment, logging the service request, and chaining to the original INT 2D handler.

To avoid the potentially disastrous injection of random pushes and pops from compiler-generated function prolog and epilog code, the DTDebuggerTrap() handler is declared with _declspec(naked) linkage. This instructs the compiler not to set up a stack frame, not to save any registers, and not to generate a return instruction. This puts the responsibility for saving all modified registers on the handler. Ultimately, this lets the code chain to the original handler with the exact same context as when control was originally transferred to DTDebuggerTrap() by the interrupt.

Preserving the current processor state and deciding whether to handle the interrupt requires manipulating the processor’s selectors and flags registers, which contain enough information to decide whether NT is executing in kernel or user mode. DbgTrap exploits this fact by using these values to determine whether the interrupt originated in kernel space, user space, or in the context of ntvdm (a DOS box). When kernel-mode code, such as this driver, executes and calls NT API functions, it expects a standard environment described by these selectors to be set up. Table 1 shows the values of each selector that NT normally expects. Once this standard NT environment is set up, the parameters passed to the debug service interrupt inside registers are pushed on the stack, and control is transferred to LoggerDispatch().
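The decision the handler makes from the saved CS and EFLAGS values can be written out as an ordinary function, which makes the order of the tests easier to follow than the assembly. The constants are the ones from Listing 3; the `Origin` enum and `classify()` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t X86_VM   = 0x20000;  // EFLAGS.VM: virtual-8086 mode
constexpr uint16_t X86_USER = 0x1;      // low RPL bit of a selector
constexpr uint16_t NT_UCS   = 0x1B;     // NT's flat user-mode code selector

enum class Origin { Kernel, User, Vdm, Other };

// Mirrors the tests at the top of DTDebuggerTrap(): the VM flag is checked
// first, then the privilege bit of CS, then the exact user selector.
inline Origin classify(uint16_t cs, uint32_t eflags) {
    if (eflags & X86_VM)
        return Origin::Vdm;      // DOS box: chain without logging
    if ((cs & X86_USER) == 0)
        return Origin::Kernel;   // ring 0 caller
    return cs == NT_UCS ? Origin::User : Origin::Other;
}
```

Only `Origin::Kernel` and `Origin::User` lead to logging, and each is further gated by the corresponding option set from the GUI; everything else chains straight to the original handler.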

LoggerDispatch() uses three helper functions to add new events to the buffer. LogEvent() logs general information, such as the time the event occurred, the process name, and process ID that the event occurred in. In contrast, LogDbgPrint() and LogLoadImageSymbols() log specific information depending on whether or not the event originated as a print or an image (un)load notification, respectively. Access to the buffer is serialized by calling KeAcquireSpinLockRaiseToSynch() (in the form of the macro LOCK_BUFFER). Because the interrupt may have been issued at a high IRQL, the standard spin-lock acquisition via KeAcquireSpinLock() is unacceptable here because KeAcquireSpinLock() implicitly sets the IRQL to DISPATCH_LEVEL, regardless of whether the call originated at a higher or lower IRQL.
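Since event slots are reused once the buffer wraps, the string copy in the logging helpers has to both truncate and terminate. A minimal sketch of such a copy is shown below; `copyBounded()` is a hypothetical helper, and the 16-byte MAX_STRING_LENGTH is chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t MAX_STRING_LENGTH = 16;  // hypothetical slot size

// Copies at most MAX_STRING_LENGTH-1 characters of src into dst and always
// writes the terminator directly after them, so leftovers from an earlier,
// longer string in the same slot are never visible. Returns the length copied.
inline std::size_t copyBounded(char (&dst)[MAX_STRING_LENGTH], const char* src) {
    assert(src != nullptr);
    std::size_t len = std::strlen(src);
    if (len >= MAX_STRING_LENGTH)
        len = MAX_STRING_LENGTH - 1;
    std::memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Terminating at the copied length rather than only at the last byte of the field matters precisely because the buffer is circular and is not cleared between uses.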

The Role of the Application

The DbgTrap application (complete source code is in this month’s code archive) plays a passive role for the most part, with the exception of being responsible for initially installing, starting, and stopping the driver upon its invocation. The CDriver class provides wrapper methods to register and start the driver with the Service Control Manager and then communicate with the driver through standard Win32 calls. Once the driver is started, the application sits idle, waiting for a timer with a one-second period to expire, and then polls the event buffer to see if any new events have occurred since the last polling. Event data is displayed in a standard listview control. Toggling the trapping options or resetting the event buffer issues a DeviceIoControl() call describing the request to the DbgTrap driver.
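One way the polling step could work out which slots are new is sketched below. This is an assumption about the approach, not the application's actual code: `slotsSince()` is a hypothetical helper, and it presumes that fewer than `numEntries` events arrive between two polls, since a full lap of the circular buffer cannot be detected from the index alone.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the slots written since the previous poll, oldest first.
// `last` is the header index seen at the previous poll and `next` is the
// index seen now. Both may equal numEntries, because the driver wraps the
// index lazily, just before its next write.
inline std::vector<uint32_t> slotsSince(uint32_t last, uint32_t next,
                                        uint32_t numEntries) {
    assert(numEntries > 0 && last <= numEntries && next <= numEntries);
    std::vector<uint32_t> slots;
    uint32_t i = last;
    while (i != next) {
        if (i == numEntries) {   // follow the driver's wrap to slot 0
            i = 0;
            continue;
        }
        slots.push_back(i);
        ++i;
    }
    return slots;
}
```

On each timer tick the application would read the index from the mapped header, call something like this with the index it saved last time, append the returned slots to the list view, and then save the new index.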

Getting More Diagnostic Information from NT

By default, NT produces a reasonable amount of real-time information available to DbgTrap, such as symbol loads, DLL collisions, and errors during error translation. But it’s possible to get NT to display more information. For the most part, this is controlled by a single global flag in the kernel with the uncreative name NtGlobalFlag. Users can control this flag by setting certain bits in the
HKEY_LOCAL_MACHINE
  \SYSTEM
    \CurrentControlSet
      \Control
        \Session Manager
          \GlobalFlag
registry key. The gflags utility provided in the NT Resource Kit allows convenient control of these values via the user interface shown in Figure 4.

Three bits are of particular interest. The “Show Loader Snaps” option forces NT to spit out extremely verbose information on process creation, resolving image dependencies, DLL reference counts, and much more. Setting these registry keys normally requires a reboot to take effect, and they are system granular, affecting all processes when set. However, by using a bit of trickery, you can achieve almost the same result on a per-process basis by loading symbols for ntdll.dll and, under a debugger, setting the ShowSnaps variable to a non-zero value for whatever process you’re interested in. The “Enable Loading of Kernel Debug Symbols” option sends a DbgLoadImageSymbols() notification the first time any user-mode image is loaded, in addition to the usual notification for kernel-mode drivers. Finally, the “buffer DbgPrint” option defers the output of DbgPrint() strings.
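The three options correspond to individual bits in the GlobalFlag value, so composing a setting is plain bit arithmetic. The constant names and values below are the ones commonly documented for gflags rather than anything stated in this article, so treat them as assumptions and verify them against the NT version you are targeting.

```cpp
#include <cassert>
#include <cstdint>

// Values as commonly documented for gflags; not taken from the article.
constexpr uint32_t FLG_SHOW_LDR_SNAPS            = 0x00000002;  // "Show Loader Snaps"
constexpr uint32_t FLG_ENABLE_KDEBUG_SYMBOL_LOAD = 0x00040000;  // kernel debug symbols
constexpr uint32_t FLG_DISABLE_DBGPRINT          = 0x08000000;  // "buffer DbgPrint"

inline uint32_t withFlag(uint32_t globalFlag, uint32_t flag) {
    return globalFlag | flag;
}

inline uint32_t withoutFlag(uint32_t globalFlag, uint32_t flag) {
    return globalFlag & ~flag;
}

inline bool hasFlag(uint32_t globalFlag, uint32_t flag) {
    return (globalFlag & flag) != 0;
}
```

The resulting value is what would be written to the GlobalFlag registry value shown above, which is the same thing the gflags user interface does on your behalf.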

Conclusion

Up until now DbgTrap has run in a pass-through mode, allowing all events to pass unmodified to the original debug service handler. Further application of DbgTrap could have the driver eat up print requests based on a certain string pattern. More than one software vendor (you know who you are) has knowingly or unknowingly released modules to customers that clutter their system debuggers or debug service viewers. Thus, a relatively easy modification of the DbgTrap driver could make your debugging life that much easier.
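As a starting point for that modification, the filtering decision itself could be as simple as a substring match. The sketch below is user-mode C++ for clarity and uses a hypothetical `shouldSuppress()` helper; inside the driver the pattern list would have to live in non-paged memory and be walked with a plain loop rather than a standard container.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Returns true if msg contains any of the given patterns. Null or empty
// patterns are ignored so that they never match everything.
inline bool shouldSuppress(const char* msg,
                           const std::vector<const char*>& patterns) {
    assert(msg != nullptr);
    for (const char* p : patterns) {
        if (p != nullptr && *p != '\0' && std::strstr(msg, p) != nullptr)
            return true;
    }
    return false;
}
```

A handler built on this would skip the jump to the original INT 2D handler for suppressed print requests and return to the caller instead, which is a behavioral change that needs care, since the original handler would no longer see those requests at all.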

When Jose Flores is not defending his air hockey championship title, he develops kernel tools for NuMega Technologies. You can contact him directly via www.joseflores.com.
