Solving application and driver debugging problems
Dmitri is a consultant in the Silicon Valley specializing in embedded-system integration and driver/application development. He can be reached at email@example.com.
Debugging software on Windows CE is more difficult than debugging on desktop Windows, in part because of the constrained nature of embedded devices. However, the lack of basic debugging tools and techniques for Windows CE also plays a role. Powerful tools such as API spies, file monitors, registry monitors, and debug message monitorsindispensable in Windows desktop developmentsimply aren't available for CE.
Consequently, in this article, I present Spy, an API spy for Windows CE. The source code that implements Spy (available electronically; see "Resource Center," page 5) includes the Spy engine and an example that intercepts three system functions: CreateProcess, CreateFile, and LoadLibrary. You can, of course, modify the example to spy for hundreds of other API routines for monitoring registry access, thread synchronization, Windows messages, GDI routines, and the like. (I've also developed several commercially available monitors, such as Registry Monitor and Thread Monitor; see http://www.forwardlab.com/.)
Beware that Spy relies on undocumented CE features and internal data structures that can lead to undesired system behavior. Still, it is a valuable tool for debugging, troubleshooting, and system exploration, and has been tested on Windows CE 3.0 and 4.x (on the CEPC platform), Pocket PC 2002 and 4.x emulator, and several off-the-shelf Pocket PC 2000 and 2002 PDAs.
Microsoft's eMbedded Visual Tools 3.0 provide a development environment for building applications for Windows CE devices, including the Pocket PC and Smartphone. The toolset includes eMbedded Visual C++ 3.0, eMbedded Visual Basic 3.0., and SDKs for Pocket PC 2002 and Smartphone 2002 (see http://msdn.microsoft.com/vstudio/device/ embedded/default.aspx). For Windows CE .NET 4.x development programmers typically use eMbedded Visual Tools 4.0 with .NET Standard SDK. The SDK contains help and header files that document CE's Win32 API subset. However, the SDK help file provides a limited description of the CE architecture; for instance, the "Processes" section (under "Working with Processes and Threads") briefly describes the 33 virtual memory slots that house up to 32 processes, with slot 0 used to map current process. But the SDK does not explain the implications of the slot-based model, in terms of pointer mapping, interprocess memory access, and interprocess calls.
With Platform Builder (the Microsoft IDE for building small-footprint embedded devices based on Windows CE .NET), the development environment is richer, and includes the shared source code of the CE kernel. A debug version of the OS, which can be built for PC-based CEPC, lets you use the kernel debugger and kernel traces instead of the underpowered application debugger in the eMbedded Visual Tools.
The "Kernel" section (under "Core OS Services" in the Platform Builder help file) diagrams the OS architecture, and the "Kernel Functions" section lists a number of API routinesMapPtrToProcess, SetProcPermissions, SetKMode, GetCallerProcess, and GetOwnerProcessfor dealing with processes and memory slots. You may also find this information on MSDN online. According to the description of GetCallerProcess, CE threads can migrate between processes while executing API callswhich is big news if you're used to desktop Windows, where threads are confined to the same process. Therefore, the old GetCurrentProcessId is supplemented with GetCallerProcess and GetOwnerProcess. Also, since API calls do migrate between processes, code routinely needs access to memory addresses in other processes. The SetProcPermissions routine gives the current thread access to any process. Normally, you don't worry about these issues because the system properly arranges everything. On the other hand, hard-to-find bugs can appear because of the process/slot interplay. For example, a driver routine may pass an accessible pointer from one thread to another, then have the second thread crash because it does not have access to the address space of the application that called the routine on the first thread. Applications can use SetProcPermissions to share memory by, for example, passing a pointer in a window message, but this is an undocumented approach. As explained in "Drivers and Pointer Data" section, drivers sometimes need to call MapPtrToProcess to convert a pointer from slot 0 to the appropriate slot for the caller application.
Even though Platform Builder documentation is revealing about cross-platform calls, access permissions, API traps, and pointer mapping, it is still sketchy. It does not explain the slot-based design, why the current process is always mapped to slot 0, how system APIs work, or how to use them. IsAPIReady is the only documented function for dealing with APIs, and it is frequently called by drivers to make sure that a particular system API is available.
Luckily, in Chapter 3 of John Murray's Inside Microsoft Windows CE (Microsoft Press, 1998), the CE kernel developers explain the design decisions behind the most important features. Slot 0, for instance, is used to optimize DLL loading. When CE loads a DLL, it reserves space for it in every slot (see Figure 1). Code and read-only data sections of a DLL, loaded by several processes, are mapped to the same physical space. Writable data sections are also given the same place in each slot, but they have to be given separate physical pages. Since the code usually has embedded pointers to the data and only a single copy of the code is allowed, these pointers cannot change when switching between processesbut each process should access its own data. This is achieved by directing all pointers to slot 0, where the current process is mapped to. Therefore, when a DLL executes, it automatically accesses private data of the current process. Slot 0 pointers such as these are called "unmapped." The same data (and code) can be also accessed by mapping a pointer to the permanent slot of the given process using MapPtrToProcess routine. This is necessary when passing the pointer to another process; for example, as an argument of an API or a window message. Mapped addresses can be accessed by any thread (in any process) that has permission. Permission is a 32-bit mask with bits set for each slot that should be accessible.
GetProcPermissions/SetProcPermissions routines manipulate the permission mask of the current thread. The Remote Process Viewer tool displays slot addresses of running processes and permissions of threads. The Access Key column in the Process pane displays the permission bit assigned to the process's slot. The Access Key column in the Thread pane displays the permission mask for threads. You can see that most threads have at least two bits set: one for the kernel (nk.exe), and another bit corresponding to the owner process. Other bits may be set to give additional access.
Finally, Murray's book sheds light on the system API mechanism, which was designed as an efficient version of a client/server model. Unlike a traditional client/server model, which has the server running on a separate thread, Windows CE borrows the caller's thread and places it in the server process. Such servers are called "protected server libraries" (PSLs) and are processes that export services like DLLs. The services are exported by registering an API set, which is a table of pointers to methods (functions in the server). The API set also has a name, table of method signatures, server handle, and dispatch type. These signatures are 32-bit masks, where individual bits are set if the correspondent method argument is a pointer. When dispatching an API call, the system maps the pointer arguments from slot 0 to the slot designated for the caller process.
However, to understand how to create and register an API set, enumerate available APIs, find a method serving a particular API, or intercept an API for monitoring, you have to study the header files and shared source code that comes with the CE development environment. The directories I refer to are in the WinCE root. The header PUBLIC\COMMON\OAK\INC\pkfuncs.h defines the layout of some CE kernel data structures and undocumented routines such as CreateAPISet, RegisterAPISet, QueryAPISetID, GetAPIAddress, LocalAllocInProcess, PerformCallBack4, and GetRomFileInfo. PerformCallBack4 is especially powerful because it lets you directly call a routine in another process without registering an API.
KDataStruct is an important kernel structure that can be accessed by applications using the fixed address PUserKData. The value of PUserKData is fixed as 0xFFFFC800 on the ARM processor, and 0x00005800 on other CPUs (in public header kfuncs.h in all versions of CE SDK). The inline versions of Win32 routines GetCurrentThreadId and GetCurrentProcessId use a fixed offset SYSHANDLE_OFFSET from PUserKData to access data directly. Additional offsets are defined in PUBLIC\COMMON\OAK\INC\pkfuncs.h. KINFO_OFFSET gives access to UserKInfo array, which keeps important system data, such as module list, kernel heap, and API set pointer table (SystemAPISets). There are two different types of API sets: implicit and handle based. Implicit API sets are global and their methods can be called using the API handle index and method index pair. Implicit API sets (represented by CINFO structure) are registered in a global table SystemAPISets. Handle-based APIs, on the other hand, are attached to kernel objects such as files, mutexes, events, and the like. Methods in these APIs can be called using an object handle and the method index. SDK header kfuncs.h define handle indexes for implicit API sets such as SH_WIN32, SH_GDI, SH_WMGR, and others. Handle-based API indexes such as HT_EVENT, HT_APISET, and HT_SOCKET are defined in PUBLIC\COMMON\OAK\INC\psyscall.h. At most, 32 API sets can be registered at a time and almost all of them are already in use. Figure 2 provides a summary of these internal data structures. (For more information, search the directory PRIVATE\WINCEOS\COREOS\NK\KERNEL in shared source code for these identifiers.)
The file Psyscall.h declares the macros METHOD_CALL and IMPLICIT_CALL, which produce special addresses that clients can call to invoke API methods. These addresses are called "traps" and are handled by the CE kernel exception dispatcher. Many CE application developers have seen these weird addresses (always starting with 0xF) when stepping into a Win32 API routine in a debugger. You often need to step into an API routine or examine its code in the disassembly window if the API crashes or fails unexpectedly. Unfortunately, almost all Win32 routines are just thin wrappers around traps in the system API. Since application debuggers do not allow stepping into these traps and a kernel debugger is not available for most developers, debugging Win32 API failures on CE is complex. However, after learning about the IMPLICIT_CALL macro and SystemAPISets table, you can decipher trap addresses and find the real handlers of API methods. The formula ((0xFFFFFE00-TRAP)/2),x (or ((0xF0010000-TRAP)/4),x on ARM CPUs) lets you obtain method index (low-order byte) and API index (second byte). Then you use the API index to get pointer to CINFO from SystemAPISets, and the method index to get the method address from the method table. Alternatively, you can use method DumpApis() in CeApiSpy (available electronically) to dump all registered APIs and their methods.
Designing the Spy
As you can see in Figure 2, the only way to intercept an API routine is by replacing a pointer to an API set in the SystemAPISets table. There are a number of other attractive pointers, but all of them are in read-only sections and may be located in ROM. Therefore, Spy allocates a copy of the original CINFO structure (as well as method and signature tables), then replaces the address of the original API in the SystemAPISets table with the address of the new CINFO. Spy then replaces some pointers to individual methods in the duplicate table with pointers to interceptor routines. The interceptor routines call methods from the original method tables to ensure that the system remains operational. Spy also registers its own API to call routines in the Spy process from other processes. The current version of Spy can only intercept implicit (global) APIs. Figure 3 is an example of an intercepted API method.
The difficult parts of Spy's design are ensuring that the interceptors are accessible by API callers and achieving consistency in process and mode transition. Some API methods use GetCallerProcess (to track ownership of resources, for example), other methods check whether the caller runs in user or kernel mode. The Spy must not change the results of these checks. To achieve this, the interceptor part of Spy is packaged in a DLL, which is loaded into all running processes. The interceptor routines are executed in the same process as the original API server. CE 4.0 has a documented way to inject a DLL into all processes using the InjectDLL registry value. Since CE 3.0 does not support it, Spy uses the undocumented function PerformCallBack4 to execute LoadLibrary in other processes.
The final obstacle is to deal with cached API table pointers inside Coredll. It appears that Coredll (the Win32 API provider) has an optimization that obtains pointers (using the undocumented GetRomFileInfo routine) to the kernel Win32 API method tables, then calls these methods directly instead of using traps. Therefore, Spy has to scan Coredll for these cached pointers, then replace them. This has to be done for every running process and for newly loaded processes. Spy also has to intercept the GetRomFileInfo routine to be notified about the starting of new processes.
The Workspace file CeApiSpy.vcw contains the projects CeApiSpy.vcp (for building CeApiSpy.exe GUI) and CeApiSpyDll.vcp (for building CeApiSpyDll.dll, the interceptor DLL). I tested the build with eMbedded Visual C++ 3.0 using Pocket PC 2002 SDK, and CE 4.0 using the Standard SDK. You should select an appropriate build configuration or create a new one for other SDKs. Platform Builder is not necessary to build Spy, but the kernel debugger may be helpful. To eliminate dependency on system headers, I assembled undocumented functions, macros, and data-structure declarations in SysDecls.h.
The most important file is SpyEngine.cpp. The first portion of this file contains replacement routines for the ToolHelp API (missing from the CE 4.x emulation version). The HookCoredll function searches Coredll for pointers to cached system tables, then replaces them. CallCoredllInProc invokes an exported method from Coredll in any process (using PerformCallBack4). The functions AllocateMemInKernelProc and FreeMemInKernelProc allocate and free memory in process #1 (nk.exe). This memory is used to create duplicate API method tables, which are accessible by all processes. LoadHookDllIntoProcess and UnloadHookDllInProcess load and unload the Spy DLL in any process, while LoadHookDllInAllProcesses loads this DLL into all running processes. CreateDuplicateApi calls the AllocateMemInKernelProc and CreateDuplicateMethodTable routines to create a copy of CINFO structure, as well as method and signature tables of a system API. It then writes the address of the new CINFO to the SystemAPISets table. HookMethod is called by custom monitors to intercept a method in a specified API. HookGetRomFileInfo is an interceptor for the undocumented GetRomFileInfo routine, which is always installed by Spy to be notified about the starting of new processes and to substitute pointers to system API tables when they are retrieved by Coredll. StartSpy and StopSpy are exported methods from the Spy DLL used by the GUI to initialize and uninitialize the Spy DLL. StartSpy calls DoStart worker routine. DllMain performs necessary per-process initialization. DumpApis is not used for spying, but simply dumps information about registered APIs.
The file Interceptors.cpp contains custom interceptors and can be modified to intercept additional API methods. InstallInterceptors is called by DoStart and contains calls to HookMethod for intercepted methods. All interceptor routines in Interceptors.cpp have a similar pattern. They print information about the method call and arguments, and call the original API method through a pointer returned earlier from HookMethod. To add a new interceptor, copy one of the existing routines, replace its name and arguments, declare a new pointer to the original method, and add a call to HookMethod.
To print and display the spying results, I used the tracing framework (HTrace.cpp) presented in my article "An Efficient and Flexible Tracing Technique" (C/C++ Users Journal, April 2002), which prints traces to a shared memory buffer as well as a debug monitor. The file TrcView.cpp implements a window, which displays the text in the buffer. The display is updated by a timer and is not coupled with the interceptor routines to minimize overhead and interference with the system.
To test Spy, you can use the MissingDll project (available electronically). It demonstrates how to use Spy to debug application load failure because of a missing DLL. You can build the Spy and MissingDll projects and copy CeApiSpy.exe, CeApiSpyDll.dll, and MissingDllApp.exe to the device. Start CeApiSpy.exe and invoke the Start command, then start MissingDllApp.exe. The system should display the message:
Cannot find MissingDllApp (or one of its components). Make sure the path and filename are correct and that all the required libraries are available.
Now view the traces displayed in Spy's window or save the trace to a file. Figure 4 is an excerpt from the trace that shows the failed attempt to load the missing DLL. To troubleshoot Spy itself, enable detailed trace using the Options menu.
Spy is useful in solving common problems in application and driver debugging. However, because it relies on undocumented API and kernel data structures, it can be broken by modifications in future versions of Windows CE. Before that happens, you can use it to speed up debugging, troubleshooting, and exploring your system's behavior.