Tools

Hardware-Assisted Breakpoints

By Dmitri Leman, June 01, 2005

Dmitri explains how to access debug registers on XScale-based CPUs from C/C++ applications.

Dmitri is a consultant in Silicon Valley specializing in embedded system integration, driver, and application development. He can be reached at [email protected].

When debugging applications and drivers on Pocket PC PDAs, I often miss being able to use hardware-assisted breakpoints. When such breakpoints are enabled, the CPU runs at normal speed, stopping only when data at a given address is accessed or modified (such data breakpoints are often called "watchpoints"). While these breakpoints are not for everyday use, they can dramatically speed up the debugging of corrupted data or the exploration of unfamiliar code. Unfortunately, the Microsoft eMbedded Visual C++ (EVC) debugger does not support hardware-assisted breakpoints (at least at this writing). Granted, EVC does provide a dialog for setting data breakpoints, but it appears to implement this feature by running the program step-by-step and checking the data, a process too slow to be useful. This is unfortunate: other debugger features, such as regular code breakpoints, can be approximated by inserting trace statements or message boxes in the program itself, but there's no substitute for hardware breakpoints.

I recently needed software-controlled data breakpoints when debugging a large and unfamiliar code base. I noticed that a local variable in certain functions sometimes changed its value erroneously. When I tried to step through the function in a debugger or insert a breakpoint within the function, the program timing was disrupted enough to hide the bug. Consequently, I decided to write a C++ class that would set the data breakpoint in its constructor and remove it in the destructor with minimal overhead. Then I would only need to instantiate the class in the function and run the program. After I wrote the class and ran the program for a few minutes, the data breakpoint was triggered and the debugger displayed the exact line that modified the variable in question. While I originally implemented this class for Pentium-based Windows NT systems using the SetThreadContext API, I recently implemented it on a PocketPC PDA based on the Intel XScale architecture.
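The constructor/destructor idea can be sketched roughly as follows. This is only an illustration of the pattern, not the class from the accompanying code: ScopedDataBreakpoint, SetWatchStub, g_watchAddress, and g_watchEnabled are hypothetical names, and the stub merely records what a real register-access routine would program into the hardware, so the sketch runs on any host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the hardware state, so the sketch runs anywhere.
static std::uintptr_t g_watchAddress = 0;
static bool g_watchEnabled = false;

// Hypothetical stub; on a real device this would program the debug registers.
inline void SetWatchStub(std::uintptr_t address, bool enable) {
    g_watchAddress = address;
    g_watchEnabled = enable;
}

// Scope guard: arms a data breakpoint on construction, disarms on destruction.
class ScopedDataBreakpoint {
public:
    explicit ScopedDataBreakpoint(const volatile void* target) {
        assert(target != nullptr);
        SetWatchStub(reinterpret_cast<std::uintptr_t>(target), true);
    }
    ~ScopedDataBreakpoint() { SetWatchStub(0, false); }

    // Non-copyable: exactly one owner per armed breakpoint.
    ScopedDataBreakpoint(const ScopedDataBreakpoint&) = delete;
    ScopedDataBreakpoint& operator=(const ScopedDataBreakpoint&) = delete;
};

// Example: the breakpoint is armed only while 'guard' is in scope.
inline bool WatchedWhileInScope() {
    int suspect = 0;
    bool armedInside = false;
    {
        ScopedDataBreakpoint guard(&suspect);
        armedInside = g_watchEnabled &&
                      g_watchAddress == reinterpret_cast<std::uintptr_t>(&suspect);
    }
    return armedInside && !g_watchEnabled;
}
```

Because the guard disarms the breakpoint in its destructor, every exit path from the function (including early returns) leaves the debug registers clean.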

In this article, I explain how to access debug registers on XScale-based CPUs from C/C++ applications. Using the code I present here (available electronically; see "Resource Center" page 3), you can easily set breakpoints on data reading and/or writing and catch exceptions generated by these breakpoints. Also, I show how to use another feature of XScale—the trace buffer—which lets you collect program execution history. I've tested the code on several off-the-shelf XScale-based PocketPC PDAs with the Windows Mobile 2003 and Windows Mobile 2003 Second Edition operating systems. To find out if your PDA is running an XScale, open the About Control Panel applet. XScale CPUs have names starting with PXA; for example "PXA270." The code I present here won't work with ARM-compatible CPUs from manufacturers that do not support XScale debug extensions.

Intel's XScale Architecture

Intel's XScale architecture is a successor to StrongARM, which was originally designed by Digital Equipment Corporation. At this writing, most models of Windows Mobile/PocketPC 2003 and 2002 PDAs run XScale-based CPUs, while some older PocketPC 2002 PDAs used StrongARM-based processors. All these processors are based on the ARM architecture designed by ARM Limited. StrongARM was based on ARMv4 (ARM Version 4) and XScale on ARMv5TE. Compared to StrongARM, XScale has several extensions, such as support for the Thumb 16-bit instruction set (in addition to the 32-bit ARM instruction set), DSP extensions, and debug extensions. For user mode applications, XScale maintains compatibility with StrongARM. To learn more about ARM architecture, registers, instructions, and addressing modes, see The ARM Architecture Reference Manual, Second Edition, edited by David Seal (Addison-Wesley, 2000). For a quick reference to XScale-supported instructions, see XScale Microarchitecture Assembly Language Quick Reference Card (http://www.intel.com/design/iio/swsup/11139.htm). XScale-specific features, such as memory management, cache, configuration registers, performance monitoring, and debug extensions are documented in Intel's XScale Core Developer's Manual (http://www.intel.com/design/intelxscale/273473.htm).

Using XScale Debug Extensions

Normally, XScale-based CPUs run with debug functionality disabled, but they may be configured to execute in one of two debug modes: Halt and Monitor. Halt mode can only be used with an external debugger connected to an XScale CPU through a JTAG interface. Since off-the-shelf PDAs are unlikely to have JTAG connectors, I focus here on Monitor debug mode, which can be used by software running on the CPU itself without any external hardware or software. Useful features in this mode include instruction breakpoints, data breakpoints, software breakpoints, and a trace buffer. Except for software breakpoints (which are generated by a special instruction inserted into the program), these features can be enabled and configured using debug registers.

Intel provides the XDB Browser, a powerful visual debugging tool (included with the Intel C++ compiler), which gives you full control of XScale CPU internals, including debug extensions. Unfortunately, this tool requires special debug code that's built into the Board Support Package (BSP), which was unavailable on most PDAs at the time of writing.

The debug registers in Table 1 belong to coprocessors 14 (CP14) and 15 (CP15). Coprocessors are modules inside the CPU that extend the core ARM architecture. The coprocessor registers are accessed using special instructions. In the code accompanying this article, I use the instructions MRC and MCR with the syntax: MRC{cond} p<cpnum>, <op1>, Rd, CRn, CRm, <op2> to move from a coprocessor register to an ARM register, and MCR{cond} p<cpnum>, <op1>, Rd, CRn, CRm, <op2> to move from an ARM register to a coprocessor register.

{cond} is an optional condition (in ARM, most instructions can be marked with a condition to specify whether the instruction should be executed or skipped depending on processor flags).
p<cpnum> is either the p14 or p15 coprocessor name.
Rd is a general-purpose ARM register.
CRn and CRm identify the coprocessor register.
<op1> and <op2> are opcodes (and are always 0 when working with debug registers).

For example:

MCR p15, 0, R0, c14, c0, 0 ; write R0 to DBR0
MRC p14, 0, R1, c10, c0, 0 ; read DCSR to R1
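To see how the fields of the syntax above fit together, the following sketch packs them into the 32-bit instruction word using the standard ARM coprocessor register transfer layout. The function name EncodeCoprocTransfer is hypothetical and is not part of the accompanying code; it is meant only as a way to check a hand-written MCR or MRC against a disassembly.

```cpp
#include <cassert>
#include <cstdint>

// Encode an ARM coprocessor register transfer (MCR or MRC) as a 32-bit word.
// Field layout: cond[31:28] 1110[27:24] op1[23:21] L[20] CRn[19:16]
//               Rd[15:12] cpnum[11:8] op2[7:5] 1[4] CRm[3:0]
// L = 1 selects MRC (coprocessor to ARM), L = 0 selects MCR (ARM to coprocessor).
inline std::uint32_t EncodeCoprocTransfer(bool toArm,
                                          unsigned cpnum, unsigned op1,
                                          unsigned rd, unsigned crn,
                                          unsigned crm, unsigned op2,
                                          unsigned cond = 0xE) {  // 0xE = always
    assert(cpnum < 16 && rd < 16 && crn < 16 && crm < 16);
    assert(op1 < 8 && op2 < 8 && cond < 16);
    return (static_cast<std::uint32_t>(cond) << 28) | (0xEu << 24) |
           (static_cast<std::uint32_t>(op1) << 21) |
           (static_cast<std::uint32_t>(toArm) << 20) |
           (static_cast<std::uint32_t>(crn) << 16) |
           (static_cast<std::uint32_t>(rd) << 12) |
           (static_cast<std::uint32_t>(cpnum) << 8) |
           (static_cast<std::uint32_t>(op2) << 5) | (1u << 4) | crm;
}
```

With these fields, the two examples above encode as EncodeCoprocTransfer(false, 15, 0, 0, 14, 0, 0) = 0xEE0E0F10 and EncodeCoprocTransfer(true, 14, 0, 1, 10, 0, 0) = 0xEE1A1E10.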

Software access to debug registers can be done from a privileged mode only; user-mode access generates exceptions. Fortunately, it appears that PocketPCs always run applications in Kernel mode. Windows Mobile-based Smartphones, on the other hand, run applications in user mode. Trusted applications (which are signed with a trusted certificate) can switch to Kernel mode using the SetKMode API. Because I don't have an XScale-based Smartphone, I focus here on the PocketPC.

I implemented the debug register access code in assembly language; see AccessCoproc.s (available electronically), which contains several short routines: SetDebugControlAndStatus, SetDataBreakPoint, SetCodeBreakPoint, ReadTraceBuffer, GetProgramStatusRegister, and ReadPID. The file Breakpoint.h contains declarations of these functions and related constants to let you call the functions from C or C++.

To enable debug functionality, set bit 31 (Global Enable) in the Debug Control and Status Register (DCSR). Bits 0 and 1 of this register control the trace buffer. See the SetDebugControlAndStatus implementation in Listing One. Before setting any breakpoints, applications should call DWORD dwOrigDCSR = SetDebugControlAndStatus(DEF_GlobalDebugEnabled, DEF_GlobalDebugEnabled) and keep the returned original value. Before exiting, applications should call SetDebugControlAndStatus(dwOrigDCSR, -1) to restore the DCSR to its original value; see the WinMain function in BreakPointSamples.cpp (available electronically).
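The save/modify/restore contract of SetDebugControlAndStatus can be modeled in plain C++ as follows. This is a host-side sketch with a simulated register, not the routine from Listing One: g_simulatedDCSR, kGlobalDebugEnabled, and SetDebugControlAndStatusModel are hypothetical names, and the bit position comes from the description above (bit 31 is Global Enable).

```cpp
#include <cassert>
#include <cstdint>

// Simulated DCSR; on hardware this register lives in coprocessor 14.
static std::uint32_t g_simulatedDCSR = 0;

// Bit 31 is Global Enable, as described in the text.
const std::uint32_t kGlobalDebugEnabled = 1u << 31;

// Change only the bits selected by 'mask', preserve the rest, and return
// the register value as it was before the change.
inline std::uint32_t SetDebugControlAndStatusModel(std::uint32_t flags,
                                                   std::uint32_t mask) {
    const std::uint32_t original = g_simulatedDCSR;
    const std::uint32_t updated = (flags & mask) | (original & ~mask);
    // Bits outside the mask must never change.
    assert((updated & ~mask) == (original & ~mask));
    if (updated != original) {
        g_simulatedDCSR = updated;  // write only when something changed
    }
    return original;
}
```

Passing the same constant as both flags and mask sets just that bit; passing the saved value with a mask of all ones (the -1 in the call above) writes the whole register back.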

There are two data breakpoint registers: DBR0 and DBR1. There is also the Data Breakpoint Control Register (DBCON), which lets you configure hardware breakpoints on up to two separate addresses or a single breakpoint on a range of addresses. The breakpoints can be configured for load only, store only, or any (load or store) access type. To set a breakpoint on a range, DBR0 should be set to the address and DBR1 to a mask. The breakpoint is triggered if a program accesses data at an address that matches the value in DBR0, ignoring the bits that are set in the mask. I implemented an assembly routine SetDataBreakPoint (in AccessCoproc.s), which assigns all three of these registers. Enum XScaleDataBreakpointFlags (in Breakpoint.h) defines configuration values for DBCON, which can be passed as the third argument to the function. For a convenient way to set breakpoints on local variables, use DataBreakPoint. The functions TestWriteBreakpoint, TestReadBreakpoint, and TestRangeBreakpoint in BreakPointSamples.cpp show examples. When a data breakpoint is hit, it generates a data abort exception.
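The address-and-mask comparison used for range breakpoints can be expressed in a few lines. The following is a sketch of the matching rule as described above, not code from the accompanying sources; RangeBreakpointMatches and MaskForAlignedBlock are hypothetical helpers, and the second one assumes the watched range is an aligned power-of-two block.

```cpp
#include <cassert>
#include <cstdint>

// True when 'address' hits a range breakpoint: bits set in 'mask' (DBR1) are
// ignored, and all remaining bits must equal those of 'base' (DBR0).
inline bool RangeBreakpointMatches(std::uint32_t address,
                                   std::uint32_t base,
                                   std::uint32_t mask) {
    return ((address ^ base) & ~mask) == 0;
}

// Hypothetical helper: mask covering an aligned block of 'size' bytes,
// where 'size' is a power of two.
inline std::uint32_t MaskForAlignedBlock(std::uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);  // power of two only
    return size - 1;
}
```

For a 16-byte block the mask is 0xF, so any access whose address differs from DBR0 only in the low four bits triggers the breakpoint. A range that is not an aligned power of two cannot be described exactly by one mask and has to be rounded up to one.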

The Instruction Breakpoint Address and Control Registers IBCR0 and IBCR1 can be used to set breakpoints on code execution at a specific address. Usually, debuggers insert a special instruction into the program to implement a code breakpoint. This lets you set an unlimited number of breakpoints. But this method does not work with code located in ROM or Flash. In these cases, the hardware-supported instruction breakpoints come in handy; however, there are only two of them. Unfortunately, instruction breakpoints appear to be useless because they generate a "prefetch abort" exception, which is not passed to the __try/__except handler or a debugger.

Register TBREG is for reading bytes from the trace buffer, and CHKPT0 and CHKPT1 are for associating execution history in the trace buffer with instruction addresses. Several other debug registers are for communication with a JTAG debugger and are not discussed here.

The Process ID (PID) register is not a debug register, but it is used when preparing addresses to be set in DBRx or IBCRx. Windows CE can run up to 32 processes, each occupying its own 32-MB address slot. The current process is also mapped to slot 0, which lets a DLL code section (located in ROM) access different data sections when the DLL is loaded in several processes. The ARM architecture provides the PID as direct and efficient support for such slot remapping. The value of the PID is equal to the address of the process slot. The CPU uses the high 7 bits (31:25) of the PID to replace the corresponding bits of virtual addresses when they are 0. The same operation has to be performed when preparing addresses for DBRx or IBCRx (see macro MAP_PTR in Breakpoint.h).
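The slot remapping can be sketched as a small function. This only illustrates the rule described above and is not the MAP_PTR macro itself; the name MapToProcessSlot is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Apply the PID remapping rule: if bits 31:25 of the address are all zero
// (a slot 0 address), replace them with bits 31:25 of the PID.
inline std::uint32_t MapToProcessSlot(std::uint32_t address, std::uint32_t pid) {
    const std::uint32_t kSlotBits = 0xFE000000u;  // bits 31:25
    if ((address & kSlotBits) != 0) {
        return address;  // already refers to a specific slot, leave unchanged
    }
    return address | (pid & kSlotBits);
}
```

For example, with a PID of 0x2C000000, the slot 0 address 0x0002FDFC maps to 0x2C02FDFC, while an address that already carries slot bits is returned as is.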

Reporting Data Breakpoints

I present here three straightforward ways to handle data abort exceptions generated when data breakpoints are triggered:

Using an application debugger.
Using a __try/__except construct.
Writing a simple kernel debugger stub.

Application debuggers (such as EVC) handle data abort exceptions in any thread of the program under debug. They break execution and display the source line or instruction that triggered the breakpoint, display registers, local variables, and call stack.

However, the application debugger often cannot be used because a connection is not available or it's too slow. Also, the debugger cannot handle exceptions in a system process, such as device.exe (hosts drivers) or gwes.exe (hosts user interface).

The second approach is to wrap code in __try{} __except(Expression){} exception-handling blocks. When an exception happens within the __try{} block, the system evaluates the Expression. I implemented the function ExceptionHandler in BreakPointSamples.cpp, which should be specified as the argument to __except. I call the _exception_info API to get useful information, such as exception code, address, and CPU registers. ExceptionHandler displays this information in message boxes (to simplify integration of this code into various applications). Unfortunately, a __try/__except construct can only handle exceptions coming from the thread that executes code within the __try{} block or functions called from within the __try{} block. This is not a problem if you can insert __try/__except into source code for all suspect threads in your program.

When printing information about an exception, it's best to print the stack trace. Printing the stack trace on an ARM is more difficult than on an x86 because the EVC compiler can generate several different types of function prologs and does not have an option to produce consistent stack frames (see "ARM Calling Standard" in EVC help for details). Also, unlike the x86, which always pushes the return address to the stack when calling a function, ARM code places the return address in the link register (LR). Most functions start by storing the LR on the stack, but highly optimized code can keep it in any register. This means that on ARMs, it may not be possible to reliably reconstruct the stack trace without disassembling the code (which is beyond the scope of this article).

A Simple Kernel Debugger Stub

It is sometimes necessary to catch exceptions globally—in any thread of any process. The easiest way to achieve this is to register a DLL as a kernel debugger stub. I include here the minimal code (available electronically) capable of handling system-wide exceptions. In the days of Windows CE 3.0/Pocket PC 2002, you could register a regular user DLL as a kernel debugger stub and display exceptions in a regular message box (the whole code was just about 200 lines). Alas, in Windows CE 4.x/PocketPC 2003, the kernel debugger must be loaded as a kernel module. The problem is that a DLL such as this cannot link to any other DLL, even coredll (which provides most CE API and C/C++ runtime library functions). Consequently, I had to implement my own sprintf-like formatting routine as well as integer division-by-10 (both are normally imported from coredll). I also recycled my old HTrace library to write trace to a shared memory buffer, which can be displayed from a separate application. You can find the code in the SimpleKDStub directory. To run it, copy SimpleKDStub.dll and KDViewer.exe to the PDA and start the program. It loads the stub, which starts listening for exceptions. Once an exception is caught, it is printed to a shared buffer and displayed in the application. This tool is useful for data breakpoints and for catching other exceptions in any application on the PDA.
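To give an idea of what formatting numbers without coredll involves, here is one possible way to divide by 10 and print a decimal number using only shifts, comparisons, and subtraction. It is a sketch of the general technique, not the routine from SimpleKDStub; DivideBy10 and FormatUnsigned are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Divide by 10 with shift-and-subtract long division, so that no runtime
// library division helper is needed. The remainder is returned through
// 'remainder'.
inline std::uint32_t DivideBy10(std::uint32_t n, std::uint32_t* remainder) {
    assert(remainder != nullptr);
    std::uint32_t quotient = 0;
    // 10 << 28 is the largest shifted divisor that still fits in 32 bits.
    for (int shift = 28; shift >= 0; --shift) {
        const std::uint32_t scaled = 10u << shift;
        if (n >= scaled) {
            n -= scaled;
            quotient |= 1u << shift;
        }
    }
    *remainder = n;
    return quotient;
}

// Write 'value' as decimal text into 'out', which must hold at least 11 bytes
// (10 digits plus the terminator). Returns the number of digits written.
inline int FormatUnsigned(std::uint32_t value, char* out) {
    assert(out != nullptr);
    char reversed[10];
    int count = 0;
    do {
        std::uint32_t digit = 0;
        value = DivideBy10(value, &digit);
        reversed[count++] = static_cast<char>('0' + digit);
    } while (value != 0);
    for (int i = 0; i < count; ++i) {
        out[i] = reversed[count - 1 - i];  // digits were produced lowest first
    }
    out[count] = '\0';
    return count;
}
```

The loop always runs 29 iterations regardless of the input, which is slower than a multiply-based trick but easy to verify by inspection, a reasonable trade-off for code that only runs when an exception is being reported.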

XScale Trace Buffer

The XScale architecture implements a powerful debugging feature—the trace buffer. When enabled, it collects a history of executed instructions. The trace buffer is just 256 bytes long (built inside the CPU itself), but stores the history as a compact sequence of 1- or 5-byte entries representing control flow changes (exceptions and branches). Each entry has a 1-byte message, which indicates the type of entry (exception, direct, or indirect branch) and the count of instructions executed since the previous control flow change. If this count exceeds 15, then a special roll-over message is stored. Entries for indirect branches include an additional 4-byte target address. The buffer may be configured to work in wraparound or fill-once mode. Wraparound is appropriate when waiting for an exception (as I do in this article). Fill-once mode (which generates a "trace-buffer full break" exception once the buffer is full) may be used to record all code execution continuously (however, I have not tried it yet).
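Decoding a single message byte might look like the sketch below. The bit layout is an assumption that should be checked against the XScale Core Developer's Manual before relying on it: the low four bits are taken as the instruction count (consistent with the roll-over at 15 described above), bit 7 clear is taken to mean an exception, bit 4 is taken to distinguish indirect from direct branches, and 0xFF is taken to be the roll-over message. All names are hypothetical and do not come from the accompanying code.

```cpp
#include <cassert>
#include <cstdint>

enum class TraceMessageType { Exception, DirectBranch, IndirectBranch, RollOver };

struct TraceMessage {
    TraceMessageType type;
    unsigned count;         // instructions since the previous control-flow change
    bool hasTargetAddress;  // true for 5-byte entries (indirect branches)
};

// Decode one message byte under the ASSUMED layout described in the lead-in.
inline TraceMessage DecodeTraceMessage(std::uint8_t message) {
    if (message == 0xFF) {
        // Roll-over: the count field is not interpreted in this sketch.
        return {TraceMessageType::RollOver, 0, false};
    }
    const unsigned count = message & 0x0Fu;
    if ((message & 0x80u) == 0) {
        return {TraceMessageType::Exception, count, false};
    }
    const bool indirect = (message & 0x10u) != 0;
    return {indirect ? TraceMessageType::IndirectBranch
                     : TraceMessageType::DirectBranch,
            count, indirect};
}
```

A full parser would also have to collect the four address bytes that accompany each indirect branch entry and account for roll-over messages when summing instruction counts; both are left out here.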

The content of the trace buffer is extracted by reading the TBREG register 256 times (this also erases the buffer). The CHKPTx registers are used to get the address of a starting point for the reconstruction. Unfortunately, the buffer does not contain enough information to reconstruct the execution history without disassembling the executed code, counting instructions, and examining branches. Such a program is beyond the scope of this article. However, the accompanying code (available electronically) includes the function ShowTraceBuffer, which simply displays the list of entries in the buffer in a series of message boxes. You can use this information, along with the disassembly window of the EVC debugger, to recover execution history prior to an exception. This may be a more powerful tool than a stack trace. Be aware that the trace buffer collects global execution information from all processes, the OS kernel, and interrupt handlers.

The function TestTraceBuffer in BreakPointSamples.cpp demonstrates using the trace buffer to record execution history. TestTraceBuffer sets a data breakpoint and enables the trace buffer, then calls the function Test, which calls Test1, which triggers the breakpoint. Figure 1 is an annotated disassembly listing for these functions. The exception raised by the breakpoint is displayed in a message box in Figure 2, where you can see the address of the instruction that triggered the breakpoint (value of register PC=1221C). Register R0= 2C02FDFC is the address of the data, and register R1= B(123) is the new value. Figure 3 displays the parsed trace buffer: +1, IBr121BC, +1, BR, +4, BR. This lets you reconstruct the execution history: the function SetDebugControlAndStatus executed one instruction after enabling the trace buffer, then returned to address 121BC (in TestTraceBuffer); one more instruction was executed, followed by a branch (to Test), then four instructions and a branch (to Test1).

Further Improvements

A simple way to enhance the code I present here would be to print the module name and offset instead of the raw return address when printing exception information. A more difficult exercise would be to print the stack trace or enhance the trace buffer printing with a disassembler to fully reconstruct the execution history. A completely new direction would be to implement a continuous execution recording tool using a fill-once trace buffer. I may post bug fixes and improvements on my web site (http://forwardlab.com/).

Conclusion

XScale-based CPUs provide powerful support for hardware-assisted debugging. Fortunately, it is not necessary to wait for application debuggers to provide access to all CPU features from the GUI. On Pentium-based systems, Visual Studio never managed to implement breakpoints on data reading or hardware breakpoints on local variables. Therefore, it is important for you to know the capabilities of the CPU and how to exploit them from an application. The tricks I present here may not be for everyday use, but every now and then, they can save hours (or days) of difficult debugging.

DDJ

Listing One

; SetDebugControlAndStatus writes (optionally) to 
; Debug Control and Status Register (DCSR)
; and returns the original value of DCSR.
; parameters: 
;   r0: flags to be set or reset in DCSR.
;   r1: mask - flags to be modified in DCSR, the rest is preserved.
; return value: 
;   value of DCSR before the modification

    EXPORT    |SetDebugControlAndStatus|
|SetDebugControlAndStatus| PROC
    stmdb   sp!, {r2,lr}   ; save registers
    mrc     p14, 0, r2, c10, c0, 0 ; read DCSR to r2
    and     r0, r0, r1     ; r0 = r0 & r1 - clear flags not in mask
    bic     r1, r2, r1     ; r1 = r2 & ~r1 - leave flags not in mask
    orr     r0, r0, r1     ; r0 = r0 | r1 - combine flags
    cmp     r0, r2         ; compare new with original
    mcrne   p14, 0, r0, c10, c0, 0 ; write DCSR if flags have changed
    mov     r0, r2         ; prepare to return the original flags
    ldmia   sp!, {r2,pc}   ; restore the registers and return
    ENDP                   ; close the PROC opened above
