Robert is an independent consultant on the x86 architecture. He can be reached at email@example.com.
I launched a three-article discussion of Intel's System Management Mode (SMM) a few months ago with a comparison of the 80386's In-Circuit Emulator (ICE) support and System Management Mode (see DDJ, January 1997). Those columns provided more information about in-circuit emulation than I've seen in print in any other forum. Nevertheless, in-circuit emulation remains a mystery to most people. Engineers are spending scores of hours using arcane debugging techniques, simply because they don't know that a better, more powerful way exists to debug their toughest problems. The most common questions I am asked are: what is it? how much does it cost? what can it do? and what are the limitations? These are some of the questions I will attempt to answer in this column. I'll discuss the basics of in-circuit emulation, and the history of Intel's in-circuit emulator designs. In the coming months, I will discuss the microprocessor in more detail, as well as how it is designed with in-circuit emulation in mind. Finally, I will conclude the series by demonstrating some of the debugging techniques I've developed for use with in-circuit emulators.
My Own ICE History
My first experience with in-circuit emulation came as a junior-level BIOS engineer at Tandon Computers. I believed that I could solve any debugging problem using Microsoft Codeview. To me, Codeview represented the pinnacle of source-level debuggers. It let you set breakpoints and step through code execution in source or assembly-language mode. Breakpoints are set when Codeview surreptitiously inserts an INT-3 (breakpoint interrupt) into the code stream. The INT-3 insertion is hidden from the user when viewing in assembly mode, but the opcode can be detected when viewing raw memory. When the breakpoint occurs, Codeview regains control and the user can view and modify the CPU registers, memory contents, and other program variables. This technique works fine when your program is in volatile memory (RAM). However, it never works when the program resides in read-only memory (ROM) -- like the ROM BIOS. Obviously, Codeview cannot insert an INT-3 into the code stream of a ROM BIOS, because the ROM cannot be written. This is where the in-circuit emulator begins to help.
I was forced to start using an in-circuit emulator when I was given a problem that occurred in the ROM BIOS before the computer booted the operating system. Codeview and other debuggers can't run without first booting an operating system. Therefore, I knew that I couldn't use Codeview to help me debug this problem. My only alternative was to learn how to use the ICE. I cringed because I was scared to death of this monstrous ICE. In reality, I was afraid of learning something new. In fact, it didn't take long to learn how to use it; I actually liked it. The ICE was so powerful and versatile that I never looked back.
That was eight years ago. Today, I use an ICE for almost all my debugging. In many cases, a software-based debugger is still more productive than an ICE. However, over the years, I have honed my ICE debugging skills. Today, in those cases where a software debugger may have a productivity edge, my ICE skills often make up for the difference, giving me virtual parity in either environment.
In-Circuit Emulators versus Source-Level Debuggers
Modern in-circuit emulators share many attributes with software-based debuggers. Both support source-level debugging, software breakpoints, and memory watch points. You may reference or modify variables by name, or disassemble modules and subroutines by name and line number. Both allow you to view or modify raw memory. Both give the ability to view and modify source code at the assembly-language level.
Even though ICEs have many of the same capabilities as software debuggers, they aren't without their limitations. Some ICEs support source-level debugging by loading executable programs (.EXE files) directly into memory, albeit without source-level debugging information (like line numbers and symbol tables). Even though Microsoft Codeview is the de facto standard for software debuggers, Intel never supported the loading of an executable file that was compiled with Codeview debugging extensions. Instead, Intel's ICEs only supported source-level debugging by loading object modules (.OMF files) directly into memory. The Intel OMF files were a pseudoproprietary format -- clearly not an industry standard. If you wanted to convert your .EXE files (with Codeview information) into .OMF files (with Intel debug information), you needed an $800 software package called "Link & Locate 386 Extended Edition," from Systems & Software (http://www.ssi.com/). Thankfully, Intel has exited the ICE business, allowing other ICE vendors to depart from their (backward) lead. Today, ICE manufacturers are starting to support source-level debugging in standard Microsoft (.EXE) formats. As far as I'm concerned, this limitation (along with the hefty price tag) is the worst limitation of using in-circuit emulators for every day debugging problems.
In all other ways, an in-circuit emulator is much more powerful than any software-based debugger. Breakpoints may be set anywhere in a program, regardless of the memory type (RAM or ROM). Breakpoints can also be set on certain bus events that cannot be detected by any source-level debugger. Consider the following breakpoint events, and ponder using a standard debugger to do the same thing:
- Break upon any memory access (read, write, or either) at a specific address, with specific data.
- Break upon any memory access at a partial don't-care address, like 00124XXX with partial don't-care data such as XX2345XX.
- Break upon data access (memory or I/O, but not code fetches) to a specific address, or an address and data with don't-care fields.
- Break upon any byte, word, or double-word memory or I/O access at any address.
- Break upon any microprocessor special cycle, such as INT-Acknowledge, FLUSH, or a CACHE WRITEBACK.
- Break upon an interrupt acknowledge cycle for a specific interrupt, or any arbitrary interrupt.
- Break upon bus inactivity or ADS hold violations.
- Break upon code fetches with specific data patterns.
- Even break upon certain illegal bus cycle types.
The ICE has the ability to halt emulation on any of these events, and many more that don't readily come to mind. These capabilities are all accomplished with hardware assistance, which doesn't exist in software-based debuggers. This extra hardware assistance gives the ICE its powerful capabilities, but also gives it a hefty price tag. The traditional hardware-assisted ICE costs between $35,000 and $50,000. Other scaled-down versions, called In-circuit Test Probes (ITPs) or In-Circuit Debuggers (ICDs), lack the hardware assistance, but cost under $10,000. (I will discuss the differences between ICEs and ITPs in my next column.)
ICEs also support debugging symmetric multiprocessor (SMP) environments. The ICEs have the ability to chain together and signal each other when one processor receives a breakpoint event. This ability gives the ICE the clear advantage in debugging any SMP problems.
In truth, the term "emulator" is actually a misnomer. The ICE is not emulating anything -- it's a real CPU that is plugged directly into the CPU socket. The ICE just happens to have a hundred-wire umbilical cord attached to it. For the most part, the ICE runs at the full CPU speed. Therefore, those hard-to-find race conditions between software and hardware will still show up using an ICE.
ICE Evolution: Intel's Offerings
The first ICE I used was a dedicated in-circuit emulator computer from Intel. I called this "the big blue box" type of ICE. The entire ICE was housed in a self-contained computer, which included a monitor and keyboard. The ICE plugged into the CPU socket, and targeted a 6-MHz 80286 motherboard. Execution breakpoints were detected by snooping code prefetches on the microprocessor data bus. When a code prefetch matched your specification, the ICE would halt. Then, you could examine or modify memory, CPU registers, memory, and I/O ports.
Later, Intel introduced the I2ICE, a departure from the dedicated ICE machine approach. Users needed to provide a host computer, which typically needed to be an 80286-class machine with some amount of expanded memory. (Yes, that's right...LIM-style expanded memory.) If your computer didn't have expanded memory, Intel would sell you an above-board unit to serve this purpose. The host computer attached to the I2ICE hardware via a standard serial connection -- communicating at the lightning speed of 9600 baud. The I2ICE hardware was in a box approximately 30×24×10 inches in size (if memory serves me correctly). The most noticeable attribute about the I2ICE was its noise -- the hardware had a fan on it that was very loud, making it hard to work in close proximity to this unit.
The I2ICE hardware had a limited set of changeable CPU modules. I recall having an 8086/8088, and an 80286 CPU module for our I2ICE. I also seem to recall that an 80186/80188 module was also available. Emanating from one end of the I2ICE hardware was a lead-shielded cable (the CPU umbilical cord), while at the other end was a small hardware module that plugged into the CPU socket. The real CPU laid on top of this small module (in close electrical proximity to the motherboard). Adapters were available to switch between CPU form factors -- for example PLCC and PGA versions of the 80286.
With the next evolutionary change, the huge (and loud) I2ICE hardware was gone. In its place was a small, streamlined box that measured approximately 15×12×2 inches. This ICE was commonly called the "hot-plate" or "boiler-plate" ICE because of its appearance. The host computer plugged into this small box, which in turn plugged into a small module that housed the CPU. Like the I2ICE, this ICE targeted multiple microprocessor environments. Modules were available for the 80286, and the newly released 16-MHz 80386.
About this time, Intel dabbled in UNIX-based ICEs. At one point, it shipped an ICE that was UNIX-software based. Instead of requiring a host computer, a VT-100 terminal was needed. Once the hardware booted UNIX, the ICE host autostarted itself. I'm only aware of an 80386 version of this ICE, though I wouldn't be surprised if other microprocessor targets were available. My most remarkable ICE story (the discovery of five or six undocumented opcodes) involved using this orphan ICE environment. This UNIX-based ICE disassembled all of the undocumented opcodes on the 80386 microprocessor. This inadvertent mistake gave me all of Intel's internal names for these undocumented opcodes. After I reverse-engineered the functions of these opcodes, I published my findings at http://www.x86.org/secrets/opcodes. This publication caused some concern within Intel's legal department that I had stolen this information. I was more than happy to tell their lawyer that my discovery was the result of Intel's sloppiness.
As microprocessors got faster and faster, Intel was forced to keep redesigning its ICE hardware. The next evolutionary change occurred when Intel developed the "Cobra" ICE chassis. The Cobra chassis was small, measuring approximately 12×10×3 inches in size. It was quiet and designed to target multiple microprocessor environments. The Cobra chassis was a modular design and had boards that could plug in and out. When a new CPU was designed, you simply bought the two plug-in boards, and saved over $20,000 in hardware costs. The Cobra chassis supported the 80386 DX, 80386 SX, and initial offerings of the 80486 DX, 80486 SX, and 80487 SX. An optional high-resolution time-tag board could be purchased for an additional $1500. The time-tag board provided cycle-accurate time-stamp information to the ICE bus cycle traces. The option also required a math coprocessor (80387) be plugged into the host computer. With a whopping 38,000 baud communications link, the Cobra was also much faster at communicating with the host computer. This higher baud rate substantially helped transfer microprocessor traces from the Cobra chassis to the host computer.
The Cobra chassis also supported functions for multiprocessor environments. Each Cobra chassis had two "SYNC" outputs and two "SYNC" inputs. These SYNC signals provided the means to start, stop, and trigger multiple ICEs in a multiprocessor environment. These signals could also be used to send and receive signals to and from external devices. I often used these signals to trigger a logic analyzer or SCSI BUS analyzer when the CPU detected a breakpoint. Conversely, I used the logic analyzer and SCSI BUS analyzers to trigger breaks to the ICE using these signals.
Before Intel came out with the Cobra version of the 80486 ICE, they vowed that it couldn't be done. Intel's sales force told me that they couldn't design an ICE that ran at such a fast CPU speed (33 MHz). Intel promised us developers that the 80386 ICE was the last of the standard, hardware-assisted ICEs. Instead, Intel introduced the 80486 In-Circuit Debugger (ICD). The ICD didn't support any of the fancy hardware breakpoint features of a full-blown ICE. The ICD breakpoint ability was limited to breakpoint events that were supported by the microprocessor itself. This included any debug register breakpoint, trap flag breakpoint, INT-1, task switches into a task with the T-bit set, writing to the debug registers when DR7.GD=1, and the undocumented ICEBP instruction (see http://www.x86.org/secrets/opcodes/ICEBP.html for a description of the ICEBP instruction). Without the hardware assistance, the ICD didn't have an instruction trace. Even so, once a breakpoint occurred, the ICD could disassemble assembly-language instructions, read and write memory, read and write I/O ports, and view all of the microprocessor registers -- including the undocumented segment descriptor cache registers (see http://www.x86.org/articles/loadall/#SB1 for a description of the segment descriptor cache registers). Even without hardware-assisted breakpoints or an instruction trace, the ICD was a very powerful debugging tool. As the first in-circuit debugging tool that cost under $10,000, the ICD was also a very cost-effective debugging solution.
The final evolutionary change occurred in the same time frame that Intel abandoned designing and selling in-circuit emulators. Even though Intel abandoned the ICE market, they designed one last emulation device to coincide with this final evolutionary change in microprocessor design. Intel had abandoned their current method of supporting in-circuit emulation functions within the CPU (available from the 80286 to 80486), and adopted an entirely new approach (available in the Pentium and Pentium Pro processors). This new approach was called a "debug port," and required an entirely new emulator design. The new emulator design was called an "In-circuit Test Probe" (ITP). Like the ICD, the ITP didn't have any hardware assistance. Without hardware assistance, its debugging capabilities were identical to the ICD. The ITP was also a very cost-effective debugging tool, still costing less than $10,000. The ITP solution was both evolutionary and revolutionary. The ITP approach to in-circuit emulation is by far my favorite of Intel's designs.
I have barely touched the surface of discussing in-circuit emulation here. I've been using ICEs for eight years, and have a good understanding of how they work. I have watched Intel's evolutionary changes to ICEs. During that time, ICEs have gone from dedicated hardware machines to a host-target environment. I've seen monstrous ICE boxes and debuggers that are small enough to fit in your pocket. In my next column, I'll discuss the changes that took place in the microprocessors themselves. In-circuit emulation wasn't always provided by external hardware; considerable ICE support was designed into the CPU. I'll take a look at these changes and discuss their strengths and weaknesses. In another future column, I'll discuss the Pentium Probe Mode -- ICE support built right into the Pentium Processor.
Copyright © 1997, Dr. Dobb's Journal