Tools

Simulation Compilation and Portability

By Marc E. Hoffman, March 01, 1995

Simulation compilation is a technique that lets you compile a simulation, then run an executable representing the original code instead of simulating the code directly.

MAR95: Simulation Compilation and Portability

Translating code from one machine to another for rapid development

Marc is a senior software engineer for Analog Devices. He can be contacted at [email protected].

Simulation compilation is a technique that lets you compile a simulation, then run an executable that represents the original code instead of simulating the code directly. By translating the code from the instructions of one machine to the instructions of another and compiling all the semantics of the machine's instruction into a set of instructions that execute equivalently on another processor, you can speed up simulation execution by up to three orders of magnitude.

The major difference between a simulation compiler and other simulators is that other simulators use interpretation at run time. A simulation compiler compiles the set of instructions needed for the simulation into another machine's machine code.

In this article, I'll examine how machine code is converted from one target to another. To illustrate, I'll create a simple machine and build a set of tools that create running simulations of it. I'll then discuss how you can extend the environment to provide debugging tools.

Building a Simulation Compiler

In April 1994, I had to create a set of development tools for a next-generation digital-signal processor (DSP). Although I had already written the assembler, linker, and disassembler, I needed to define the run-time model before getting the compiler to work. Tired of waiting for a simulator, I asked a colleague: "What if we converted the object modules from binary code into low-level C code?"

Ultimately, I wrote a minimal simulation compiler similar to Listing One . Although crude, it did most of the job. It was extremely fast--approximately 300 times faster than our previous simulators. Additionally, I was able to prototype my run-time environment, and I got a debugger for free.

A simulation compiler takes an executable image for one machine and translates it such that it can be executed on another machine. In a sense, you are compiling or translating the instructions of one machine into a set of instructions for another, while maintaining the exact semantics of the original machine. You achieve this by reading the contents of the executable image and mapping each instruction of the original machine (the "source") to a set of instructions that emulate the same functionality on the target machine.

Instead of writing machine code for a particular target machine, you can write code for an abstract machine--say, "machine C"--suitable for running on any workstation or PC. By targeting C, you can run the executable source on any machine that has a C compiler. Figure 1 defines the abstract C machine.

The machine model creates an infinite loop that iterates over the machine state of the source machine's program counter (PC). BEFORE_CYCLE is a function or macro that causes the PC to get the next valid PC and set up and check any pre-existing conditions of the machine state that need to be serviced. AFTER_CYCLE finishes any work left after the machine executes an instruction; in some cases, it could be defined to "do nothing." INVALID_PC_HANDLER is invoked whenever the PC is out of range: It could issue an error or perhaps emulate a trap. Each case of the switch statement is used to implement the TEXT address space of the source machine. (TEXT is the code space in the UNIX world.) The switch makes it indexable, because the machine I'm interested in (like most machines) has indirect jumping. This means you must be able to directly address the entire TEXT address space at run time.

BEFORE_CYCLE is responsible for generating the next PC. The PC is modified by a primitive called TICK, which BEFORE_CYCLE invokes to step the PC to the next address. You can't just add 1 to the PC--you need to be able to handle branches. To implement this logic, you need something similar to a multiplayer used in hardware, which multiplexes the next address that the PC gets. This next address could be a branch or an interrupt-vector address, or the machine might have variable-size instructions. To multiplex the next address, you'll use a variable in C, called nextaddr. To implement a jump instruction, set nextaddr to the target of the jump; to get the destination address, just extract the correct bits from the source executable.

To build a simulation compiler, you need a loader--software that knows how to read the contents of an executable. Because C code is generated as the target machine, you simply disassemble (decode) the executable. In other words, you use a loader to read the machine code and a disassembler to transform that code into C. To build the simulation compiler, I integrate the two programs: The loader reads each instruction and calls on the disassembler to produce the C code that represents the semantics of the machine instruction. This is a fairly straightforward algorithm; the difficult part is generating the correct C code. To illustrate how this is done, I'll build a simple little machine (SLM), then define a disassembler and a loader for the SLM. Keep in mind that my motivation for creating the SLM is instructive; it keeps the details of a big processor out of this article.

Figure 2 shows a simple machine with an 8-bit address space for data and a separate address space for the program, which is characteristic of a Harvard architecture (the more common von Neumann architecture of a conventional processor uses a single address space for both data and program). To keep my example clean, I've used this dual address space and kept all the instructions the same size. Example 1(a) is a simple program using SLM.

First the C machine needs to define the registers as global variables so that you can look at their contents with any ordinary source-level debugger; see Example 1(b). Since the machine is an 8-bit machine, you can declare a suitable vector to represent memory; see Example 1(c). (I won't worry about size problems here, but you wouldn't want to do this for machines with 4-gigabyte address spaces!) Now look at each instruction and write a chunk of C code that behaves the same way. For the ALU operations, you will need a scratch register (see Example 2) that has a wider mode than the size of the word you're operating on; for this machine it is any int. If you don't have a larger data type, you'll have to do a little more work to get the status bits set correctly.

The trick to building a disassembler that generates the appropriate C code for the SLM is to reuse the disassembler. The disassembler works much like printf: It takes a string and prints each character in turn until it sees a percent sign (%), where its subsequent action will be determined by the next character. Figure 3 defines the disassembler for the SLM.

Next, you need to define the function shown in Figure 4, disassm, which takes as its argument the instruction right-justified as an int. (This wouldn't work if the machine instruction word were variable or bigger than the largest integral data type for the machine you are running on. This problem is easy to solve, but is beyond the scope of this article.)

Now it's time to build a simulation compiler that generates C code for the disassembler. First, simplify the C code a little by macroizing (creating a preprocessor macro) or functionalizing the MOVES. This minimizes the number of tools needed to rebuild when you change the simulator's semantics; that is, you can just rebuild the simulation and not the compiler. Also, macroization allows the writes to be hooked (which lets the macro do its work and then some). Figure 5 shows a simulation, including a header file that describes the machine; Figure 6 applies the simulation compiler to the program. The simulation compiler compiles this code into a simulation and examines each instruction in turn, calling the disassembler on the instruction code representing each word. Here I've used an array of shorts to represent the object-code file, making implementation simple. In a more realistic example, you would need to read the information from the file.

A Commercial Implementation

At Analog Devices, we have developed a simulation-compiler tool that enables any processor to simulate our DSP instruction set. The scheme provides a flexible, rapid-prototyping environment intended to decrease a developer's time to market. Typical hardware simulators are slow, running at about 1000 instructions per second, and have limited debugging capabilities.

The simulation environment we are exploring is two to three orders of magnitude faster than conventional simulators. Consider, for example, a typical speech-compression algorithm requiring computing power of approximately 12 million instructions/sec. A fully interpretive simulator running at 1000 cycles/sec takes roughly 3.3 hours to process one second of speech, while the compiled simulation with the speed of 240,000 cycles/sec takes only 50 seconds for the same task (numbers based on a 486DX66 CPU).

The design of the DSP simulation compiler allows developers to write not only DSP code, but also develop simulations of the hardware peripherals that make up an entire system. Since the simulation is compiled into C code, you can mix host C code into the simulation, resulting in a truly customizable development environment. Furthermore, you can find bugs that occur after millions of instructions have executed.

The debugging environment is constructed around the host C compiler's debugging environment so the user needn't learn how to use a new debugger. The simulation compiler generates special bookkeeping information that allows symbolic debuggers to trace through the preconverted source code. Hence, your favorite source-level debugger can be used to debug DSP applications.

Conclusion

The simulation-generation techniques presented here don't address the problem of self-modifying code. In fact, you can't handle it without the aid of an interpreter and a scheme that flags modified instructions. However, once the simulation compiler tables are created, you can generate an interpreter directly from the tables. This technique of compiling simulations makes the build or load time of a simulation quite expensive. For debugging environments, the conversion is done only a few times, and the compiled simulation is used more than once before it is regenerated. Thus, the speed and extensibility of the simulation environment outweigh the technique's drawbacks.

Figure 1: Defining the abstract C machine.

while (1) {
  BEFORE_CYCLE ();
  switch (PC) {
  case ADDR: instruction; break ;
  default:
    INVALID_PC_HANDLER ();
  }
  AFTER_CYCLE ();
}

Figure 2: Definition of SLM.

SLM:
registers: rX,      rY,  rZ,  rI,  rJ,  rK,  PC,  ss{z,n,v,c}
          0,   1,   2,   3,   4,   5,   6,   7
addr: is 8bit
000       nop
1sd       mov rsrc, rdst
2id       mov (ri), rdst
3si       mov rsrc, (ri)
4kk       mov k, rx
5kk       mov k, ry
6xx       add src, dst
7xx       sub src, dst
8xx       mul src, dst
9xx       jmp addr
Axx       jz  addr
Bxx       jc  addr
Cxx       halt

Example 1: (a) Simple SLM program; (b) defining registers as global variables; (c) declaring a vector of memory.

(a)
     mov 5, rx
     mov 3, ry
     mul rx, ry
     mov 15, rx
     sub rx, ry
     jmp ok
     halt
ok   add rx, ry
     halt
(b)
     unsigned char rx,ry,ri,rj,rk,
     pc, ss;
(c)
      unsigned char mem[1<<8];

Example 2: A scratch register.

  unsigned int scratch;
000  -- nop
     /* nothing */
1sd  -- mov rsrc, rdst
     rdst = rsrc;
     ss.z = (rdst == 0) ? 1:0;
     ss.n = ((signed)rdst < 0) ? 1:0;
2id  -- mov (ri), rdst
     rdst = mem[ri];
     ss.z = (rdst == 0) ? 1:0;
     ss.n = ((signed)rdst < 0) ? 1:0;
3si  -- mov rsrc, (ri)
     mem[ri] = rsrc;
4kk  -- mov k, rx
     rx = k;
     ss.z = (k == 0) ? 1:0;
5kk  -- mov k, ry
     ry = k;
     ss.z = (k == 0) ? 1:0;
6xx  -- add src, dst
     rdst = scratch = rdst + rsrc;
     ss.v = (scratch > 0x1ff) ? 1 : 0;
     ss.c = (scratch & 0x1ff) ? 1 : 0;
     ss.n = (scratch & 0x80) ? 1 : 0;
7xx  -- sub src, dst
     rdst = scratch = rdst - rsrc;
     ss.v = (scratch > 0x1ff) ? 1 : 0;
     ss.c = (scratch & 0x1ff) ? 1 : 0;
     ss.n = (scratch & 0x80) ? 1 : 0;
8xx  -- mul src, dst
     rdst = scratch = rdst * rsrc;
     ss.v = (scratch > 0x1ff) ? 1 : 0;
     ss.c = (scratch & 0x1ff) ? 1 : 0;
     ss.n = (scratch & 0x80) ? 1 : 0;
9xx  -- jmp addr
          nextaddr = addr;
Axx  -- jz  addr
     if (ss.z)
               nextaddr = addr;
Bxx  -- jc  addr
     if (ss.c)
               nextaddr = addr;
Cxx  -- halt
     return;

Figure 3: The SLM disassembler.

char *regs[] = { "rx","ry","ri",
       "rj","rk", "pc", "ss" };
char *distemp[] = {
        "nop",
        "mov %1, %2",
        "mov (%1), %2",
        "mov %1, (%2)",
        "mov %k, rx",
        "mov %k, ry",
        "add %1, %2",
        "sub %1, %2",
        "mul %1, %2",
        "jmp %a",
        "jz %a",
        "jc %a",
        "halt",
        0,
        0,
        0,
};

Figure 4: Defining the disassm function.

disasm (int inst)
{
   char *template = distemp[(inst>>8)&0xF];
   char *p = template;
   if (template) {
      while (*p) {
        if (*p == '%') decode_operand (inst, ++p);
        else putchar (*p);
        p++;
      }
   }
}
decode_operand (int inst, char *p)
{
   switch (*p++) {
   case '1':  printf ("%s", regs[(inst>>4)&0xF]);      break;
   case '2':  printf ("%s", regs[inst&0xF]);           break;
   case 'k':  printf ("%d", (singed char)(inst&0xFF));  break;
   case 'a':  printf ("%u", inst&0xFF);                break;
   default:
     abort ();
   }

}

Figure 5: Typical simulation, including a header file that describes the machine.

#define SETR(x,y)\
   x = y;\
   ss.z = (x == 0) ? 1:0; ss.n = ((signed)x < 0) ? 1:0
#define SETMEM(x,y) x = y;
#define ALUOP(op, rdst,rsrc)\
   rdst = scratch = rdst op rsrc;\
   ss.v = (scratch > 0x1ff) ? 1 : 0;\
   ss.c = (scratch & 0x1ff) ? 1 : 0;\
   ss.n = (scratch & 0x80)  ? 1 : 0
char *distemp[] = {
        "/* nothing */",
        "SETR (%2, %1);",
        "SETR (%1, mem [%2]);",
        "SETMEM (mem [%2], %1);",
        "SETR (rx, %k);",
        "SETR (ry, %k);",
        "SETR (%2, ALUOP (+, %2, %1));",
        "SETR (%2, ALUOP (-, %2, %1));",
        "SETR (%2, ALUOP (*, %2, %1));",
        "nextaddr = %a;",
        "if (ss.z) nextaddr = %a;",
        "if (ss.c) nextaddr = %a;",
        "return",
        0,
        0,
        0,
};

Figure 6: Applying the simulation compiler to a program.

0000 405       mov 5, rx
0001 503       mov 3, ry
0002 801       mul rx, ry
0003 40F       mov 15, rx
0004 701       sub rx, ry
0005 A07       jz ok
0006 C00       halt
0007 601  ok   add rx, ry
0008 C00       halt

Listing One


::::::::::::::
simcc.c
::::::::::::::
/* Simulation compiler for SLM */

char *regs[] = { "rx","ry","ri","rj","rk", "pc", "ss" };
char *distemp[] = {
        "nop",
        "mov %1, %2",
        "mov (%1), %2",
        "mov %1, (%2)",
        "mov %k, rx",
        "mov %k, ry",
        "add %1, %2",
        "sub %1, %2",
        "mul %1, %2",
        "jmp %a",
        "jz %a",
        "jc %a",
        "halt",
        0,
        0,
        0, 
};
char *simtemp[] = {
        "/* nothing */",
        "SETR (%2, %1);",
        "SETR (%1, mem [%2]);",
        "SETMEM (mem [%2], %1);",
        "SETR (rx, %k);",
        "SETR (ry, %k);",
        "SETR (%2, ALUOP (+, %2, %1));",
        "SETR (%2, ALUOP (-, %2, %1));",

        "SETR (%2, ALUOP (*, %2, %1));",
        "nextaddr = %a;",
        "if (ss.z) nextaddr = %a;",
        "if (ss.c) nextaddr = %a;",
        "return;",
        0,
        0,
        0, 
};
disasm (int inst, char **templates)  
{
   char *template = templates[(inst>>8)&0xF];
   char *p = template;
   if (template) {
      while (*p) {
        if (*p == '%') decode_operand (inst, ++p);
        else putchar (*p);
        p++;
      }
   }
}
decode_operand (int inst, char *p)
{
   switch (*p++) {
   case '1':  printf ("%s", regs[(inst>>4)&0xF]);      break;
   case '2':  printf ("%s", regs[inst&0xF]);           break;
   case 'k':  printf ("%d", (signed char)(inst&0xFF)); break;
   case 'a':  printf ("%u", inst&0xFF);                break;
   default:
     abort ();
   }
}
genc (int addr, int inst)
{
  printf ("      case 0x%04x: T(\"", addr); disasm (inst, distemp); 
    printf ("\");\n");
  printf ("        "); disasm (inst, simtemp);  printf ("\n");
  printf ("        break;\n");
}
int prog[] = {9, 0x405,0x503,0x801,0x40F,0x701,0xA07,0xC00,0x601,0xC00};
main ()
{

  int i;
  printf ("#include \"slm.h\"\n"
       "main ()\n"
       "{\n"
       "  while (1) {\n"
       "    BEFORE_CYCLE ();\n"
       "    switch (pc) {\n");
  for (i=0; i<prog[0];i++)
    genc (i, prog[i+1]);

  printf ("    default:\n"
       "      INVALID_PC_HANDLER ();\n"
       "      break;\n"
       "    }\n"
       "    AFTER_CYCLE ();\n"
       "  }\n"
       "}\n");
}
::::::::::::::
slm.h
::::::::::::::
/* SLM definitions.  simulator header file. */
  unsigned char rx,ry,ri,rj,rk, pc;
  struct { unsigned z:1, n:1, v:1, c:1; } ss;
  unsigned char mem[1<<8];
  unsigned int scratch;
  unsigned char nextaddr;

#define SETR(x,y)   x = y; ss.z = (x == 0) ? 1:0; ss.n = ((signed)x < 0) ? 1:0;
#define SETMEM(x,y) x = y;
#define ALUOP(op, rdst,rsrc)\
   rdst = scratch = rdst op rsrc;\
   ss.v = (scratch > 0x1ff) ? 1 : 0;\
   ss.c = (scratch & 0x1ff) ? 1 : 0;\
   ss.n = (scratch & 0x80)  ? 1 : 0;
#define BEFORE_CYCLE() pc = nextaddr; nextaddr++
#define AFTER_CYCLE()
#define INVALID_PC_HANDLER() abort ()
#define T(x) printf ("%04x: %s\n", pc, x)

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Tools

Simulation Compilation and Portability

Translating code from one machine to another for rapid development

Building a Simulation Compiler

A Commercial Implementation

Conclusion

Figure 1: Defining the abstract C machine.

Figure 2: Definition of SLM.

Example 1: (a) Simple SLM program; (b) defining registers as global variables; (c) declaring a vector of memory.

Example 2: A scratch register.

Figure 3: The SLM disassembler.

Figure 4: Defining the disassm function.

Figure 5: Typical simulation, including a header file that describes the machine.

Figure 6: Applying the simulation compiler to a program.

Listing One

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Tools Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Tools

Simulation Compilation and Portability

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Tools Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content