Moving From Assembly to C


AUG92: MOVING FROM ASSEMBLY TO C

A technical-support veteran discusses using C for embedded systems

Beth Mazur

Beth provides support for the Whitesmiths 8-bit C cross compilers at Intermetrics Microsystems Software. She can be reached at 733 Concord Ave., Cambridge, MA 02138.


Most embedded systems programmers learned their craft using assembly language programming. Why? Because for a long time, assemblers were the only tools around for writing code for target systems. Now, however, high-level language tools specifically designed for embedded systems development are available. Consequently, C is increasingly becoming the language of choice for many embedded systems projects.

What's C's attraction? For starters, C is more maintainable and portable than assembly, and can be just as efficient. Still, many assembly programmers shy away from moving to C, citing its steep learning curve or the lack of language facilities.

This article has its roots in my two years on the front lines of Intermetrics' technical support for Whitesmiths C cross-compilers; the issues I discuss here stem from the most frequently asked questions. For the most part, I'll use Motorola's M68HC11 to illustrate examples, although much of the information applies to other 8-, 16-, and 32-bit processors as well.

Using C

C cross-compilers let you write code on a host PC, UNIX workstation, or other computer and compile for use with any number of 8-bit target processors. Since not all of these processors offer the same support for the C language, you need to first understand how C handles the stack, recursion and reentrancy, the heap, C keywords of special interest to embedded systems programmers, the runtime startup, automatic data initialization, locating, function prototypes, and library support. You then need to decide how important these are to your application.

For example, most C compilers use a stack. As in the assembly world, the stack is used for temporary storage. In C, the stack is used to pass parameters and as storage for local variables (called "autovariables" in C). Because of the dependency on a stack, you must consider your processor when planning your application. If you plan on using on-chip RAM only (256 bytes on the M68HC11A8), you'll need to minimize your use of local variables, parameter passing, and nested subroutine calls. However, if you'll be using external RAM, you can allocate a section of memory there for stack usage.

Alternatively, many C compilers can generate code that uses static memory instead of a stack. This option minimizes the RAM requirements of an application, but generally makes functions compiled this way non-recursive and non-reentrant. If your application does not require recursion or reentrancy, you can take advantage of compile-time models that use static memory for parameter passing and local variables. Otherwise, you'll have to consider stack usage when deciding how much RAM your application will need.

A program is recursive if the compiled program can call itself during run time and execute correctly. The factorial program in Example 1 shows a recursive routine. If the local variable temp is stored on the stack, the program will execute correctly. Each time the program calls itself, storage for temp will be created in a new area of the stack. However, if temp is stored in a static area, say memory location 0x2012, each successive call to the program may overwrite the previous value of temp (depending on the evaluation order of the compiler). In this case, the factorial of any number would be 1.

Example 1: Factorial program that illustrates recursion.

  int factorial (int fact)
  {
  int temp;
  temp=fact;
  if (!fact)
    temp=1;                       /* fact = 0?  */
  else
    temp*=factorial (temp-1);     /* fact > 0   */
  return (temp);
  }
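
The effect of static storage can be reproduced on a host machine. The following is a minimal sketch, not output from any particular compiler: factorial_stack and factorial_static are illustrative names, and the static version deliberately forms the product after the recursive call returns, which pins down the evaluation order that the text leaves compiler-dependent.

```c
/* Stack-based version: each call gets its own copy of temp. */
int factorial_stack(int fact)
{
    int temp = fact;

    if (!fact)
        temp = 1;
    else
        temp *= factorial_stack(fact - 1);
    return temp;
}

/* Static version: every call shares one temp, much as a static-memory
   compile model would arrange. The product is formed after the call
   returns, so temp has already been overwritten by the inner calls. */
int factorial_static(int fact)
{
    static int temp;
    int inner;

    temp = fact;
    if (!fact) {
        temp = 1;
    } else {
        inner = factorial_static(fact - 1);
        temp = temp * inner;    /* temp is 1 here, no longer fact */
    }
    return temp;
}
```

With these definitions, factorial_stack(5) yields 120, while factorial_static(5) yields 1, matching the failure described above.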

A program is reentrant if it can be entered by more than one task simultaneously without loss of information. The factorial program in Example 1 is reentrant if each task that calls factorial() has its own individual stack area. Reentrancy is important in the embedded systems world if you plan on calling functions from both your main application code and your interrupt service routines, or if you will be using an operating system and running concurrent tasks.
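
A common source of non-reentrant code is a result returned in shared static storage. The sketch below contrasts the two styles; hex_byte_static and hex_byte_r are hypothetical names chosen for illustration, not library routines.

```c
/* Non-reentrant: the result lives in one shared static buffer, so a
   second caller (for example, an interrupt service routine) overwrites
   the first caller's result. */
char *hex_byte_static(unsigned char value)
{
    static char buf[3];
    static const char digits[] = "0123456789ABCDEF";

    buf[0] = digits[(value >> 4) & 0x0F];
    buf[1] = digits[value & 0x0F];
    buf[2] = '\0';
    return buf;
}

/* Reentrant: the caller supplies the storage (at least 3 bytes), so
   each task or interrupt routine can keep its own result. */
char *hex_byte_r(unsigned char value, char *buf)
{
    static const char digits[] = "0123456789ABCDEF";

    buf[0] = digits[(value >> 4) & 0x0F];
    buf[1] = digits[value & 0x0F];
    buf[2] = '\0';
    return buf;
}
```

If the main program calls hex_byte_static(0x1F) and an interrupt routine then calls hex_byte_static(0xA0), the main program's string silently becomes "A0". The second form avoids this as long as each caller's buffer is its own.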

The heap is an area of RAM from which the C runtime library routines malloc() and calloc() allocate memory dynamically. This is useful when you do not know the size of an array or structure at compile time and would like to avoid preallocating a large area of RAM that may not be used. Like the stack, the heap must be set up before your program begins executing. Typically the heap starts a certain number of bytes away from the stack pointer; during program execution, the stack and heap grow toward each other, as shown in Figure 1. If either grows too far, the stack and heap collide and overwrite each other's data, which generally causes unpredictable behavior. If you plan on using memory-allocation routines, you'll need to carefully consider the stack and heap when planning the RAM requirements of your application.
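
The collision can be modeled on a host machine. This is a simplified sketch under stated assumptions: a single 256-byte RAM region, a heap that grows upward from the bottom, and a simulated stack that grows downward from the top. The names ram, sim_push, and sim_alloc are hypothetical; a real malloc() and a real hardware stack do not normally check for each other this way.

```c
#include <stddef.h>

#define RAM_SIZE 256                    /* hypothetical RAM region size */

static unsigned char ram[RAM_SIZE];
static size_t heap_top = 0;             /* heap grows upward from 0 */
static size_t stack_ptr = RAM_SIZE;     /* simulated stack grows downward */

/* Simulate pushing nbytes on the stack; returns 0 if that would
   run into the heap, 1 otherwise. */
int sim_push(size_t nbytes)
{
    if (stack_ptr - heap_top < nbytes)
        return 0;
    stack_ptr -= nbytes;
    return 1;
}

/* Allocate nbytes from the heap, or return NULL if the request
   would reach the current stack pointer. */
void *sim_alloc(size_t nbytes)
{
    void *p;

    if (stack_ptr - heap_top < nbytes)
        return NULL;
    p = &ram[heap_top];
    heap_top += nbytes;
    return p;
}
```

After sim_alloc(100) and sim_push(100), only 56 bytes remain between the two, so a further sim_alloc(100) fails. On a target without such checks, the same request would succeed and later be corrupted by stack activity.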

One of the benefits of using C is that you will now have access to the many data types in the language. These include integer types: char, short int, int, and long int; and floating-point types: float and double. Combining these data types with C's array, structure, and pointer facilities lets you define complex data structures and manipulate them easily.

For the most part, you may define and access your program variables without concern for how they are stored, as the compiler will treat each type consistently. If you want to manipulate your data from outside the domain of the C compiler, however, you'll need to know how the compiler represents the different data types. For example, is an int a 16- or 32-bit quantity? (Some compilers allow you to specify this at compile time.) Are floating-point numbers stored in IEEE Standard form or in a form used only by the compiler vendor? You'll need to know the answer to these questions if, for example, you plan on processing data from an A/D converter which may use a format incompatible with your program. If this is the case, you'll need to write an assembly-language routine that will translate the data into the format used by the compiler.
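
Where the compiler's representation is known, a translation like this can sometimes be written in C as well. The sketch below assumes a hypothetical converter that delivers a signed 16-bit two's-complement sample as two bytes, most significant byte first; sample_from_bytes is an illustrative name. Holding the result in a long keeps the arithmetic independent of whether int is 16 or 32 bits.

```c
/* Combine two bytes, most significant first, into a signed 16-bit
   sample held in a long, so the result does not depend on the size
   of int chosen by the compiler. */
long sample_from_bytes(unsigned char hi, unsigned char lo)
{
    long value = ((long)hi << 8) | (long)lo;

    if (value & 0x8000L)        /* sign bit set: extend two's complement */
        value -= 0x10000L;
    return value;
}
```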

There are four keywords in C that are of special importance to the embedded system programmer--const, extern, volatile, and register. These keywords modify your variable declarations and notify the compiler that the particular variable has a special property.

The const keyword declares that a variable cannot be modified during program execution. In embedded systems programming, it is most often used to declare a constant that will never change and can therefore be stored in ROM. Declaring strings as const may be helpful if you are trying to minimize RAM usage. A compiler will generate an error message if your program tries to modify a variable that has been declared const; see Example 2(a).

The extern keyword identifies variables defined in another module. In Example 2(a) I've defined pi in one module. If pi is referenced in another module, you must tell the compiler that pi has been defined outside that module and that it is a const type. Otherwise the compiler is free to assume that pi is a regular double variable that can be modified. This is done by specifying both extern and const in the second module, as in Example 2(b).

Example 2: (a) An error message will result if your program tries to modify a variable that has been declared const; (b) using extern and const to tell the compiler that pi has been defined outside this module.

  (a)

  const double pi = 3.14159;
  main()
  {
     pi = 1.234; /* will generate error at compile-time  */
  }

  (b)

  extern const double pi;
  double circum(double radius)
  {
  double circ;
     circ = 2.0 * pi * radius;
     return (circ);              /* return the circumference */
  }

As a general rule, external declarations should be the same as the original definition, except that you need to include the keyword extern and omit any initialization.

The volatile keyword declares that references to a variable should not be optimized away. This is important in embedded systems when a variable may be memory-mapped I/O or storage shared between independent tasks. Example 3 shows the result of declaring porta as volatile. Here, the code for C line 7 (the first assignment to portb) has been optimized out as redundant. Because porta was declared volatile, both assignments to porta were kept, even though the first is immediately overwritten.

Example 3: The result of declaring porta as volatile.

  ;    1  volatile char porta;
  ;    2  char portb;
  ;    3  test()
  ;    4  {
  _test:
  ;    5  porta=1;
      ldab    #1
      stab    _porta
  LL4:
  ;    6  porta=2;
      ldab    #2
      stab    _porta
  LL6:
  ;    7  portb=1; /* optimized out as redundant */
  ;    8  portb=2;
      ldab    #2
      stab    _portb
  ;   9  }
   rts
  ;   10
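
The same keyword matters for storage shared with an interrupt service routine. The following is a minimal host-side sketch: rx_ready, rx_byte, sci_isr_sim, and poll_rx are hypothetical names, and the "interrupt" is simulated by an ordinary function call.

```c
volatile unsigned char rx_ready;    /* set by the interrupt routine */
volatile unsigned char rx_byte;     /* last byte received */

/* Stand-in for an interrupt service routine; on a target this would
   be entered by the hardware rather than called directly. */
void sci_isr_sim(unsigned char data)
{
    rx_byte = data;
    rx_ready = 1;
}

/* Returns 1 and stores the byte if one is waiting, 0 otherwise.
   Because rx_ready is volatile, the compiler must re-read it from
   memory on every call instead of reusing an earlier value. */
int poll_rx(unsigned char *out)
{
    if (!rx_ready)
        return 0;
    *out = rx_byte;
    rx_ready = 0;
    return 1;
}
```

Without volatile, a loop such as while (!rx_ready) ; could legally be compiled into a loop that tests a value loaded once, and so never terminate.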

The register keyword is used to specify that a variable should be stored in a register, as it will be used extensively in the program. Placing a heavily used variable in a register can reduce code size and increase execution speed. However, many compilers will ignore the register keyword if the target processor has a limited number of registers.

Even if you write your entire application in C, one minimal piece of assembly code will generally be required. This is the runtime startup program, and its purpose is to set up the environment for your C program. At a minimum, the startup will load the stack pointer. Your runtime startup may also include code to manipulate target registers such as the M68HC11 EEPROM block-protect register, BPROT, that must be zeroed within the first 64 cycles after reset to enable EEPROM writes.

Data manipulation may also occur in the startup program. Many startup programs will zero out any uninitialized variables you have defined, since the C language specifies that these variables have the value 0 when the program begins executing. The startup routine may also be the place to set up initialized global variables. This is called automatic data initialization.

Most C compilers make a distinction between initialized and uninitialized global variables; see Example 4. In the embedded systems world, this is because variables that will be modified must reside in RAM, yet RAM contents are usually erased or corrupted when power is removed. For uninitialized global variables, this is not a problem, as you do not expect these variables to have any meaningful value at startup time (other than 0). However, this is not true of initialized variables. These must have the value that you specified in your program.

Example 4: Most C compilers make a distinction between initialized and uninitialized global variables.

  /* Initialized variables are declared with a value.
     The variable will have this value when the program begins executing.
     Uninitialized variables are not declared with a value.  */
  int days_of_year = 365; /* this is an initialized variable */
  int number_of_days;     /* this is uninitialized */

One solution is to store the values of these variables in ROM and then initialize the RAM locations during the runtime startup process. In order to minimize the ROM storage requirements, at compile time initialized global variables are typically generated into one data section, and uninitialized global variables are generated into a different data section (often called the "bss" section, a carryover term from the original C compiler development on the PDP-11). Only variables in the initialized data section need to have their initial values stored in ROM. Figure 2 illustrates a typical M68HC11 memory map before and after the runtime startup has copied data from ROM to RAM.
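
The work done by the startup code can be sketched in C, although on a real target it is usually written in assembly and the section addresses and lengths come from linker-defined symbols. Here they are passed as parameters; init_data and its parameter names are hypothetical.

```c
#include <stddef.h>

/* Copy the initialized-data image from ROM into its RAM locations,
   then zero the uninitialized (bss) section. */
void init_data(unsigned char *ram_data, const unsigned char *rom_image,
               size_t data_len, unsigned char *bss, size_t bss_len)
{
    size_t i;

    for (i = 0; i < data_len; i++)
        ram_data[i] = rom_image[i];     /* initialized globals */
    for (i = 0; i < bss_len; i++)
        bss[i] = 0;                     /* uninitialized globals start at 0 */
}
```

Only data_len bytes of ROM are consumed by the image, which is why keeping uninitialized variables in a separate section saves ROM.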

One key feature of ANSI C compilers is their support for function prototypes. Function prototypes are function declarations that provide an explicit list of a function's parameters and their types. The benefit of providing a function prototype is that it enables the compiler to check that the arguments passed to a function match the prototype, preventing parameter mismatch problems between calling and called functions; see Example 5.

Example 5: Function prototypes enable the compiler to check that the arguments passed to a function match the prototype.

  long a,b;
  int i,j,k;
  int subroutine(int first, long second);
  main()
  {
  i=subroutine(j,a);  /* no action will be taken */
  i=subroutine(j,k);  /* compiler will promote k to long */
  i=subroutine(a,b);  /* narrows long to int for first param;
                         many compilers flag this call */
  }

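Function prototype checking means that the compiler can make its best effort to comply with your prototype. If it can widen an argument to match the prototype, it will do so. Narrowing an argument, however, can lose information. ANSI C permits the implicit conversion from long to int, so whether such a call is reported as an error or only as a warning depends on your compiler; either way, treat the diagnostic as a sign of a mismatch between the calling and called functions.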

Most C compilers come with an extensive C runtime library of routines that you can call from your program. These include I/O functions like printf() and scanf(), character functions like isalpha(), string functions like strcpy(), and math functions like sqrt(). Most toolchains link in only those library routines referenced by your program, so you don't have to worry about entire libraries being copied into your application. Some compilers also include the source to the C library so that you can modify routines if necessary.

If you will be using the I/O functions, you should check to see whether you will need to modify any routines. You may need to modify getchar() and putchar() for your target hardware; the other I/O routines are usually coded to call putchar() and getchar() and don't need to be changed. Example 6 shows C versions of putchar and getchar for the M68HC11.

Example 6: C versions of putchar and getchar for the M68HC11.

  #define RDRF 0x20          /* Receive Data Register Full */
  #define TDRE 0x80          /* Transmit Data Register Empty */
  /* volatile keeps the compiler from optimizing away repeated reads */
  #define SCSR  (*(volatile unsigned char *) 0x102e)  /* SCI Status Register */
  #define SCDR  (*(volatile unsigned char *) 0x102f)  /* SCI Data Register */
  int putchar (int c)
  {
  if (c == '\n')
      putchar('\r');       /* send carriage return before newline */
  while (! (SCSR & TDRE))  /* wait until ready to transmit */
      ;
  SCDR = (unsigned char) c; /* output character */
  return (c);
  }
  int getchar()
  {
  int c;
  while (! (SCSR & RDRF))  /* wait for character */
       ;
  c = SCDR;                /* receive character  */
  if (c == '\r')
      c = '\n';            /* translate to newline */
  return (c);
  }
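
The layering described above, in which only the lowest-level routine knows about the hardware, can be tried on a host machine. In this sketch a memory buffer stands in for the SCI data register; target_putchar, target_puts, out_buf, and out_len are hypothetical names used to avoid clashing with the host's own library.

```c
#include <stddef.h>

#define OUT_MAX 64

static char out_buf[OUT_MAX];   /* stands in for the serial port */
static size_t out_len = 0;

/* Low-level routine: the only one that would touch the hardware. */
int target_putchar(int c)
{
    if (c == '\n')
        target_putchar('\r');   /* send carriage return before newline */
    if (out_len < OUT_MAX - 1) {
        out_buf[out_len++] = (char)c;
        out_buf[out_len] = '\0';
    }
    return c;
}

/* Higher-level routine: written entirely in terms of target_putchar,
   so it needs no changes when the hardware changes. */
int target_puts(const char *s)
{
    while (*s)
        target_putchar(*s++);
    target_putchar('\n');
    return 0;
}
```

Calling target_puts("hi") leaves the four characters 'h', 'i', carriage return, and newline in out_buf. Retargeting to real hardware would mean replacing only the body of target_putchar.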

Once your program has been compiled, you will need to link and locate it. In assembly programs, you may have used an assembler directive like ORG to locate your code and data sections in one program file. In C, compiled code is usually relocatable--this means that the location of instructions and variables is not known at compile-time. Locating code and data is the job of the linker.

Compilers make this job easy by allowing you to locate code either by module or by section. This capability means that you can change the location of your program simply by relinking--your code does not need to be recompiled. However, this does mean that some code will need to be in a separate module. For example, in the M68HC11, the interrupt vector table is located at 0xFFD6. Generating the addresses for this table usually means compiling or assembling a separate module that includes only the addresses for the interrupt service routines.

Combining C and Assembly

Often, the size or speed of a particular routine is critical, and you want to use assembly language because it offers maximum control. Most C compilers support a variety of features that let you combine the best of C and assembly. In addition to becoming familiar with these features, you'll need to understand the compiler's interface conventions so that you can call assembly programs from C and vice versa.

There are three main interface conventions to be aware of when combining both C and assembly-language routines: naming global symbols, passing parameters, and returning values. (The following examples may or may not be valid for the compiler you use.)

Many C compilers automatically prepend an underscore character to all global symbols defined in C. This is done to avoid unintentional conflict with symbols defined at the assembly level. For the C statement in Example 7(a), an M68HC11 C compiler will generate the code in 7(b). If your assembly routine needs to reference a variable defined in C, you will need to include an external declaration in your assembly code that uses the underscored name.

Example 7: (a) For this C statement, an M68HC11 C compiler will generate the code in (b); (b) generated code; (c) a subroutine call to func().

  (a)

  i = 1; /* i is declared as an int */

  (b)

  ldd   #1
  std   _i

  (c)

  jsr  _func

Similarly, a subroutine call to func() will look like Example 7(c). Therefore, if func() is written in assembly language, you will need to declare it publicly as _func within your assembly file.

If you write your entire application in C, you won't have to worry about parameter passing since the generated code will observe the conventions defined by your compiler. If you plan on writing some of your routines in assembly language, however, you will need to learn what these conventions are and write your code accordingly. Typically, C compilers will pass parameters by pushing them on the stack in reverse order (last parameter pushed first). This is how your assembly program will need to receive them.

If only one parameter is being passed, it may be passed in a register rather than on the stack. Also, many ANSI compilers will only pass parameters as a multiple of words. This means that variables defined as type char may be widened to type int; compilers may also widen other types, like float to double.

One way you can discover how the compiler passes parameters is to look at the assembly code generated. Since these conventions vary widely from one vendor to another, writing a dummy C program like that in Listing One and compiling it to assembly code can save you some time in writing your own assembly programs. Note that if optimization is turned on, the generated code may not be what you expect. If possible, compile without optimization to keep the code from being altered.

Compilers also expect functions that return values to do so according to convention. In Listing One, subroutine returned an int passed in the D register. Other variable types are returned either in registers or on the stack, depending on the size of the return value and the registers available for the specific processor.

Compilers often reserve processor registers for their own use. This is typically done to point to global data areas or for use as an external stack pointer. You will need to verify which registers are reserved by your compiler and avoid using them in your assembly programs. If you must use a reserved register, it is your responsibility to save the value of this register and then restore it before your program returns to the calling program.

Many C compilers offer inline assembly support. This feature allows you to insert a line or more of assembly code within your C program. For example, suppose you wanted to disable interrupts before a call to a subroutine. Typically this would look like Example 8.

Example 8: Using inline assembly to disable interrupts before a call to a subroutine.

  main()
  {
    ...
    _asm("sei\n"); /* disable interrupts */
    func();        /* call function */
    _asm("cli\n"); /* enable interrupts */
    ...
  }

You can write your interrupt vector table in C or assembly language. Generally, you will write a module that defines data locations that will contain the addresses of your interrupt service routines. At link time, the linker will resolve these references and fill in the correct values. Listing Two is a C version of an M68HC11 interrupt vector table. This module can be compiled and then located at 0xFFD6 to serve as a generic interrupt vector table and modified later as you write your interrupt service routines.
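
The idea of a table of routine addresses can be exercised on a host machine, where a function call stands in for the hardware fetching a vector. This is a sketch only: the three-entry table, the enum names, and raise_interrupt are hypothetical and do not reflect the M68HC11's actual table layout.

```c
typedef void (*isr_t)(void);

int sci_count = 0;          /* times the SCI handler ran */
int spurious_count = 0;     /* times the generic handler ran */

void sci_isr(void)     { sci_count++; }
void default_isr(void) { spurious_count++; }

enum { VEC_SCI, VEC_SPI, VEC_TIMER_OVF, VEC_COUNT };

/* const lets the table be placed in ROM alongside the code. */
isr_t const vec_tab_sim[VEC_COUNT] = {
    sci_isr,        /* VEC_SCI: a real handler has been written */
    default_isr,    /* VEC_SPI: still the generic handler */
    default_isr     /* VEC_TIMER_OVF: still the generic handler */
};

/* Stand-in for the hardware: look up the vector and jump through it. */
void raise_interrupt(int vec)
{
    if (vec >= 0 && vec < VEC_COUNT)
        vec_tab_sim[vec]();
}
```

Replacing a generic entry with a real handler is a one-line change to the table, which is the workflow the paragraph above describes.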

Finally, because many processors support memory-mapped I/O, it is useful to be able to define variables in C whose location corresponds to a particular I/O register or port. This can be done in C by defining a pointer that references an absolute address. Listing Three shows how this is done. The absolute address specified in the #define statement will be substituted for the symbol wherever it appears.
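
A small variation on this technique combines the absolute address with the volatile keyword and routes accesses through helper functions that take the register by pointer, so the same code can be tried on a host against an ordinary variable. REG8, reg_set_bits, reg_clear_bits, and reg_test_bits are hypothetical names; this is one possible arrangement, not the one used in Listing Three.

```c
/* Name an absolute 8-bit register address. Dereferencing this is
   only meaningful on the target, where the address is a real register. */
#define REG8(addr) (*(volatile unsigned char *)(addr))

/* Set the bits in mask, leaving the others unchanged. */
void reg_set_bits(volatile unsigned char *reg, unsigned char mask)
{
    *reg |= mask;
}

/* Clear the bits in mask, leaving the others unchanged. */
void reg_clear_bits(volatile unsigned char *reg, unsigned char mask)
{
    *reg &= (unsigned char)~mask;
}

/* Return 1 if any bit in mask is set, 0 otherwise. */
int reg_test_bits(volatile unsigned char *reg, unsigned char mask)
{
    return (*reg & mask) != 0;
}
```

On the target, a test of the transmit-ready bit might read reg_test_bits(&REG8(0x102e), 0x80), using the SCI status register address and bit value from Example 6. On a host, passing the address of a local unsigned char exercises the same logic without touching hardware.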






[LISTING ONE]

;    1  /* Here is a method of determining how the compiler expects parameters
;    2    to be passed between C and assembly routines. For example, suppose
;    3    you need to pass 3 integers to an assembly routine, where they will
;    4    be manipulated, and an integer value returned. The trick is to write
;    5    a C routine that simulates this. Then compile to assembly and see how
;    6    the compiler treats the parameters (you can use the generated
;    7    assembly code as a template and add your code to it). */
;    8  int h,i,j,k;
;    9  main()
;   10  {
_main:
;   11  h = subroutine(i,j,k);
    ldd    _k                    ; push k on stack first
    pshd
    ldd    _j                    ; push j on stack next
    pshd
    ldd    _i                    ; leave i in register
    jbsr   _subroutine
    ins                          ; clear parameters off stack
    ins
    ins
    ins
    std   _h                    ; result in register
;   12  }
    rts
;   13  int subroutine(int a, int b, int c)
;   14  {
_subroutine:
    pshx                        ; save X
    pshd                        ; save i on stack
    tsx                         ; transfer SP to X
    .set     OFST=0
;   15           return(a+b+c);
    ldd      OFST+0,x           ; code to copy i back to D
    addd     OFST+6,x           ; add j
    addd     OFST+8,x           ; add k, result left in D
    pulx                        ; discard saved copy of i (popped into X)
    pulx                        ; restore X
    rts
;   16  }





[LISTING TWO]

extern void reset();        /* symbol defined in startup routine */
extern void default_isr();  /* generic routine generates a return from interrupt;
                               it cannot be named "default", a C keyword */
void (* const vec_tab[])() = {
  default_isr,            /* SCI */
  default_isr,            /* SPI */
  default_isr,            /* Pulse accumulator input edge */
  default_isr,            /* Pulse accumulator overflow */
  default_isr,            /* Timer overflow */
  default_isr,            /* Timer output compare 5 */
  default_isr,            /* Timer output compare 4 */
  default_isr,            /* Timer output compare 3 */
  default_isr,            /* Timer output compare 2 */
  default_isr,            /* Timer output compare 1 */
  default_isr,            /* Timer input capture 3 */
  default_isr,            /* Timer input capture 2 */
  default_isr,            /* Timer input capture 1 */
  default_isr,            /* Real time interrupt */
  default_isr,            /* IRQ */
  default_isr,            /* XIRQ */
  default_isr,            /* SWI */
  default_isr,            /* illegal op-code */
  default_isr,            /* COP watchdog timer fail */
  default_isr,            /* COP monitor clock fail */
  reset,                  /* RESET */
};





[LISTING THREE]


;   1  /* Here is a method of accessing the M68HC11 I/O ports that is portable
;   2   and can be compiled by most C compilers. c is declared as an int
;   3   because that's how putchar is defined by ANSI. The (char) cast is done
;   4   to avoid "illegal assignment" warnings from strict compilers.    */
;   5  #define SCSR *(char *) 0x102e     /* SCI Status Register */
;   6  #define SCDR *(char *) 0x102f     /* SCI Data Register   */
;   7  #define TDRE 0x80    /* Transmit Data Register Empty bit */
;   8  int putchar(int c)
;   9  {
_putchar:
    pshx
    pshd
    tsx
    .set       OFST=0
L1:            ; line 14, offset 3
;   10         while (!(SCSR & TDRE))

    ldab       102eH
    bitb       #128
    beq        L1
;   11         /* loop until ready to transmit */
;   12         SCDR = (char) c;
    ldab       OFST+1,x
    stab       102fH
;   13         return(c);
    ldd        OFST+0,x
    pulx
    pulx
    rts
;   14  }
    .public    _putchar
    .end



Copyright © 1992, Dr. Dobb's Journal

