Efficient MC68HC08 Programming

Rand and Deepak examine two basic optimizations for the Motorola MC68HC08 microcontroller: common-subexpression elimination and constant-value propagation.


May 01, 1995
URL:http://www.drdobbs.com/mobile/efficient-mc68hc08-programming/184409556

Copyright © 1995, Dr. Dobb's Journal

Reducing cycle count and improving code density

Rand Gray and Deepak Mulchandani

Rand and Deepak develop embedded-system development tools for Motorola's microcontroller technologies group in Austin, Texas. Deepak can be reached at [email protected]. Rand can be reached at [email protected].


Although high-level languages such as C have become increasingly popular in embedded-application development, assembly-language implementations can still improve code density and reduce cycle counts. The downside is that such implementations are prone to a wider variety of development errors, more difficult to read and maintain, essentially nonportable, and generally more expensive in the overall development effort.

Implementations in C, on the other hand, provide improved portability and are easier to write and maintain. In this article, we introduce some aspects of embedded programming using the Motorola MC68HC08 microcontroller and present a portion of the CPU08 instruction set. We also provide some sample programs written in C to expose the weaknesses of code generated by a typical C compiler. Although a compiler might support many optimizations, we'll examine two basic ones: common-subexpression elimination and constant-value propagation. Our examples will demonstrate how compilers that do not support these simple techniques can incur penalties in terms of code density and cycle count.

The MC68HC08

The MC68HC08 series of microcontrollers is based around the CPU08 central processing unit, which can be combined with a selection of peripherals, such as a serial communications interface (SCI), serial peripheral interface (SPI), timers, PWMs, A/D converters, RAMs, masked ROMs, EPROMs, and EEPROMs. Although there are currently just a few MC68HC08 devices, Motorola plans to proliferate them, as has been done with the MC68HC05 family, which currently has around 160 derivatives.

The CPU08 core allows the designed MCU to be used in a wide variety of applications. The 68HC08 MCUs will be a good match to many embedded-systems applications, including stepper-motor control, general-purpose industrial controls, pagers, HVAC controls, computer peripherals (printers, disk drives, keyboards, mice, and trackballs), electronic appliances (TVs, cameras, camcorders, and radios), household appliances, and security systems. The modularity of the CPU08 core along with the capability to integrate it with a number of predesigned modules makes possible the design of a custom microcontroller for your application.

The architecture of the CPU08 consists of an accumulator-based processor with index registers available for data-access functions. Operands are loaded from memory and operated upon, and results are written back to memory. The CPU08 instruction set and architecture provide fairly comprehensive support for an ANSI C compiler implementation. The CPU08 also provides features such as direct-page addressing, which lets instructions manipulate data using fewer instruction bytes and thus saves program space. Table 1 provides a summary of the CPU08 programming registers.

The addressing modes of an instruction set determine how effectively data can be manipulated using the operations provided. The 68HC08 provides 16 addressing modes for flexibility in data access. Table 2 outlines some of the major addressing modes. It is always important to utilize all addressing modes to maximize data-access efficiency.

Figure 1 summarizes the memory map of the MC68HC08 XL36 MCU. The areas of the memory map marked "unused" indicate sections of the MCU irrelevant to this article. The memory map for each 68HC08 MCU is different: The data book available for each 68HC08-based MCU describes the memory map for that particular MCU. Extended-page addressing allows the user to access data throughout the memory map of the MCU; that is, from $0000 to $FFFF, since the MCU supports 16-bit addressing.

The stack-pointer-relative addressing modes are useful for function calls and temporary allocation of data. The CPU08 stack pointer can be relocated under program control to point anywhere in RAM, using the LDHX and TXS instructions. If the stack pointer is set to point to a high section of RAM, the direct page remains free for frequently accessed variables, which can then be reached with the direct addressing mode. Instructions are provided to store the CPU registers on the stack (PSHA, PSHX, PSHH), and their complementary operations retrieve the values from the stack (PULA, PULX, PULH). The AIS instruction allows the program to allocate temporary storage on the stack. Stack-pointer instructions and addressing modes provide extensive support for implementation of C-style function calls.

Bit Addressing and Manipulation

Memory is a scarce resource on any microcontroller. Often variables are used for flags. However, these variables have just two possible values such as On/Off or True/False. Rather than use an entire byte or word of memory to store this value, it's more efficient to represent these values using a single bit (0 or 1). Devices and peripherals also require extensive bit addressing and manipulation. Table 3 lists the CPU08 instructions for bit manipulation.
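The idea of packing flags into single bits can be sketched in C as follows. This is only an illustration: the flag names and helper functions are hypothetical, and whether a particular compiler turns these operations into BSET, BCLR, BRSET, or BRCLR depends on the compiler and on where the byte is placed in memory.

```c
#include <stdint.h>

/* Hypothetical flag assignments: each flag occupies one bit of a single byte. */
#define FLAG_MOTOR_ON   (1u << 0)
#define FLAG_DOOR_OPEN  (1u << 1)
#define FLAG_ALARM      (1u << 2)

/* Up to eight independent True/False flags share this one byte of RAM. */
static uint8_t status_flags;

/* Set the flag(s) selected by mask to 1. */
void flag_set(uint8_t mask)
{
    status_flags |= mask;
}

/* Clear the flag(s) selected by mask to 0. */
void flag_clear(uint8_t mask)
{
    status_flags &= (uint8_t)~mask;
}

/* Return 1 if any flag selected by mask is set, 0 otherwise. */
int flag_test(uint8_t mask)
{
    return (status_flags & mask) != 0;
}
```

Three flags stored this way occupy one byte instead of three, and five more bits remain available in the same byte.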

Because the CPU08 uses memory-mapped I/O, all supported devices can be accessed with the ordinary instruction set by manipulating preassigned memory locations, which on all of these processors lie in the first 256 bytes of the memory map. Therefore, the program can use the fast direct-page addressing modes to read data from and write data to the peripherals.

The bit-addressing modes of the CPU08 can be used to service devices quickly and efficiently. As an example, consider manipulating the operation of the PORT A I/O device on the XL36 MCU. PORT A is an 8-bit, general-purpose, bidirectional I/O port; see Figure 2. It is assigned to address $0000 of the memory map of the XL36 MCU. The data direction register A (DDRA) can be used to determine whether each port A pin is an input or an output.

Setting a DDRA bit to 1 enables the output buffer for the corresponding port A pin. The BSET instruction can be used to set bit number 4 of the data direction register A. The BRCLR or BRSET instructions can then be used to check a bit and perform an operation based on the result. Without these instructions, the normal flow would be: load the value of memory location $0000 into the accumulator; check the value of the specific bit (which requires multiple instructions); and store the manipulated value back (if required).
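At the C level, the port manipulation described here might be written as in the following sketch. So that it can be compiled and run on a host machine, the first 256 bytes of the memory map are simulated with an array; on a real target the registers would be volatile accesses to fixed addresses. The PORT A address of $0000 comes from the text, but the DDRA address used here is an assumption, as are all of the function names; consult the data book for the actual register locations.

```c
#include <stdint.h>

/* Host-side stand-in for the first 256 bytes of the MCU memory map. */
static volatile uint8_t io_page[256];

#define PORTA_ADDR 0x00u   /* PORT A data register, $0000 per the text */
#define DDRA_ADDR  0x04u   /* ASSUMED address for DDRA; check the data book */

#define PORTA (io_page[PORTA_ADDR])
#define DDRA  (io_page[DDRA_ADDR])

#define PIN4  (1u << 4)    /* mask for bit number 4 */

/* Make pin 4 an output. A single-bit set like this is the job BSET performs. */
void porta_pin4_make_output(void)
{
    DDRA |= PIN4;
}

/* Drive pin 4 high (level != 0) or low (level == 0). */
void porta_pin4_write(int level)
{
    if (level)
        PORTA |= PIN4;
    else
        PORTA &= (uint8_t)~PIN4;
}

/* Report whether bit 4 of PORT A is set; BRSET/BRCLR combine such a test
   with the branch that acts on it. */
int porta_pin4_is_set(void)
{
    return (PORTA & PIN4) != 0;
}
```

When inspecting compiler output for code like this, the thing to look for is whether each single-bit operation became one bit instruction or a load, mask, and store sequence.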

Interrupt Handling

In many embedded applications, it is not necessary for the MCU to be running at all times; it may be "put to sleep" in its STOP state, which conserves power. An event in the system causes an interrupt to wake up the processor. The MCU then processes the event and reenters the STOP state upon completion of its task. In real-time systems, interrupts play a vital role, and it is important to be able to process each interrupt event quickly enough that no event is missed. "Interrupt latency," the amount of time which elapses between the occurrence of the interrupting event and the execution of the first instruction in the interrupt service routine (ISR), is important in all applications that depend on the use of interrupts.

An interrupt causes program execution to be suspended and processing to be transferred to the ISR. The MCU saves the CPU register state on the stack so that upon returning from the ISR, the normal program flow continues with no disturbance.
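One common way to keep an ISR short is to have it record only that the event occurred and leave the processing to the main program. The sketch below simulates that pattern on a host machine. The function names are hypothetical, the ISR is called directly rather than through an interrupt vector, and the way a function is bound to a vector (which is compiler specific) is not shown.

```c
#include <stdint.h>

/* Set by the ISR, cleared by the main program. It is volatile because it
   changes outside the normal flow of control. */
static volatile uint8_t event_pending;

/* Count of events the main program has processed. */
static uint8_t events_handled;

/* Hypothetical interrupt service routine: records the event and returns
   immediately, keeping time spent inside the ISR to a minimum. */
void event_isr(void)
{
    event_pending = 1;
}

/* One pass of the main loop. Returns 1 if an event was processed, or 0 if
   there was nothing to do (the point where real firmware could enter STOP). */
int service_pending_event(void)
{
    if (!event_pending)
        return 0;
    event_pending = 0;
    events_handled++;
    return 1;
}
```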

Code-Generation Issues

In embedded development, it is critical that a C compiler generate efficient and optimal code for applications. Unfortunately, a C compiler rarely generates code that is as efficient as hand-coded assembly language. Here we examine how a typical C compiler optimizes and generates code for two common situations: common-subexpression elimination and constant-value propagation. Table 4 introduces CPU08 instructions that you'll find useful when examining these examples. Note that many of these instructions support multiple addressing modes. The CPU08 reference manual describes all the instructions and their corresponding addressing modes.

Common-subexpression elimination is an optimization in which the compiler removes redundant expressions that calculate the same value. The compiler avoids recalculating an expression by keeping its result in a temporary variable. In Example 1, the value of the variable y can be determined to be 7. The compiler can therefore store 7 in y, calculate the value of the expression 2*y+1 once (it equals 15), and move the value 15 into both x and z. Optimal assembly code for this program might look something like Example 2 (other variations are possible). As shown, the hand-coded assembly occupies 15 bytes and takes 21 cycles. The compiler output in Example 3 occupies 36 bytes, an increase of 21 bytes over the hand-coded assembly language, and uses 46 CPU cycles.
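If a compiler is known not to perform this optimization, much of the benefit can be recovered at the source level by computing the shared expression once. The following is a minimal sketch based on Example 1; the function names and the temporary t are hypothetical, and the results are returned through pointers only so that they can be inspected.

```c
/* As written in Example 1: the expression 2*y+1 appears twice. */
void compute_naive(int *x, int *z)
{
    int y = 7;
    *x = 2 * y + 1;
    *z = 2 * y + 1;   /* same expression evaluated a second time */
}

/* Manual common-subexpression elimination: evaluate once, reuse the result. */
void compute_cse(int *x, int *z)
{
    int y = 7;
    int t = 2 * y + 1;   /* common subexpression, computed once */
    *x = t;
    *z = t;
}
```

Both functions store 15 in x and z; the second simply gives a nonoptimizing compiler less work to repeat.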

Constant-value propagation is a technique whereby the compiler replaces a reference to a variable with known contents by those contents. Example 4 demonstrates how a compiler, by failing to perform a simple test, generates useless instructions and consumes unnecessary bytes. This situation results in dead code (code never executed) in ROM. As shown, the compiler can determine that the value of variable x is 6 at line 3 of the program. When the comparison "is x equal to 7" is made, the compiler can therefore determine that its result is False. A compiler that does not propagate the constant nevertheless generates instructions for the comparison and for the body of the if statement; see Example 6. Example 5 shows the equivalent function hand assembled. The useless instructions take up 30 bytes of ROM and account for a total of 40 CPU clock cycles.
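The effect of the optimization can be shown at the source level. In the sketch below, the first function follows Example 4 and the second is what remains once the known value of x has been propagated and the unreachable branch removed. The function names are hypothetical, and y is returned only so that the result can be checked.

```c
/* The logic of Example 4, unchanged. */
int original(void)
{
    int x = 6;
    int y;

    if (x == 7) {        /* x is known to be 6, so this is always false */
        y = 2 * 3 + 1;
        x = 2 * y + 1;
    }
    y = x;
    return y;
}

/* After constant-value propagation: the comparison and the dead branch
   are gone, leaving only the two assignments that Example 5 implements. */
int propagated(void)
{
    int x = 6;
    int y = x;
    return y;
}
```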

Function Calls

Compilers use the stack to provide temporary scratch storage for data while the application executes. When translating function calls into assembly language, the compiler will normally use the stack to pass parameters, allocating storage for function-defined local variables and the return value. This approach is feasible on microprocessor systems such as the MC68000, where the stack space is extensive. On those processors, the C compiler can generate stack frames for multiple levels of function calls. However, when writing applications for an MCU that doesn't provide much stack space, compilers can get into a real bind if they do not properly utilize memory resources such as the stack.

There are two standard ways to make function calls; which one is used depends on how the compiler implements them. The first method uses the PSHA, PSHX, and PSHH instructions to save the state of the CPU registers, calls the function using JSR, allocates temporary space on the stack using the AIS instruction, executes the function body, and then releases the stack space with another AIS instruction and returns. Notice that this method of calling functions is extremely stack intensive. On systems where the RAM resources (including stack) might be a few hundred bytes, this is definitely not the way to go if the application is modular and flow of control passes through many functions.

The second method is to simply save the program counter and make the function call. Code-generation techniques that do not depend on the state of the CPU registers do not have to save them on the stack every time they make a function call. Function calls are then reduced to a regular group of instructions that require no special setup to execute. This method might be preferred since the only instruction required to set up the flow of control between functions is the BSR instruction, which only saves the value of the program counter on the stack.
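A rough estimate of the stack cost of each method helps when RAM is tight. The sketch below assumes that PSHA, PSHX, and PSHH each push one byte and that JSR and BSR each push a two-byte return address; the function names and the figure used for local storage are hypothetical.

```c
/* Stack bytes used per call by the first method: three saved registers
   (A, X, H), a two-byte return address, and local storage reserved with AIS. */
unsigned stack_per_call_saving(unsigned local_bytes)
{
    return 3u + 2u + local_bytes;
}

/* Stack bytes used per call by the second method: the return address only. */
unsigned stack_per_call_minimal(void)
{
    return 2u;
}

/* Stack needed when calls are nested to the given depth. */
unsigned stack_for_depth(unsigned per_call, unsigned depth)
{
    return per_call * depth;
}
```

With 4 bytes of locals per function and calls nested 10 deep, the first method needs 90 bytes of stack and the second needs 20, a significant difference when the whole of RAM is a few hundred bytes.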

As you can see, efficient and optimal assembly-language generation is not always guaranteed when using high-level languages. In fact, the compiler must have a tremendous grasp of the MCU architecture it supports to maximize the instruction set's potential.

Quick benchmarking techniques allow developers to understand the toolset with which they are developing. Understanding what your compiler does when it encounters a specific construct can indicate what the compiler might do if you use that construct frequently. Some other areas where quick benchmarking can aid the developer are: logical expressions (check for short-circuit evaluation), arithmetic expressions, flow control, and loops. In particular, look for optimizations such as invariant expression removal and the use of loop-aiding instructions such as CBEQ and DBNZ.
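A small test case for the loop-related items might look like the following sketch. The first function leaves a loop-invariant expression inside the loop and counts upward; the second hoists the invariant by hand and counts down to zero, which is the loop shape a decrement-and-branch instruction such as DBNZ fits. Comparing the compiler's output for the two shows whether it performs these transformations itself. The function names and parameters are hypothetical.

```c
#include <stdint.h>

/* Straightforward version: scale * offset is recomputed on every iteration. */
uint16_t sum_naive(const uint8_t *buf, uint8_t n, uint8_t scale, uint8_t offset)
{
    uint16_t sum = 0;
    uint8_t i;

    for (i = 0; i < n; i++)
        sum += buf[i] + scale * offset;
    return sum;
}

/* Hand-optimized version: the invariant is computed once before the loop,
   and the loop counter runs down to zero. */
uint16_t sum_hoisted(const uint8_t *buf, uint8_t n, uint8_t scale, uint8_t offset)
{
    uint16_t sum = 0;
    uint16_t k = (uint16_t)(scale * offset);   /* loop-invariant value */

    while (n != 0) {
        n--;
        sum += buf[n] + k;
    }
    return sum;
}
```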

Certain addressing modes are usually provided on MCUs to enable the developer to perform an operation efficiently. When generating code for accumulator-based load-store MCUs, compilers sometimes get carried away with generating load-store instructions and hence do not pay attention to certain addressing modes. Checking the generated assembly language for load-store instructions and MCU-provided addressing modes might also help reduce cycle count and improve code density in your application.

Conclusion

High-level languages free you from such tasks as keeping track of variable addresses and program partitioning. However, compilers do not always generate optimal code for an MCU's instruction set. Improving code density is important since useless and redundant segments of code consume scarce ROM and RAM resources. Furthermore, reducing cycle count is important since it reduces power consumption, especially in battery-powered applications.

Figure 1: Memory map of a 68HC08 XL36 MCU.

Figure 2: PORT A I/O device on the XL36 MCU.

Table 1: CPU08 programming-registers summary.

Name   Size (bits)  Description
A      8            Accumulator. Used to perform arithmetic
                    operations upon data.
CCR    8            Condition code register. Maintains status
                    bits on results of previously executed operations.
H      8            Extension index register. Can be used in
                    conjunction with X for 16-bit addressing.
PC     16           Program counter. Always points to the instruction
                    to be executed.
SP     16           Stack pointer. Can be used for function parameter
                    passing and temporary space allocation for data.
X      8            Index register. Can be used to access data at specific
                    locations or temporary storage for operands.

Table 2: 68HC08 major addressing modes.

Mnemonic                Description                                           
DIRECT                  Quick data manipulation within the first 256
                        bytes of the MCU memory map.
EXTENDED                Data manipulation within the entire 64-Kbyte MCU
                         memory map.
INDEX POINTER RELATIVE  Data manipulation with an offset from the X index
                         register.
INHERENT                Single-byte instructions that have
                        no associated operand fetch.
IMMEDIATE               Data operand contained in the
                        bytes following the instruction opcode.
STACK POINTER RELATIVE  Data manipulation with an offset from the stack
                         pointer.

Table 3: CPU08 instructions for bit manipulation.

    Mnemonic   Description                                       
    BSET       Bit set.
    BCLR       Bit clear.
    BRSET      Branch if specified bit is set (the bit is 1).
    BRCLR      Branch if specified bit is clear (the bit is 0).

Table 4: Subset of CPU08 instructions.

    Mnemonic   Addressing mode(s)           Description                           
    AIS        Immediate                    Add immediate value to stack
                                             pointer.
    CLR        Direct                       Clear memory location.
    CLRA       Inherent                     Clear accumulator.
    CLRH       Inherent                     Clear extension index register
                                             H.
    CLRX       Inherent                     Clear index register X.
    LDA        Direct, Immediate, Indexed   Load accumulator from memory;
                                             A <-- (M).
    ROLX       Inherent                     Least-significant bit <-- Carry bit;
                                             Carry bit <-- most-significant bit.
    STA        Direct, Indexed              Store accumulator in memory;
                                             (M) <-- A
    STX        Direct, Indexed              Store index register X in memory; 
                                             (M) <-- X

Example 1: Code which has a common subexpression.

int main(void)
{
  int x;
  int y;
  int z;

  y = 7;
  x = 2*y+1;
  z = 2*y+1;

}

Example 2: Optimal assembly code for Example 1.

Bytes  Cycles  Instructions
3      4       mov #7, y+1
2      3       clr y
3      4       mov #15, x+1
2      3       clr x
3      4       mov #15, z+1
2      3       clr z

Example 3: Density and cycle-count analysis for compiler-generated code.

Bytes Cycles  Compiler-generated output
3     4       mov #7, y+1
2     3       clr y
2     3       ldx y
2     3       lda y+1
1     1       lsla
1     1       rolx
2     2       add #1
2     3       sta x+1
1     1       txa
2     2       adc #0
2     3       sta x
2     3       ldx y
2     3       lda y+1
1     1       lsla
1     1       rolx
2     2       add #1
2     3       sta z+1
1     1       txa
2     2       adc #0
2     3       sta z
1     1       rts
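The totals quoted for Examples 2 and 3 can be checked by summing the Bytes and Cycles columns. The short sketch below does so in C; the per-instruction figures are copied from the two listings, while the structure and function names are hypothetical.

```c
#include <stddef.h>

/* Size and timing of one instruction, as listed in the examples. */
struct insn {
    int bytes;
    int cycles;
};

/* Number of elements in a statically sized array. */
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Example 2: hand-coded assembly (six instructions). */
static const struct insn hand_coded[] = {
    {3, 4}, {2, 3}, {3, 4}, {2, 3}, {3, 4}, {2, 3}
};

/* Example 3: compiler-generated output (21 instructions, including rts). */
static const struct insn compiled[] = {
    {3, 4}, {2, 3},
    {2, 3}, {2, 3}, {1, 1}, {1, 1}, {2, 2}, {2, 3}, {1, 1}, {2, 2}, {2, 3},
    {2, 3}, {2, 3}, {1, 1}, {1, 1}, {2, 2}, {2, 3}, {1, 1}, {2, 2}, {2, 3},
    {1, 1}
};

/* Sum the byte counts of n instructions. */
int total_bytes(const struct insn *t, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i].bytes;
    return sum;
}

/* Sum the cycle counts of n instructions. */
int total_cycles(const struct insn *t, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i].cycles;
    return sum;
}
```

Summing the tables gives 15 bytes and 21 cycles for the hand-coded version and 36 bytes and 46 cycles for the compiler output.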

Example 4: Program to demonstrate constant-value propagation.

 1  main()
 2  {
 3     int x = 6;
 4     int y;
 5
 6     if (x == 7)
 7     {
 8        y = 2*3+1;
 9        x = 2*y+1;
10     }
11     y = x;
12
13  }

Example 5: Optimal assembly code for Example 4.

Bytes  Cycles  Instructions
3      4       mov #6, x+1
2      3       clr x
3      5       mov x, y
3      5       mov x+1, y+1
1      1       rts

Example 6: Code density and cycle-count analysis of Example 4.

Bytes  Cycles  Compiler-generated output
3      4       mov #6, x+1
2      3       clr x
2      3       lda x+1
2      2       eor #7
2      3       bne L0005
2      3       lda x
2      3       bne L0005
3      4       mov #7, y+1
2      3       clr y
2      3       ldx y
2      3       lda y+1
1      1       lsla
1      1       rolx
2      2       add #1
2      3       sta x+1
1      1       txa
2      2       adc #0
2      3       sta x
L0005:
3      5       mov x, y
3      5       mov x+1, y+1
1      1       rts

