C/C++

The Embedded C Extension to C: Part I

By Marcel Beemster, Hans van Someren, Willem Wakker and and Walter Banks, August 01, 2005

In the first installment of this two-part article, we examine the Embedded C specification that gives you direct access to features in target processors.

High-level language programming has long been in use for embedded-systems development. However, assembly programming still prevails, particularly for digital-signal processor (DSP) based systems. DSPs are often programmed in assembly language by programmers who know the processor architecture inside out. The key motivation for this practice is performance, despite the disadvantages of assembly programming when compared to high-level language programming.

Performance is key to signal-processing applications because it directly translates into end-user features. A 10-percent lower clock speed generally results in a corresponding reduction in power consumption. With more effective code generation, an application needs less processing cycles and thus a lower clock speed, which results in less EMI, longer battery life, and less heat generation. If the video decoding takes 80 percent of the CPU-cycle budget instead of 90 percent, for instance, there are twice as many cycles available for audio processing. This coupling of performance to end-user features is characteristic of many of the real-time applications in which DSP processors are applied.

Embedded C is not part of the C language as such. Rather, it is a C language extension that is the subject of a technical report by the ISO working group named "Extensions for the Programming Language C to Support Embedded Processors" [3]. It aims to provide portability and access to common performance-increasing features of processors used in the domain of DSP and embedded processing. The Embedded C specification for fixed-point, named address spaces, and named registers gives the programmer direct access to features in the target processor, thereby significantly improving the performance of applications. The hardware I/O extension is a portability feature of Embedded C. Its goal is to allow easy porting of device-driver code between systems. In this article, we focus on the performance-improving features of Embedded C.

DSPs have a highly specialized architecture to achieve the performance requirements for signal processing applications within the limits of cost and power consumption set for consumer applications. Unlike a conventional Load-Store (RISC) architecture, DSPs have a data path with memory-access units that directly feed into the arithmetic units. Address registers are taken out of the general-purpose register file and placed next to the memory units in a separate register file.

A further specialization of the data path is the coupling of multiplication and addition to form a single cycle Multiply-ACcumulate unit (MAC). It is combined with special-purpose accumulator registers, which are separate from the general-purpose registers.

Data memory is segmented and placed close to the MAC to achieve the high bandwidths required to keep up with the streamlined data path. Limits are often placed on the extent of memory-addressing operations. The localization of resources in the data path saves many data movements that typically take place in a Load-Store architecture.

The most important, common arithmetic extension to DSP architectures is the handling of saturated fixed-point operations by the arithmetic unit. Fixed-point arithmetic can be implemented with little additional cost over integer arithmetic. Automatic saturation (or clipping) significantly reduces the number of control-flow instructions needed for checking overflow explicitly in the program.

Figure 1 shows a typical DSP architecture. It shows the extended data path giving direct access to X and Y memory using X and Y addressing units (X/YAU). The addressing units have their own address registers (X/YAR) to implement address postincrement operations without having to access the general-purpose registers (GPR). The addressing units are further specialized to implement circular buffer access, which is useful for implementing sliding windows over signal data.

Figure 1: Example DSP processor architecture with dual-input memory data path and MAC unit.

DSP architectures are not easy to program optimally, either by hand or with compilers. Manual assembly programming is awkward because of the nonorthogonality of the architecture and arbitrary restrictions that can be in place. Modern compilers can deal with nonorthogonality reasonably well, but are not good at exploiting the special features that DSP processors have in place.

Current state-of-the-art embedded applications (mobile phones, for example) are implemented using two processors. One processor is a low-power RISC processor that takes care of all control processing, user interaction, and display management. It is programmed in a high-level language using an SDK that includes a compiler. The other processor is a DSP, which takes care of all of the signal processing. The signal-processing algorithms are typically hand-coded in assembly.

Changes in technological and economic requirements make it more expensive to continue programming DSPs in assembly. Staying with the mobile phone as an example, the signal-processing algorithms required become increasingly complex. Features such as stronger error correction and encryption must be added. Communication protocols become more sophisticated and require much more code to implement. In certain markets, multiple protocol stacks are implemented to be compatible with multiple service providers. In addition, backward compatibility with older protocols is needed to stay synchronized with provider networks that are in a slow process of upgrading.

On the economic side, time-to-market for new technology puts increasing pressure on design time. In 2004, the number of mobile phones sold worldwide is in the order of 500 million. In the western world, the time-to-replacement for mobile phones is between one and two years, and is driven by new features and fashion. Staying ahead in this market requires extremely fast and streamlined design projects. Assembly programming has no place in this world. Assembly code has limited support for data types and generally no support for data structures. When assembly code is used, this lack of abstraction becomes a design restriction that impacts not only the interface to the assembly code but often the application as a whole. Assembly programs are difficult to maintain and make a company dependent on a few specialists. By definition, assembly programs are nonportable. Legacy code makes it extremely expensive to switch to a new technology. These dependencies make a company vulnerable to employee and supplier chain changes.

Today, most embedded processors are offered with C compilers. Despite this, programming DSPs is still done in assembly for the signal processing parts or, at best, by using assembly-written libraries supplied by manufacturers. The key reason for this is that although the architecture is well matched to the requirements of the signal-processing application, there is no way to express the algorithms efficiently and in a natural way in Standard C. Saturated arithmetic, for example, is required in many algorithms and is supplied as a primitive in many DSPs. However, there is no such primitive in Standard C. To express saturated arithmetic in C requires comparisons, conditional statements, and correcting assignments. Instead of using a primitive, the operation is spread over a number of statements that are difficult to recognize as a single primitive by a compiler.

Enter Embedded C

Embedded C is designed to bridge the performance mismatch between Standard C and the embedded hardware and application architecture. It extends the C language with the primitives that are needed by signal-processing applications and that are commonly provided by DSP processors. The design of the support for fixed-point data types and named address spaces in Embedded C is based on DSP-C. DSP-C [1] is an industry-designed extension of C with which experience was gained since 1998 by various DSP manufacturers in their compilers. For the development of DSP-C by ACE (the company three of us work for), cooperation was sought with embedded-application designers and DSP manufacturers.

The Embedded C specification extends the C language to support freestanding embedded processors in exploiting the multiple address space functionality, user-defined named address spaces, and direct access to processor and I/O registers. These features are common for the small, embedded processors used in most consumer products. Previously, it was common practice that each of the tool providers supported these features using functionally similar, but syntactically different, implementations. For the Embedded C specifications, the functionality from the various tool providers was used and a common, extensible syntax was defined. Specific embedded-systems deficiencies in C have been addressed to reduce application dependence on assembly code.

Embedded C makes life easier for application programmers. The primitives provided are the primitives that fit the conceptual model of the application. The Embedded C extensions to C unlock the high-performance features of embedded processors for C programmers. The Embedded C specification brings back the roots of C to embedded systems as primarily a high-level language means of accessing the processor. Assembly programming is no longer required for a vast body of performance-critical code. Maintainability and portability of code are the key winners in this process.

Embedded C Features

The features introduced by Embedded C are fixed-point and saturated arithmetic, segmented memory spaces, and hardware I/O addressing. The description we present here addresses the extensions from a language-design perspective, as opposed to the programmer or processor architecture perspective. For the details and language definition of Embedded C, see [2].

Embedded C adds two new primitive types, _Fract and _Accum, and one type qualifier named _Sat. The underscores are included in these new keywords to ensure compatibility with existing applications. Typically, an implementation provides the more convenient macros fract, accum, and sat in the include file <stdfix.h>.

The _Fract type offers fixed-point data types that have a value range of [-1.0, +1.0> (-1.0 is included but not +1.0). This is conveniently implemented using the two-complement arithmetic typically used for integer arithmetic. In two-complement notation, the dot of the fixed-point value is imagined right after the sign bit, before all value bits. The first value bit represents 0.5, the second 0.25, and so on. A fixed-point number has no integer part. Embedded C does not specify the exact accuracy of the fixed-point types, although a minimum accuracy is defined with which an implementation must comply. The _Fract type can be qualified with the existing qualifiers short and long to define three different fixed-point types. The range of these types is the same, [-1.0, +1.0>, but the accuracy should be equal or get better when moving from short _Fract to _Fract to long _Fract.

The _Accum is also a fixed-point type and can also be qualified with short and long. The three resulting _Accum types must match the three _Fract types in terms of accuracy (the number of bits in the fraction). Additionally, the _Accum types have an integer part in their value. So, the range of an _Accum value may be [-256.0, +256.0>. Again, the number of integer bits is not specified in the Embedded C definition. The _Accum types match the accumulator registers of typical DSPs. The aim of these registers is to keep intermediate arithmetic results without having to worry about overflow.

The _Sat qualifier can be applied to fixed-point types. It makes sure that all operations with operands of _Sat-qualified type are saturated. It does not change the storage representation. Saturation means that if overflow occurs in an operation, the result will be set to the upper bound or lower bound of the type. For example, computing -0.75 + -0.75 results in -1.0 under saturated fixed-point arithmetic. Saturated arithmetic is important for signal-processing applications because they often operate close to the boundaries of the arithmetic domain to get the best signal-to-noise ratio. This is unlike integer processing in C, which is usually considered "large enough" and needs bound checks only at specific places.

The unsigned qualifier (already existing in Standard C) can also be applied to the fixed-point types, providing arithmetic domains starting from 0.0. The range of the _Fract type becomes [0.0, 1.0>. Unsigned arithmetic is typically used in image-processing applications, but it is not universally present on all DSP processors.

The arithmetic operations for _Fract and _Accum include all those defined for the int type, but exclude ~, &, |, and ^.

Within the fixed-point hierarchy, the usual implicit conversions are defined. For example, the promotion of _Fract to unsigned _Fract or long _Fract is automatic; unsigned _Fract can be promoted to _Accum, with similar promotions for the long qualified variants. Implicit conversions between fixed-point types and other types are fully defined. It is possible to write mixed-type expressions. Conversions in mixed expressions are based on the rank order, which is int, _Fract, _Accum, float. Some extensions were made to the usual handling of mixed arithmetic, in particular to make an expression such as 3 * 0.1r (where r denotes a fixed-point constant) meaningful. Under the usual arithmetic rules, the value 3 has to be converted to a fixed-point value first, which is out of range and would lead to a meaningless result. With the extended rules, the intended outcome of 0.3r is obtained.

An alternative to the current choice in the fixed-point design is to let programmers specify exactly the number of relevant bits of the fixed-point types, or even to let programmers specify the number of bits for every fixed-point variable. In this way, the implementation could guarantee the outcome of the computations. Such a design would raise the abstraction level of the Embedded C language and increase the portability of code. However, it would also completely bypass the rationale of Embedded C, which is to provide a good match between the language and the performance-increasing features of the processor. Enforcing an implementation of Embedded C to implement, for example, a 40-bit _Accum type on a processor that offers only 24-bit accumulators, is extremely awkward and would be highly inefficient. In that case, Embedded C would be unusable for its purpose, which is to provide the programmer with access to the high-performance features of the processor.

Multiple Address Spaces

Embedded C supports the multiple address spaces found in most embedded systems. It provides a formal mechanism for C applications to directly access (or map onto) those individual processor instructions that are designed for optimal memory access. Named address spaces use a single, simple approach to grouping memory locations into functional groups to support MAC buffers in DSP applications, physical separate memory spaces, direct access to processor registers, and user-defined address spaces.

The Embedded C extension supports defining both the natural multiple address space built into a processor's architecture and the application-specific address space that can help define the solution to a problem.

Embedded C uses address space qualifiers to identify specific memory spaces in variable declarations. There are no predefined keywords for this, as the actual memory segmentation is left to the implementation. As an example, assume that X and Y are memory qualifiers. The definition:

X int a[25] ;

means that a is an array of 25 integers, which is located in the X memory. Similarly (but less common):

X int * Y p ;

means that the pointer p is stored in the Y memory. This pointer points to integer data that is located in the X memory. If no memory qualifiers are used, the data is stored into unqualified memory.

For proper integration with the C language, a memory structure is specified, where the unqualified memory encompasses all other memories. All unqualified pointers are pointers into this unqualified memory. The unqualified memory abstraction is needed to keep the compatibility of the void * type, the NULL pointer, and to avoid duplication of all library code that accesses memory through pointers that are passed as parameters.

Each address space other than the generic one has a unique name in the form of an identifier. Address space names are ordinary identifiers, sharing the same name space as variables and typedef names. Address space identifiers follow the same rules for scope as other identifiers. A compiler implementation may provide an implementation-defined set of intrinsic address spaces. The names of intrinsic address spaces are reserved identifiers beginning with an underscore and an uppercase letter or with two underscores. A compiler implementation may also support a means for new address space names to be defined within applications.

Named Registers

Embedded C allows direct access to processor registers that are not addressable in any of the machine's address spaces. The processor registers are defined by the compiler-specific, named-register, storage class for each supported processor. The processor registers are declared and used like conventional C variables (in many cases volatile variables). Developers using Embedded C can now develop their applications, including direct access to the condition code register and other processor-specific status flags, in a high-level language, instead of inline assembly code; see Listing One.

Listing One


register _CC volatile struct {
     int is_IRQ 	: 1;
     int disable_FIRQ 	: 1;
     int half_carry 	: 1;
     int disable_IRQ 	: 1;
     int negative 	: 1;
     int zero 	: 1;
     int overflow 	: 1;
     int carry 	: 1;
  } Cond_reg;

Named address spaces and full processor access reduces application dependency on assembly code and shifts the responsibility for computing data types, array and structure offsets, and all those things that C compilers routinely and easily do from developers to compilers.

I/O Hardware Addressing

The motivation to include primitives for I/O hardware addressing in Embedded C is to improve the portability of device-driver code. In principle, a hardware device driver should only be concerned with the device itself. The driver operates on the device through device registers, which are device specific.

However, the method to access these registers can be very different on different systems, even though it is the same device that is connected. The I/O hardware access primitives aim to create a layer that abstracts the system-specific access method from the device that is accessed. The ultimate goal is to allow source-code portability of device drivers between different systems.

In the design of the I/O hardware-addressing interface, three requirements needed to be fulfilled:

The device-drive source code must be portable.
The interface must not prevent implementations from producing machine code that is as efficient as other methods.
The design should permit encapsulation of the system-dependent access method.

The design is based on a small collection of functions that are specified in the <iohw.h> include file. These interfaces are divided into two groups; one group provides access to the device, and the second group maintains the access method abstraction itself.

To access the device, the following functions are defined by Embedded C:

unsigned int iord(  ioreg_designator );
void iowr(  ioreg_designator, unsigned int value );
void ioor(  ioreg_designator, unsigned int value );
void ioand( ioreg_designator, unsigned int value );
void ioxor( ioreg_designator, unsigned int value );

These interfaces provide read/write access to device registers, as well as typical methods for setting/resetting individual bits. Variants of these functions are defined (with buf appended to the names) to access arrays of registers. Variants are also defined (with l appended) to operate with long values.

All of these interfaces take an I/O register designator ioreg_designator as one of the arguments. These register designators are an abstraction of the real registers provided by the system implementation and hide the access method from the driver source code.

Three functions are defined for managing the I/O register designators. Although these are abstract entities for the device driver, the driver does have the obligation to initialize and release the access methods. These functions do not access or initialize the device itself because that is the task of the driver. They allow, for example, the operating system to provide a memory mapping of the device in the user address space.

void iogroup_acquire( iogrp_designator );
void iogroup_release( iogrp_designator );
void iogroup_map( iogrp_designator, iogrp_designator );

The iogrp_designator specifies a logical group of I/O register designators; typically this will be all the registers of one device. Like the I/O register designator, the I/O group designator is an identifier or macro that is provided by the system implementation.

The map variant allows cloning of an access method when one device driver is to be used to access multiple identical devices.

Embedded C Portability

By design, a number of properties in Embedded C are left implementation defined. This implies that the portability of Embedded C programs is not always guaranteed. Embedded C provides access to the performance features of DSPs. As not all processors are equal, not all Embedded C implementations can be equal.

For example, suppose an application requires 24-bit fixed-point arithmetic and an Embedded C implementation provides only 16 bits because that is the native size of the processor. When the algorithm is expressed in Embedded C, it will not produce outputs of the right precision. In such a case, there is a mismatch between the requirements of the application and the capabilities of the processor. Under no circumstances, including the use of assembly, will the algorithm run efficiently on such a processor. Embedded C cannot overcome such discrepancies.

Yet, Embedded C provides a great improvement in the portability and software engineering of embedded applications. Despite many differences between performance-specific processors, there is a remarkable similarity in the special-purpose features that they provide to speed up applications.

The contribution of Embedded C is that it standardizes the notation to access these features. By adding this to the C language, it is now feasible to write high-performance code for embedded and DSP processors in a high-level language. Using Embedded C, it is possible to map C code to almost any machine instruction; this "assembler in C" capability was not a goal but reflects the effectiveness of the extensions. The responsibility of programmers to do the laborious resource-planning tasks that assembly language programming requires is taken away.

Writing C code with the low-level processor-specific support may at first appear to have many of the portability problems usually associated with assembly code. In the limited experience with porting applications that use Embedded C extensions, an automotive engine controller application (about 8000 lines of source) was ported from the eTPU, a 24-bit special-purpose processor, to a general-purpose 8-bit Freescale 68S08 with about a screenful of definitions put into a single header file. The porting process was much easier than expected. For example, variables that had been implemented on the processor registers were ported to unqualified memory in the general-purpose microprocessor by changing the definitions in the header definition and without any actual code modifications. The exercise was to identify the porting issues and it is clear that the performance of the special-purpose processor is significantly higher than the general-purpose target.

C++ Compatibility

C++ compatibility was a topic of debate in the design of Embedded C. Preferably, the extension should be expressed in such a way that it can be implemented as C++ classes. It implies that the extension should not depend on the use of type qualifiers. This, however, is against the "spirit of C" and leads to a long list of types that result when all combinations of qualifiers are expanded. While this would be feasible for the fixed-point types, it would still not provide a solution for the named address space qualifiers.

Hence the current design that follows standard C practice. In the case code must be written that is to be accepted by both a C and C++ compiler, you need to define macros such as unsigned_long_accum, which should then expand into the right pattern for the specific compiler. For named address spaces, there is no similar solution yet because these do not commonly appear in C++ compilers.

Next Month

In the next installment, we present an example of a FIR filter that includes the fixed-point and address space features of Embedded C.

References

[1] ACE. DSP-C, an extension to ISO/IEC IS 9899:1990. Technical Report CoSy-8025P-dsp-c, ACE Associated Compiler Experts bv, Amsterdam, The Netherlands, 1998. (http://www.dsp-c.org/).
[2] JTC1/SC22/WG14. Programming languages - C - Extensions to support embedded processors. Technical report, ISO/IEC, 2004. (http://www.open-std.org/jtc1/sc22/wg14).

Marcel, Hans, and Willem are software engineers for ACE Associated Compiler Experts. They can be contacted at marcel@ ace.nl, [email protected], and willem@ ace.nl, respectively. Walter is the founder of Byte Craft Limited, and can be contacted at [email protected].

1 2 3 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

The Embedded C Extension to C: Part I

Figure 1: Example DSP processor architecture with dual-input memory data path and MAC unit.

Enter Embedded C

Embedded C Features

Multiple Address Spaces

Named Registers

Listing One

I/O Hardware Addressing

Embedded C Portability

C++ Compatibility

Next Month

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

The Embedded C Extension to C: Part I

Figure 1: Example DSP processor architecture with dual-input memory data path and MAC unit.

Enter Embedded C

Embedded C Features

Multiple Address Spaces

Named Registers

Listing One

I/O Hardware Addressing

Embedded C Portability

C++ Compatibility

Next Month

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content