
Portability In C

James Metzger and William Wright

October 01, 2000

James and William present techniques they've developed to achieve portability of a real-time signal processing software system consisting of over 300,000 lines of C code.


James and William are developers for BBN Technologies/GTE. They can be contacted at [email protected] and wwright@bbn.com, respectively.

One of the features of C that has made it popular is its portability. C compilers were developed for most major platforms, and cross compilers were developed for a variety of architectures, including general-purpose processors and digital signal processors (DSPs). But as anyone who has attempted to port C code soon discovers, portability isn't so straightforward. The interfaces between the operating system and the application usually differ from one environment to the next, requiring many changes in the code. Changes made to support different architectures accumulate over time, making the system difficult to maintain.

In this article, we'll illustrate techniques we developed to achieve portability of a real-time signal-processing system consisting of over 300,000 lines of C code. The system consists of an executive component, signal-processing module library, and some auxiliary utilities. Although this application was originally hosted on multiprocessor VME platforms running VxWorks, the principles have also been applied to UNIX varieties, Windows NT, and other architectures.

Fundamentally, achieving portability amounts to separating the Standard C language components from those that vary with the operating system, processor, or vendor library. The separation methods we explore involve the following techniques:

Effective use of the C preprocessor (compile-time separation).
Using separate linkage for portability (link-time separation).
Resolving target dependencies during execution (run-time separation).

The C Preprocessor

The C preprocessor is a key tool for targeting code to multiple platforms. Preprocessor directives can control which lines of code are compiled for which target configuration. Directives are commonly used to define constants, but they can also include or exclude sections of code under particular conditions. When used this way, the controlling symbols are usually defined by options passed to the compiler.

For instance, the function double dtime() returns a double-precision floating-point value representing the number of milliseconds elapsed since the last time it was invoked. Such a function is common in real-time systems, where measuring software execution speed is important. However, the method for measuring time varies dramatically, depending on whether it involves counting clock cycles on an external timing board or reading the system time maintained by a full-featured OS. Consequently, variants of this utility may be defined for different targets; for example, see Listing One(a). The utility function is invoked in main.c as shown in Listing One(b). The desired branch is selected at compile time, as in Listing One(c), where the -D switch of the GNU C compiler defines a preprocessor symbol named SHARC.
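To make the compile-time selection concrete, here is a minimal sketch that builds on an ordinary host. The SHARC branch only declares sharc_dtime(), as in Listing One(a). The host branch supplies a fallback dtime() built on Standard C clock(); that fallback is our assumption, not the article's code, and it measures processor time rather than wall-clock time. The wrapper name read_elapsed_ms() is hypothetical.

```c
#include <time.h>

#if defined(SHARC)
double sharc_dtime(void);        /* SHARC-specific timer, supplied elsewhere */
#else
/* Host fallback (an assumption): milliseconds of processor time since the
   previous call, measured with Standard C clock(). The first call measures
   from the start of the program's clock era. */
double dtime(void)
{
    static clock_t last = 0;     /* clock() reading at the previous call */
    clock_t now = clock();
    double elapsed_ms = (double)(now - last) * 1000.0 / CLOCKS_PER_SEC;

    last = now;
    return elapsed_ms;
}
#endif

/* Hypothetical call-site wrapper in the style of Listing One(b): the
   preprocessor keeps exactly one branch, chosen by -DSHARC or its absence. */
double read_elapsed_ms(void)
{
#ifdef SHARC
    return sharc_dtime();
#else
    return dtime();
#endif
}
```

Compiled plainly (gcc main.c), only the host branch survives; with gcc -DSHARC, the build expects sharc_dtime() to be linked in from another file.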

We've found it essential to choose preprocessor symbol names carefully. The names should correspond to hardware or software (operating system) targets. In Listing One, the hardware target was a SHARC DSP. A more subtle case arises when the software must support one operating system on more than one hardware platform, or more than one operating system on the same hardware platform. In these cases, you should define separate symbols for the hardware and software environments. For instance, Listing One(d) selects code for the SHARC DSP, for VxWorks on two hardware platforms, and for Windows. It's easy to see how this approach can become unmaintainable. As the number of targets increases, these blocks of code get longer and more difficult to understand. Code that is hard to understand is hard to maintain because the effects of changes are not easily seen. Sometimes the conditional code becomes so complex that it's not even clear which code is being compiled for which target. Also, a change intended for one target can introduce a bug in another while the remaining targets still work, so the breakage goes unnoticed. A regression test must then be run on all targets to be sure the code still works after any modification.
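A sketch of the same selection with the operating-system and processor symbols tested independently, one #elif per combination, plus an explicit default for hosts that define none of them. The enum and the function selected_timer() are hypothetical; the symbol names follow Listing One(d).

```c
/* Hypothetical names for the timer each target would use. */
enum timer_kind {
    TIMER_SHARC,
    TIMER_VXWORKS_X86,
    TIMER_VXWORKS_68K,
    TIMER_WINDOWS,
    TIMER_PORTABLE
};

enum timer_kind selected_timer(void)
{
#if defined(SHARC)
    return TIMER_SHARC;              /* processor symbol alone decides */
#elif defined(VxWorks) && defined(X86)
    return TIMER_VXWORKS_X86;        /* OS symbol combined with processor symbol */
#elif defined(VxWorks) && defined(MC68000)
    return TIMER_VXWORKS_68K;
#elif defined(WIN_32)
    return TIMER_WINDOWS;
#else
    return TIMER_PORTABLE;           /* documented default for any other host */
#endif
}
```

Each alternative uses #elif; writing #else if instead leaves more than one #else in the same conditional group, which the preprocessor rejects.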

There are some techniques that can make #ifdefs more maintainable. The most important is to carefully define and document what each preprocessor symbol is meant to signify. Listing Two(a) lists some of the symbols we have used. With these definitions, code specific to VxWorks on a Pentium would be enclosed as in Listing Two(b).
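One way to keep that documentation honest is a single header that both describes the symbols and checks them. The following is a sketch under our own assumptions: the header name targets.h, the "at most one" rules, and target_os_name() are hypothetical, and the symbol names come from Listing Two(a).

```c
/* targets.h (hypothetical): documents every target symbol and rejects
   contradictory combinations before any other code is compiled.

   Operating systems (define at most one): VxWorks, WIN_32, unix
   Processors        (define at most one): X86, MC68000, SHARC       */

#if defined(linux) && !defined(unix)
#  define unix 1   /* linux is a UNIX variant, so imply the broader symbol */
#endif

/* defined() yields 0 or 1, so the symbols can be counted in #if. */
#if defined(VxWorks) + defined(WIN_32) + defined(unix) > 1
#  error "conflicting operating-system symbols: define only one"
#endif

#if defined(X86) + defined(MC68000) + defined(SHARC) > 1
#  error "conflicting processor symbols: define only one"
#endif

/* Report the OS chosen at compile time, e.g. in a startup banner. */
const char *target_os_name(void)
{
#if defined(VxWorks)
    return "VxWorks";
#elif defined(WIN_32)
    return "WIN_32";
#elif defined(unix)
    return "unix";
#else
    return "unspecified";
#endif
}
```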

Most compilers automatically define some helpful symbols that correspond to the target environment. The GNU gcc compiler has a flag (-v) that makes it print detailed information, which in the version shown includes the implicitly defined symbols. Listing Two(c), an example of its output, shows the symbols defined for a SPARC Solaris target. The compiler defines "unix," corresponding to the operating system, and "sparc," representing the target processor. These symbols can be used to identify the hardware and software environment, eliminating the need to define them explicitly with -D switches.
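With current GCC and Clang, gcc -dM -E - < /dev/null prints every predefined macro directly. A sketch of translating those predefined names into project symbols that are always defined as 0 or 1 follows; the PORT_* names and port_cpu_count() are hypothetical. The double-underscore spellings are used because strict ISO modes (for example gcc -std=c99) do not predefine the plain names unix or sparc. The SHARC test uses symbols from the g21k output in Listing Two(d).

```c
/* Map compiler-predefined macros to project symbols that are always 0 or 1. */
#if defined(__unix__) || defined(__unix)
#  define PORT_OS_UNIX 1
#else
#  define PORT_OS_UNIX 0
#endif

#if defined(__sparc__) || defined(__sparc)
#  define PORT_CPU_SPARC 1
#else
#  define PORT_CPU_SPARC 0
#endif

/* Symbols predefined by the g21k cross compiler for SHARC targets. */
#if defined(__ADSP21000__) || defined(__21K__)
#  define PORT_CPU_SHARC 1
#else
#  define PORT_CPU_SHARC 0
#endif

/* At most one of the processor symbols should be set for any build. */
int port_cpu_count(void)
{
    return PORT_CPU_SPARC + PORT_CPU_SHARC;
}
```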

When used as a cross compiler, gcc defines symbols appropriate for the target system. In Listing Two(d), which shows the symbols for a SHARC DSP target, the processor-related symbols are ADSP21000, __21K__, and so on. These names derive from the Analog Devices part numbers for the processor family.

Even if you carefully manage the conditional compilation directives, the code becomes less readable, and therefore less maintainable, as you add more targets and more target-specific sections. For example, in Listing Two(e), programmers have, over time, added special cases to the flow of the code with #ifdef directives. Especially confusing are negative-logic conditions, that is, #ifndef or #if !defined. It's hard to be sure that the enclosed code is correct for every circumstance except the one named. It's much clearer, and therefore more maintainable, to use only positive-logic conditionals. Even with the best of intentions, replicating a block of conditionally compiled code for each call to dtime() is unacceptable. Let's look at some more maintainable strategies.
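One way to keep conditionals positive is to make each target decision once, in a named feature macro, and have call sites test what they need rather than a list of targets. A sketch, assuming a hypothetical HAVE_HW_TIMER macro and timer_strategy() function; defining the macro as 0 or 1 and testing it with #if lets gcc -Wundef flag a misspelled name, which #ifdef would silently treat as "not defined."

```c
/* The target decision is made once, in positive terms. */
#if defined(SHARC) || defined(VxWorks)
#  define HAVE_HW_TIMER 1
#else
#  define HAVE_HW_TIMER 0
#endif

/* Call site: one positive test, no negated target lists. */
const char *timer_strategy(void)
{
#if HAVE_HW_TIMER
    return "hardware counter";
#else
    return "portable clock()";
#endif
}
```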

Defining an API Using Preprocessor Macros

When porting code that relies heavily on operating-system functions or vendor libraries, one way to achieve greater portability is to use preprocessor macros to define an API that encompasses the variants of those libraries. A common illustration in general-purpose DSP applications is the use of third-party vector libraries. Because such libraries are often hand coded in assembly language and highly optimized, it is to your benefit to use them rather than a C-coded substitute. Unfortunately, each library is engineered for a particular chip, such as the ADI SHARC 21020 or Intel's i860, and cannot be ported to other architectures. Furthermore, the API for similar functionality can vary even for the simplest vector computations.

For example, take the case of a vector math operation that multiplies each element of a floating-point array with a single floating-point scalar. This is referred to as a "vector-scalar multiply." In straight C, this operation might look like Listing Three(a). The integers j and k let you select different input and output strides; this is a common feature of vector math routines. Listing Three(b) shows the API of the same operation in various libraries used at some point in the development of our real-time signal-processing software.
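To show what the strides are for, here is a sketch that uses a reference C version of the operation (the same semantics as Listing Three(a)) to scale only the real parts of an interleaved complex array. The names vsmul_ref() and real_parts_scaled() are hypothetical.

```c
/* Reference C vector-scalar multiply:
   outvec[i*k] = scalar * invec[i*j] for i in [0, n). */
void vsmul_ref(const float *invec, int j, float scalar,
               float *outvec, int k, int n)
{
    int i;
    for (i = 0; i < n; i++)
        outvec[i * k] = scalar * invec[i * j];
}

/* Interleaved complex data is laid out re0, im0, re1, im1, ...
   An input stride of 2 visits only the real parts; an output stride
   of 1 packs the scaled results densely. */
void real_parts_scaled(const float *cplx, float *re, int n, float gain)
{
    vsmul_ref(cplx, 2, gain, re, 1, n);
}
```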

In some implementations, the scalar is passed by value, while in others, it is passed by reference. Also, some APIs do not support input and output striding. One library will only do the computation in place. For signal-processing software that typically requires a sequence of vector math library calls, you can imagine the unreadable complexity of a software module written to support multiple targets; see Listing Three(c).

It will not take long for the source code to become overwhelming. For each new architecture supported, another code block has to be added to the processing module. Furthermore, every new processing module developed would require a number of #ifdef code sections if it is to be used across multiple architectures supported by the system. Add to this the #include statement madness needed to properly prototype the functions and things become impossible to maintain, let alone port. Fortunately, there is a better way.

The first thing to do is adopt or define an API that encompasses the essential functionality. In our real-time signal-processing system, many of the signal-processing modules had first been written and tested against a particular vendor library; see Listing Three(d). Because some vendors use the same name for functions with different signatures, adding a project-specific prefix or suffix, as in Listing Three(e), avoids name clashes. We defined a new header, myvec.h, to isolate all the porting mechanisms from the other C modules. Written against this API, the module looks a lot cleaner; see Listing Three(f). Inside myvec.h, by contrast, we keep all the "hair," albeit well organized, as in Listing Three(g).
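A header organized this way can also carry a portable fallback so that modules written to vsmul_sys build on a host with no vendor library at all. A minimal sketch, assuming a hypothetical generic-C branch and the hypothetical names vsmul_c() and scale_block(); the SKY branch follows Listing Three(g), and further vendor branches would sit between the two as #elif blocks.

```c
/* myvec.h sketch with a hypothetical portable fallback branch. */
#if defined(SKY)
#  include <veclib.h>
#  define vsmul_sys vsmul
#else
/* Generic C version with the adopted signature and stride semantics. */
static void vsmul_c(float *invec, int j, float scalar,
                    float *outvec, int k, int n)
{
    int i;
    for (i = 0; i < n; i++)
        outvec[i * k] = scalar * invec[i * j];
}
#  define vsmul_sys vsmul_c
#endif

/* Target-independent module code, as in Listing Three(f). */
void scale_block(float *invec, float *outvec, int n)
{
    vsmul_sys(invec, 1, 2.0f, outvec, 1, n);
}
```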

At compile time, the C preprocessor substitutes the vsmul vector call for the vsmul_sys symbol in the case of the SKY target. For the WIN32 target, vsmul_sys expands to a sequence of optimized function calls arranged to duplicate the operation of the adopted API. Since the NSP library does not support input or output striding, an assertion has been inserted into the call sequence to enforce the library's constraints. Most users of this function do not use the stride feature, but we could provide our own stride-capable version if necessary.
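The WIN32 emulation pattern (copy, then scale in place, with an assertion guarding the unsupported strides) can be sketched without the NSP library. Here copy_vec() and scale_in_place() are hypothetical stand-ins that only mimic the shape of the in-place calls in Listing Three(b), and vsmul_sys is written as a function rather than the macro of Listing Three(g) for readability. memmove() is used so the emulation also works when the caller passes the same buffer for input and output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for an in-place vector library. */
static void copy_vec(const float *src, float *dst, int n)
{
    memmove(dst, src, (size_t)n * sizeof *dst);   /* tolerates src == dst */
}

static void scale_in_place(float scalar, float *vec, int n)
{
    int i;
    for (i = 0; i < n; i++)
        vec[i] *= scalar;
}

/* The adopted out-of-place API emulated with the in-place primitive.
   The assertion enforces the library's unit-stride limitation. */
void vsmul_sys(float *invec, int j, float scalar, float *outvec, int k, int n)
{
    assert(j == 1 && k == 1);
    copy_vec(invec, outvec, n);
    scale_in_place(scalar, outvec, n);
}
```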

This approach is beneficial in a number of ways. First, a module that uses the API does not have to be modified when it is simply ported to another platform. For modules that have undergone acceptance testing and have a track record of reliability, this is an important benefit because it minimizes the risk of introducing bugs into the code. Only the substitution macro needs to undergo regression testing, and this can be done in a separate test suite maintained to prove adherence to the adopted/defined API.

Second, adding functionality to the module is a matter of adding a single set of function calls, all drawn from the defined API. By writing the target-independent code to the defined API, all other architectures are automatically supported, assuming the function calls for each target have a representative macro in the header (myvec.h, for instance).

Finally, supporting future targets is simply a matter of adding a new #ifdef block to the header file. Although this requires a bit of work, the overall ease of porting and readability of the source code is well worth the effort.

There are a few cautionary notes in the use of macro definitions to implement a target-independent API. When a macro expands to multiple statements, as in Listing Four(a), enclose the statements in braces to prevent compile-time or, even worse, run-time errors. Listing Four(b) shows why: there, vsmul_sys appears after an if statement that has no braces of its own. If the macro body is not enclosed in braces, only its first statement is governed by the if, and the remaining statements execute unconditionally, as the expansion in Listing Four(c) shows. Also notice that for the WIN32 target, the vsmul_sys macro references the argument n twice, once in the nspsbCopy call and again in nspsbMpy1. This causes unexpected behavior when the increment (++) or decrement (--) operators are applied to that argument in the target-independent code; see Listing Four(d). Experienced C programmers know to avoid the increment and decrement operators in arguments to function-like macros for just this reason.
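The double-evaluation hazard of Listing Four(d) is easy to reproduce. In this sketch, use_length(), LENGTH_USED_TWICE, and length_used_twice() are hypothetical names. The macro mentions its argument twice, as the WIN32 vsmul_sys does, while the C99 static inline function evaluates its argument exactly once. Where a C99 compiler is available, an inline function is often the simpler way to avoid the problem.

```c
static int lengths_seen = 0;          /* sum of every length passed in */
static void use_length(int n) { lengths_seen += n; }

/* Braced macro that, like the WIN32 vsmul_sys, uses its argument twice:
   an argument such as n++ is evaluated once per mention. */
#define LENGTH_USED_TWICE(n) { use_length(n); use_length(n); }

/* C99 inline function: the argument expression is evaluated exactly once,
   and both uses see the same value. */
static inline void length_used_twice(int n)
{
    use_length(n);
    use_length(n);
}
```

After n = 10, the statement LENGTH_USED_TWICE(n++); leaves n at 12, while length_used_twice(n++); leaves it at 11.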

Semicolons in a macro definition also need care, because misplacing them can cause compiler warnings. In Listing Four(e), the macro body deliberately omits a trailing semicolon. The semicolon the caller writes after vsmul_sys(...) completes the statement, so the preprocessor output never contains two semicolons in a row.
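Braces and semicolons interact in one more case. When a braced multistatement macro is followed by the caller's semicolon just before an else, the semicolon ends the if statement early and the else has no if to attach to, which is a compile error. A widely used remedy, not part of the article's listings, wraps the body in do { ... } while (0) so that the macro plus its trailing semicolon forms exactly one statement. The names do_copy(), do_scale(), COPY_THEN_SCALE, and maybe_scale() are hypothetical stand-ins for the vsmul_sys call sequence.

```c
static int copies = 0, scales = 0;   /* stand-ins that only count calls */
static void do_copy(void)  { copies++; }
static void do_scale(void) { scales++; }

/* One statement that needs exactly one trailing semicolon. */
#define COPY_THEN_SCALE() do { do_copy(); do_scale(); } while (0)

void maybe_scale(float scalar)
{
    if (scalar != 0.0f)
        COPY_THEN_SCALE();   /* a plain { ... } body here would orphan the else */
    else
        do_copy();
}
```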

Separate Linkage

So far we've looked at the preprocessor as the primary tool for achieving code portability, but it's not the only tool available. The linker can also be used to generate targeted executables. The first step is to define the interface for the target-specific function that is called from target-independent code; in this case, the function is double dtime(). Put this prototype into a header file, which we'll call dtime.h, to be included in the target-independent code. Then, for each target, create a .c file that includes dtime.h along with any target-specific header files, and write the target-specific implementation of the function in it. Listing Five shows sharc_dtime.c, the implementation for the SHARC DSP.
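For comparison, a host workstation could supply its own implementation file against the same header. This is a sketch under our own assumptions: the file name host_dtime.c is hypothetical, it uses C11 timespec_get() to read wall-clock time, and it returns 0.0 on the first call because there is no previous call to measure from. The build would compile this file instead of sharc_dtime.c.

```c
#include <time.h>

/* dtime.h: the shared interface seen by target-independent code. */
double dtime(void);

/* host_dtime.c: hypothetical host implementation selected by the build. */
double dtime(void)
{
    static struct timespec last;   /* time of the previous call */
    static int have_last = 0;      /* becomes 1 after the first call */
    struct timespec now;
    double elapsed_ms = 0.0;

    if (timespec_get(&now, TIME_UTC) != TIME_UTC)
        return 0.0;                /* clock unavailable: report no elapsed time */
    if (have_last)
        elapsed_ms = (double)(now.tv_sec - last.tv_sec) * 1000.0
                   + (double)(now.tv_nsec - last.tv_nsec) / 1.0e6;
    last = now;
    have_last = 1;
    return elapsed_ms;
}
```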

There will also be .c files for the other targets. It is up to the build tool to compile and link only the .c file for the selected target. This method keeps the code readable: there are no conditionals to obscure the flow of either the target-independent code or the target-specific code. Also, the rigorous definition of the target-specific function allows the same test cases to be used for all targets.

Resolving Target Dependencies During Execution

Sometimes you may not know the exact hardware configuration until the program begins execution. In such a case, you need to include all of the candidate implementations in the executable image and switch among them depending on the hardware environment that is detected.

Say, for example, that our dtime() function had to operate on two similar but different SHARC DSP boards: One has a high-resolution clock chip and the other does not. We'd like dtime() to use the high-resolution clock if it's available; otherwise, we will just use the internal clock on the SHARC. One way to do this is by probing for the high-resolution clock and calling dtime(), as in Listing Six(a). This example takes four lines of code to call dtime(). These four lines have to appear every place that dtime() is used. If someone forgets, then the wrong clock will be used, potentially causing large errors.

A better solution is to move the four lines into another function that could be called from target-independent code. This hides the ugliness, but is still not ideal. We really have two separate functions, but they come together in the new function. As more hardware configurations are supported, more global variables or probe functions are required to hold the configuration state.
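A sketch of that wrapper, with a global variable caching the probe result so the hardware is probed only once. Everything here is a hypothetical stand-in: probe_for_clock() always reports a clock and counts how often it runs, and the two timers return recognizable constants instead of reading hardware.

```c
/* Hypothetical stand-ins for the probe and the two timers. */
static int probes = 0;                          /* number of hardware probes */
static int probe_for_clock(void) { probes++; return 1; }
static double high_res_time(void) { return 0.5; }
static double sharc_dtime(void)   { return 1.0; }

/* Configuration state the wrapper must keep:
   -1 = not yet probed, 0 = clock absent, 1 = clock present. */
static int clock_state = -1;

double dtime(void)
{
    if (clock_state < 0)
        clock_state = probe_for_clock();
    return clock_state ? high_res_time() : sharc_dtime();
}
```

Each additional hardware option adds another state variable and another test inside dtime(), which is the growth the paragraph above warns about.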

Better still is a function pointer that is initialized at startup to point to the right function. No target-independent code is affected, and the initialization routine is the only place where decisions about the hardware configuration have to be made. Listing Six(b) shows how it might work. The writer of the target-independent code now needs to know only about dtime() and can use it without knowing about hardware dependencies. The target-specific implementations of dtime() (high_res_time() and sharc_dtime()) can be developed and tested independently.
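A self-contained sketch of the function-pointer approach that can run on any host. The probe and timers are hypothetical stand-ins returning recognizable constants. One design choice differs from Listing Six(b): the pointer starts out pointing at the always-available sharc_dtime() rather than 0, so a call made before initialize() still works instead of jumping through a null pointer.

```c
/* Hypothetical stand-ins for the probe and the two timers. */
static int clock_fitted = 0;                    /* pretend probe result */
static int probe_for_clock(void)  { return clock_fitted; }
static double high_res_time(void) { return 0.5; }
static double sharc_dtime(void)   { return 1.0; }

/* Chosen once at startup; callers simply write millis = dtime(); */
double (*dtime)(void) = sharc_dtime;

void initialize(void)
{
    dtime = probe_for_clock() ? high_res_time : sharc_dtime;
}
```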

Run-time function pointers in C are conceptually similar to virtual methods in C++. In Listing Six(b), dtime() is abstract, and high_res_time() and sharc_dtime() are its concrete implementations. Users of the dtime() function neither know nor care how it is implemented, and can use dtime() as if it were any other function: millis=dtime();.

Conclusion

To achieve C code portability for real-time and embedded DSP systems you must separate the target-specific components from those that are target independent. The choice of techniques to use depends on the characteristics of the software system to be ported. For small systems with small blocks of target-specific code, the preprocessor methods described here are appropriate. Using the preprocessor to select the target-specific code during compilation can also be computationally efficient by possibly eliminating a function call.

For larger systems with more target-specific code, using separate target-specific compilation units pulled together with the linker is advantageous. The target-specific compilation units are easy to read and understand because conditional compilation directives are not required. This approach also lends itself to regression testing because the target-specific functions have the same name, making the regression test driver automatically portable.

Resolving the target dependencies at run time is the only option available when some characteristics of the target system are not known at compile and link time.

Writing a C-coded application to concurrently support multiple processors and operating systems can be an arduous task. Adopting the techniques we outline here takes an investment of time, but the gain from the effort will be realized in a reduced time-to-market and increased quality when new target architectures are added.

DDJ

Listing One

(a)

double sharc_dtime();  /* version of dtime() for the SHARC DSP */ 

(b)
#ifdef SHARC
 millis = sharc_dtime();
#else
 millis = dtime();
#endif

(c)
gcc -DSHARC main.c

(d)
#ifdef SHARC
  millis = sharc_dtime();
#elif defined(VxWorks) && defined(X86)
  millis = read_386_counter();
#elif defined(VxWorks) && defined(MC68000)
  millis = read_moto_counter();
#elif defined(WIN_32)
  millis = read_windows_timer();
#endif


Listing Two

(a)

#define VxWorks /* Target is VxWorks OS, any processor      */
#define X86     /* Processor is Intel 386 or higher, any OS */
#define unix    /* Any UNIX, any processor                  */
#define linux   /* linux UNIX, any processor                */

(b)
#if defined(VxWorks) && defined(X86)
/* Pentium and VxWorks-specific code */
#endif

(c)
% gcc -v main.c
gcc version 2.8.1
 /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/2.8.1/cpp -lang-c -v 
      -undef -D__GNUC__=2 -D__GNUC_MINOR__=8 -Dsparc -Dsun -Dunix 
      -D__svr4__ -D__SVR4 -D__sparc__ -D__sun__ -D__unix__ -D__svr4__ 
      -D__SVR4 -D__sparc -D__sun -D__unix -D__GCC_NEW_VARARGS__ main.c

(d)
C:\> g21k -v main.c
gcc version rel3.3 21k/SHARC 3.3:
c:\adi_dsp\21k\etc\cpp.exe -lang-c -v -undef -D__GNUC__=2 -D__ADSP21000__ 
    -DADSP21000 -D__21K__ -D__ADSP21020__ 
    -DADSP21020 -D__DOUBLES_ARE_FLOATS__ main.c

(e)
#ifdef NEW
  millis = new_dtime();
#if !defined(BOB) || defined(SHARC)
  millis += bobs_counter();
#elif defined(VxWorks) && !defined(WIN_32)
  millis = who_knows_what();
#endif /* what! */
#if defined(WIN_32)
  millis = read_windows_timer();
#endif
#endif


Listing Three

(a)

void vsmul(float *invec, int j, float scalar,
           float *outvec, int k, int n)
{
    int i;
    for (i = 0; i < n; i++)
        outvec[i*k] = scalar*invec[i*j];  /* j, k: input/output strides */
}

(b)
/* SKY Standard Math Library */
vsmul(float *invec, int j, float scalar, 
float *outvec, int k, int n);

/* SKYvec library does not support strides and uses a global */
/* variable for the vector length */
_skyvec = n;
v$_rsvt0(float scalar, float *invec, float *outvec);

/* Wideband Computers Inc (SHARC) */
vsmul(float *invec, int j, float scalar, 
float *outvec, int k, int n);

/* Alacron i860 */
vsmul(float *invec, int j, float scalar, 
float *outvec, int k, int n);

/* Alacron SHARC(passes scalar by reference)  */
vsmul(float *invec, int j, float *scalar, 
float *outvec, int k, int n);

/* Intel Native Signal Processing (NSP)for Pentium   */
/* Operation done in place. Does not support stride  */
nspsbMpy1(float scalar, float *outvec, int n);

/* Mercury Computers(passes scalar by reference) */
vsmul(float *invec, int j, float *scalar, 
float *outvec, int k, int n);

(c)
#if defined(SKY)

#include <mathlib.h>
#endif /* SKY */

#if defined(WIN32)
#define nsp_UsesAll
#include <nsp.h>
#endif /* WIN32 */
  /* etc */
my_processing_module()
{
  /* code */
#ifdef SKY
    /* other vector calls */
    scalar = 2.0;
    vsmul(invec, 1, scalar, outvec, 1, n);
    /* even more vector calls */
#endif /* SKY */

#if defined(WIN32)
    /* previous vector calls */
    scalar = 2.0;
    nspsbCopy(invec,outvec,n); 
    nspsbMpy1(scalar,outvec,n);
    /* subsequent vector calls */
#endif /* WIN32 */
    /* other architectures, etc. */
}

(d)
vsmul(float *invec, int j, float scalar, float *outvec, int k, int n);

(e)
vsmul_sys(float *invec, int j, float scalar, float *outvec, int k, int n);

(f)
#include <myvec.h>
my_processing_module()
{
    vsmul_sys(invec, j, scalar, outvec, k, n);
}

(g)
#if defined(SKY)
#include <veclib.h>
#define vsmul_sys vsmul
#endif /* SKY */

#if defined(WIN32)
#define nsp_UsesAll
#include <nsp.h>
#include <assert.h>
#define vassrt(i,j,k) assert((i)==1 && (j)==1 && (k)==1)
#define vsmul_sys(a,i,b,c,k,n) \
{vassrt(i,1,k); nspsbCopy(a,c,n); nspsbMpy1(b,c,n);}
#endif /* WIN32 */


Listing Four

(a)

#define vsmul_sys(a,i,b,c,k,n) \
{vassrt(i,1,k); nspsbCopy(a,c,n); nspsbMpy1(b,c,n);}

(b)
#include <myvec.h>
my_processing_module()
{
    if(scalar != 0.0)
        vsmul_sys(invec, j, scalar, outvec, k, n);
}

(c)
#include <myvec.h>
my_processing_module()
{
    if(scalar != 0.0)
        vassrt(j,1,k);
    nspsbCopy(invec,outvec,n);  /* wrong: runs even when scalar == 0 */
    nspsbMpy1(scalar,outvec,n); /* wrong: runs even when scalar == 0 */
}

(d)
n=10;
vsmul_sys(invec, j, scalar, outvec, k, n++);
/* n is now 12 for the WIN32 target: the copy used 10 elements, the multiply 11 */

(e)
#define vsmul_sys(a,i,b,c,k,n)  vecscalmul(b,a,i,c,k,n)


Listing Five

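#include "dtime.h"
#include <21020.h>
double dtime() /* no longer called "sharc_dtime()" */
{
  unsigned long stimer;
  /* read the current value of the down-counting timer */
  asm volatile("%0=TCOUNT;": "=d"(stimer));
  /* stop the timer (clear MODE2 bit 5), reload period and count with
     the maximum value, then restart it for the next interval */
  asm volatile("BIT CLR MODE2 32;");
  asm volatile("TPERIOD=0xFFFFFFFF;TCOUNT=0xFFFFFFFF;" );
  asm volatile("BIT SET MODE2 32;" );
  /* counts elapsed since the last reload; dividing by 40000 assumes a
     40-MHz timer clock (40,000 counts per millisecond) */
  return (0xFFFFFFFF - stimer) / 40000.0;
}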


Listing Six

(a)

if (probe_for_clock())
  millis = high_res_time();
else
  millis = sharc_dtime();

(b)
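#include "dtime.h"   /* for this scheme it declares: extern double (*dtime)(); */
double (*dtime)() = 0;   /* set by initialize() before first use */
void initialize()
{
  extern int probe_for_clock();
  extern double high_res_time();
  extern double sharc_dtime();
  if (probe_for_clock())
    dtime = high_res_time;
  else
    dtime = sharc_dtime;
}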
