Multiplexing Error Codes

FEB90: MULTIPLEXING ERROR CODES

William J. McMahon is a senior programmer for Digital Products Inc. and can be reached at 108 Water Street, Watertown, MA 02172.

Diagnosing unexpected errors can be one of the most frustrating and troublesome aspects of software development. Even the most well-designed software can have minor flaws that have catastrophic consequences. The actual symptoms often give no clue as to the nature of the defect. Worse, they may even be difficult to reproduce, particularly if the bug is reported from the field.

A programmer can often spend hours, if not days, iterating through many cycles of program modification and testing to track down even a small defect. Fortunately there are many tools and techniques to help. Interactive debuggers, code interpreters, built-in debug code, and robust error handling within the program itself are all useful, but each has its limitations.

All but the last of the techniques just mentioned require that the program be run again to duplicate the error in question. But this is not always convenient or possible. In such cases, how the program deals with an unexpected error the first time it occurs becomes extremely important. If the error is reported in such a way that the programmer can close in on it with that information alone, much time can be saved. The major problem with unexpected errors is, of course, that they are unexpected and therefore impossible to handle specifically. They require a systematic approach.

For systematic error handling to be effective, it has to be used widely and consistently. As a practical matter, this means that error handling cannot require much, if any, extra work from the programmer.

This leaves us with two apparently conflicting goals: Providing enough information to easily diagnose unexpected errors wherever they occur, while adding little or no extra work to the original programming task. The following is a description of a scheme I have used to do just that.

Overview

The error handling system presented here hinges on function communication. Functions that use this scheme will return an error code. A return value of zero is used to indicate success, while a non-zero return indicates some sort of failure. Exception handling logic can then be processed whenever such a function returns a non-zero value. In most cases the exception processing amounts to returning an indication of failure to the calling function. Because most low-level functions do not know the context in which they were called, they cannot deal with the error directly.

The failure returns back through several levels of functions until it is finally dealt with in some way. If it is an unanticipated error, the program will probably abort with some sort of error message. To be able to trace the root cause of the error, we need to be able to identify its source and preserve the logic path to it.

To do that, this scheme associates each possible error condition within a function with a numeric code. At the lowest level, that numeric value is simply returned. At each subsequent return, the return value is combined with another numeric code that identifies the point of the call. This uniquely identifies each return, preserving the path to the original failure. Because the path is preserved, the numeric code needs to be unique only within each function.

Consider the following example: The original error causes the return of an error code. This code uniquely identifies the location of the error within that function. After testing the return value of the function, the calling function also generates an error code that uniquely identifies the location of that call within that function.

The two codes are combined and returned to the next level. This process is repeated at each level until the error is either handled or the program is aborted. The code fragment in Example 1 shows how this might work. Note that the function mid_level can return either an individual error code (if parm is NULL) or a combined error code (if low_level returns an error).

Example 1: Combining codes and returning to the next level

  unsigned low_level (int x, int y);

  unsigned mid_level (char *parm)
  {
       unsigned err;

       if (parm == NULL)
            return (1);                     /* individual code */

       ...
       err = low_level (i, j);
       if (err)
            return (ERR_COMBINE (err, 3));  /* combined code */
       ...
       return (0);
  }

  unsigned low_level (int x, int y)
  {
       if (x > 0)
            return (1);
       if (x > y)
            return (2);
       ...
       return (0);
  }

The fact that the individual error numbers are hard coded might seem to violate good programming practice. But each number only has to be unique within its own function, where it labels one specific return, so hard coding it at that point is appropriate.

Combining Codes

The actual combining of error codes is done with the macro ERR_COMBINE, which is the key to this scheme. Combining error codes must be done in such a way that they can be later separated and decoded.

Consider the simple scheme where the ERR_COMBINE macro is defined as in Example 2. Multiplying the original code by ERR_BUMPER before adding the new number shifts the original left so that its value is not lost when the new code is added.

Example 2: Defining a simple ERR_COMBINE macro

  #define ERR_BUMPER 10
  #define ERR_COMBINE(orig, to_add)  (((orig) * ERR_BUMPER) + (to_add))

A value of 10 for ERR_BUMPER is convenient because the error number can be decoded visually, but it must be read from right to left. Each digit in the decimal display is the error number from one function level. The right-most digit represents the highest function level.
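To make the digit layout concrete, here is a small sketch of a failure passing up through three levels, using the ERR_COMBINE definition from Example 2. The function names (read_record, load_file, run_job) and their error numbers are made up for illustration only.

```c
#define ERR_BUMPER 10
#define ERR_COMBINE(orig, to_add)  (((orig) * ERR_BUMPER) + (to_add))

/* Hypothetical lowest level: fails at its second error site. */
static unsigned read_record(void)
{
    return 2;
}

/* Hypothetical middle level: its call to read_record() is site 3. */
static unsigned load_file(void)
{
    unsigned err = read_record();

    if (err)
        return ERR_COMBINE(err, 3);     /* 2 * 10 + 3 = 23 */
    return 0;
}

/* Hypothetical top level: its call to load_file() is site 1. */
static unsigned run_job(void)
{
    unsigned err = load_file();

    if (err)
        return ERR_COMBINE(err, 1);     /* 23 * 10 + 1 = 231 */
    return 0;
}
```

Read from right to left, the result 231 says: site 1 in run_job, then site 3 in load_file, then site 2 in read_record.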

Decoding Error Numbers

Visual decoding is not necessary. The function in Example 3 will decode the combined error code for any value of ERR_BUMPER and display the individual codes so that they can be interpreted from left to right.

Example 3: This function will decode the combined error code and display the individual codes

  void err_print (FILE *stream, unsigned err_code)
  {
       do
       {
            /* %u, because err_code is unsigned */
            fprintf (stream, "%u", (err_code % ERR_BUMPER));
       }
       while ((err_code /= ERR_BUMPER) > 0);
  }
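As a quick check of the display order, the sketch below applies the same digit loop as Example 3 but writes into a caller-supplied buffer so the result can be inspected. The name err_to_string and its buffer-size convention are assumptions for illustration, not part of the original module.

```c
#include <stdio.h>
#include <string.h>

#define ERR_BUMPER 10

/* Hypothetical helper: same digit loop as err_print, but into a string.
   Stops early if the buffer fills up; the result is always terminated.
   Assumes size is at least 1. */
static void err_to_string(char *buffer, size_t size, unsigned err_code)
{
    size_t used = 0;

    buffer[0] = '\0';
    do
    {
        int n = snprintf(buffer + used, size - used, "%u",
                         err_code % ERR_BUMPER);

        if (n < 0 || (size_t)n >= size - used)
            break;                      /* out of room */
        used += (size_t)n;
    }
    while ((err_code /= ERR_BUMPER) > 0);
}
```

A combined value of 25143 (built from the codes 2, 5, 1, 4, and 3, in that order) comes out as the string "34152", with the code from the outermost call first.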

Consider the output ERROR: 34152. While this error message is far too cryptic to understand by itself, a programmer with access to the source code can pinpoint the root cause of the error quickly. The process is simple: Starting with the main function, locate the function failure that produces code 3 (see Example 4). Next, move to that function and locate the function failure in that function that produces code 4. Repeat this process at each function level until the last one is reached.

Example 4: Code produced once function failure has been located

  /* Note: this name collides with the standard abort() in stdlib.h. */
  void abort (unsigned err_code);

  main( )
  {
        unsigned err, function( );
        ...
        err = function ( );
        if (err)
             abort (ERR_COMBINE (err, 3));
        ...
  }

  void abort (unsigned err_code)
  {
        fprintf (stderr, "\n ERROR:");
        err_print (stderr, err_code);
        exit (err_code);
  }

It is a good idea to check the syntax of each function call at every step. The defect is not always with the lowest-level function. I have found that when this process is complete, the bug is often obvious.

Some Improvements

The ERR_COMBINE macro in the previous example makes two important assumptions. First, the individual error codes are always less than the value of ERR_BUMPER. Second, the combining of the error codes does not overflow the data type used for the error return (unsigned int).

Because the individual error numbers are hard coded, the first assumption is fairly easy to control. You will find that most functions will need only a few error numbers. If your function requires much more than a half dozen, it is probably too large and should be split up into two or more smaller routines anyway. In the rare case when extra codes are needed, two codes can be combined as follows:

   error = ERR_COMBINE (ERR_COMBINE (error, 9), 1);

The second assumption, that the error code will not overflow, is much more dangerous. In a system of any size, functions will be nested at many levels. A 2-byte unsigned int and an error bumper of 10 allow for only four or five levels of nesting before the error code overflows. We can increase the maximum levels by using a long rather than an int, as well as by reducing the value of the error bumper. However, an int is preferable to a long from a coding efficiency standpoint, because an int will usually be the "natural" size for the CPU (K & R p. 34). Efficiency is an important consideration because this error code will be returned and tested in many places.
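The capacity limit can be estimated with a few lines of arithmetic. The helper below is only a sketch and is not part of the original module; its name, err_max_levels, and its parameters are made up here. It assumes every level contributes one code in the range 1 to bumper - 1, that bumper is at least 2, and that max_value is at least bumper - 1.

```c
/* Hypothetical helper: number of nesting levels guaranteed to fit in a
   type whose largest value is max_value, whatever codes are used. */
static unsigned err_max_levels(unsigned long max_value, unsigned long bumper)
{
    unsigned long worst = 0;    /* largest code so far: every digit at its maximum */
    unsigned levels = 0;

    /* Add a level only while worst * bumper + (bumper - 1) still fits. */
    while (worst <= (max_value - (bumper - 1)) / bumper)
    {
        worst = worst * bumper + (bumper - 1);
        ++levels;
    }
    return levels;
}
```

For a 16-bit unsigned int (maximum 65535), this gives 4 guaranteed levels with a bumper of 10 and 5 with a bumper of 8. A fifth decimal level fits only when the earliest codes happen to be small, which is why the limit is "four or five". A 32-bit value with a bumper of 10 guarantees 9 levels.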

We cannot guarantee that the error code will never overflow. But as Example 5 shows, a function rather than a macro lets you develop a more sophisticated scheme to save error codes that would be lost in an overflow. We can then provide support for very large systems while making no special requirements on the capacity of the error code or the size of the error bumper.

Example 5: Using a function instead of a macro

  #include <limits.h>                    /* UINT_MAX */

  void err_push (unsigned err_code);

  unsigned err_combine (unsigned original, unsigned to_add)
  {
       /* >= rather than >: at exactly UINT_MAX / ERR_BUMPER,
          a large to_add could still overflow. */
       if (original >= UINT_MAX / ERR_BUMPER)
       {
            err_push (original);
            original = 0;
       }
       return (original * ERR_BUMPER + to_add);
  }

  #define MAX_OVERFLOWS     10
  static unsigned err_stack [MAX_OVERFLOWS];
  static unsigned err_stack_top = 0;

  unsigned err_pop (void)
  {
       if (err_stack_top == 0)
            return (0);

       --err_stack_top;

       return (err_stack [err_stack_top]);
  }

  void err_push (unsigned err_code)
  {
       if (err_stack_top < MAX_OVERFLOWS)
       {
            err_stack [err_stack_top] = err_code;
            ++err_stack_top;
       }
  }

The err_combine function tests for potential overflow. When required, the original multiplexed code will be saved in another location and then reset, at which point normal processing will continue.

The saved code is stored in a stack implemented as an array that can be made as large as required by the program size. An additional function (err_pop) is required to pop any overflow portions of the multiplexed error code off the stack. The previous err_print function can be changed to display the entire error code, as shown in Example 6.

Example 6: Changing err_print to display the entire error code

  void err_print (FILE *stream, unsigned err_code)
  {
        while (err_code)
        {
            do
            {
                 fprintf (stream, "%u", (err_code % ERR_BUMPER));
            }
            while ((err_code /= ERR_BUMPER) > 0);
            err_code = err_pop ();      /* 0 when nothing was saved */
        }
  }
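To see the overflow stack at work without depending on the size of int on a particular machine, the sketch below repeats the logic of Examples 5 and 6 with an artificially small limit. The demo_ names and the DEMO_MAX value (which pretends that unsigned is 16 bits wide) are assumptions for illustration. The formatter writes to a string rather than a stream so the result can be checked.

```c
#include <stdio.h>

#define ERR_BUMPER     10
#define DEMO_MAX       65535u       /* assumed: pretend unsigned is 16 bits */
#define MAX_OVERFLOWS  10

static unsigned demo_stack[MAX_OVERFLOWS];
static unsigned demo_stack_top = 0;

static void demo_push(unsigned err_code)
{
    if (demo_stack_top < MAX_OVERFLOWS)
        demo_stack[demo_stack_top++] = err_code;
}

static unsigned demo_pop(void)
{
    return (demo_stack_top == 0) ? 0 : demo_stack[--demo_stack_top];
}

static unsigned demo_combine(unsigned original, unsigned to_add)
{
    if (original >= DEMO_MAX / ERR_BUMPER)  /* next digit might not fit */
    {
        demo_push(original);
        original = 0;
    }
    return original * ERR_BUMPER + to_add;
}

/* Format the code and any saved overflow into buffer,
   which is assumed to be large enough. */
static void demo_format(char *buffer, unsigned err_code)
{
    buffer[0] = '\0';
    while (err_code)
    {
        do
        {
            buffer += sprintf(buffer, "%u", err_code % ERR_BUMPER);
        }
        while ((err_code /= ERR_BUMPER) > 0);
        err_code = demo_pop();
    }
}

/* Build a seven-level path with codes 1, 2, ..., 7 and format it. */
static void demo_seven_levels(char *buffer)
{
    unsigned code = 1;
    unsigned site;

    for (site = 2; site <= 7; ++site)
        code = demo_combine(code, site);
    demo_format(buffer, code);
}
```

With these values the first five codes fill one word (12345), which is pushed when the sixth code arrives. The formatted result is "7654321": the most recent codes come first, followed by the saved overflow, so the whole path survives.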

The source code in Listing One combines all the aforementioned concepts into a single module of utility functions, ready for use in any new programming project.

The err_combine function in that module has an additional feature worth noting. It tests the original error code value, and if it is a special "pass-through" value, the error codes are not combined. Only the original pass-through value is returned. In this way, exceptions that are expected (such as a user abort) can be handled easily using the same exception processing logic. The rather cryptic abort function used in a previous example can now be expanded as shown in Example 7.

Example 7: Expanding the abort function

  void abort (unsigned err)
  {
       switch (err)
       {
       case DISK_SPACE_ERROR:
            printf("\n Not enough disk space to run program.");
            break;
       case MEMORY_ERROR:
            printf("\n Not enough memory to run program.");
            break;
       case USER_ABORT:
            fprintf(stderr, "\n Program aborted by user.");
            break;
       default:
            printf("\n Unexpected error: %u ", err);
            printf("\n Please record this error number,");
            printf("\n and call technical support at");
            printf("\n 1-800-555-1234.");
            break;
       }
       exit (err);
  }

The case values of this switch statement are predefined pass-through values. The rule is: Any value that is an exact multiple of ERR_BUMPER is not modified; the original value is passed through. This rule works well because of the way that the combine algorithm works. Multiples of ERR_BUMPER will not be generated as ordinary error codes because individual error codes are non-zero by definition.
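The pass-through rule can be sketched in a few lines. The three constant values below are assumptions chosen for illustration (the article does not give them), and err_combine_pt is a made-up name for a cut-down version of the combine function with the overflow handling left out.

```c
#define ERR_BUMPER        10

/* Hypothetical pass-through values: nonzero multiples of ERR_BUMPER. */
#define DISK_SPACE_ERROR  (1 * ERR_BUMPER)
#define MEMORY_ERROR      (2 * ERR_BUMPER)
#define USER_ABORT        (3 * ERR_BUMPER)

/* Combine as before, except that pass-through values are returned
   untouched. Overflow handling is omitted to keep the sketch short. */
static unsigned err_combine_pt(unsigned original, unsigned to_add)
{
    if (original % ERR_BUMPER == 0)
        return original;
    return original * ERR_BUMPER + to_add;
}
```

A USER_ABORT raised several levels down reaches the top unchanged however many times it is combined on the way, so it can be matched directly in a switch statement, while an ordinary code such as 2 still becomes 23 when combined with 3.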

Summary

As mentioned earlier, the power of this error handling scheme lies in the detection of unexpected errors through its systematic use. It can detect errors only where it is used. To encourage its wide use, it has been designed to minimize the work required to implement it. Any time an error condition is returned by a function, it can simply be kicked upstairs without concern for losing calling context information.

Because a unique identifier is added at each step of the way, valuable trace information can be provided. This information can often be critical in discovering the nature of a program defect. And by providing the information with the first occurrence of the bug, we can significantly reduce the diagnosis time, particularly for bugs with symptoms that are difficult to reproduce.

Because the trace information is preserved, the individual error codes need to be unique to the function only. This is very convenient in large systems where managing many error numbers would become quite cumbersome.

You may have guessed that this error identification scheme is most helpful in the early stages of development, and it is. But it is also helpful late in a program's life cycle. It's especially good for catching those "once in a blue moon" bugs. Usually all you need is the error code and the correct version of the source code.

You will find that this system will not always lead you to the program defect by itself, and you may have to resort to other debugging techniques, but it will provide you with a valuable head start.

Notes

The C Programming Language, by Brian Kernighan and Dennis Ritchie. Prentice Hall, Englewood Cliffs, New Jersey, 1978.

_MULTIPLEXING ERROR CODES_ by William J. McMahon

[LISTING ONE]


/* ----------------------------------------------------------------------
    ERR_CODE.C Written by: William J. McMahon
    This module contains the functions used to manipulate error codes.
    Global Functions Defined Herein:
   err_combine(), err_format(), err_print()
----------------------------------------------------------------------- */
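#include <stdio.h>
#include <stdlib.h>     /* atoi() in the test harness                   */
#include <string.h>     /* strlen() in err_format()                     */
#include <limits.h>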

#define ERR_BUMPER      10
#define ERR_THRESHOLD   (UINT_MAX/ERR_BUMPER)      /* ... for overflow. */

/* ----- Local Functions Defined Herein: ----- */
static unsigned err_pop(void);
static void     err_push(unsigned err_code);

#ifdef TEST             /* -------------- Test Harness ---------------- */

#define FIRST_ARG   1   /* argv[0] is the program name in standard C.   */
#define NCODES      32

unsigned err_combine(unsigned original, unsigned to_add);
void     err_format(char *buffer, unsigned err_code);
void     err_print(FILE *stream, unsigned err_code);

int main(int argc, char *argv[])
{
    unsigned err_code;
    int      adder = 1;
    int      i;

    if (argc > FIRST_ARG)
        /* Override default starting code. */
        adder = atoi(argv[FIRST_ARG]);

    err_code = adder;
    printf("\nInput should be a mirror image of output.\n");
    printf("\n Input sequence: %u", err_code);
    for (i = 0; i < NCODES; ++i)        /* Build an error code, using   */
    {                                   /* multiple err_combine() calls.*/
        ++adder;

        if (adder >= ERR_BUMPER)
            adder = 1;
        printf("%d", adder);            /* Output RAW codes.            */
        err_code = err_combine(err_code, adder);
    }

    printf("\nOutput sequence: ");
    err_print(stdout, err_code);
    return 0;
}
#endif
/* ----------------------------------------------------------------------
    ERR_COMBINE Combines a new individual error code with an existing one.
    Returns: Combined error code.
----------------------------------------------------------------------- */
unsigned err_combine(
    unsigned original,      /* Original error code.                     */
    unsigned to_add)        /* Code to be added to it.                  */
{
    if ((original % ERR_BUMPER) == 0)   /* Pass-through codes are not   */
        return (original);              /* changed.                     */

    to_add %= ERR_BUMPER;               /* Make sure it's in range.     */

    if (original >= ERR_THRESHOLD)      /* >= so that no value of       */
    {   /* Prevent overflow. */         /* to_add can overflow.         */
        err_push(original);
        original = 0;
    }

    return (original * ERR_BUMPER + to_add);
}

/* ----------------------------------------------------------------------
   ERR_FORMAT Decode and format an error code (and any overflow)
   into a string. Returns: Nothing.
----------------------------------------------------------------------- */
void err_format(
    char     *buffer,       /* Buffer to put formatted code into.       */
    unsigned  err_code)     /* Error code to format.                    */
{
    *buffer = '\0';         /* Empty string if err_code is zero.        */
    while (err_code)
    {
        do
        {
            sprintf(buffer, "%u", err_code % ERR_BUMPER);
            buffer += strlen(buffer);
        }
        while ((err_code /= ERR_BUMPER) > 0);
        err_code = err_pop();
    }
}
/* ----------------------------------------------------------------------
   ERR_PRINT  Decode and output an error code (and any overflow).
    Returns: Nothing.
----------------------------------------------------------------------- */
void err_print(
    FILE     *stream,       /* Stream to output formatted code to.      */
    unsigned  err_code)     /* Error code to output.                    */
{
    while (err_code)
    {
        do
        {
            fprintf(stream, "%u", err_code % ERR_BUMPER);
        }
        while ((err_code /= ERR_BUMPER) > 0);
        err_code = err_pop();
    }
}

/* ================= Local stack for overflow codes. ================== */
#define MAX_OVERFLOWS 10

static unsigned err_stack[MAX_OVERFLOWS];
static unsigned err_stack_top = 0;

/* ----------------------------------------------------------------------
   ERR_POP  Returns: Combined error code of most recent overflow, 0 if none.
----------------------------------------------------------------------- */
static unsigned err_pop(void)
{
    if (err_stack_top == 0)
        return (0);

    --err_stack_top;
    return (err_stack[err_stack_top]);
}

/* ----------------------------------------------------------------------
   ERR_PUSH  Push error code onto stack.
    Returns: Nothing.
----------------------------------------------------------------------- */
static void err_push(
    unsigned err_code)       /* Error code to save.                     */
{
    if (err_stack_top < MAX_OVERFLOWS)
    {
        err_stack[err_stack_top] = err_code;
        ++err_stack_top;
    }
}



