Debugging C Programs

To make the job of program testing and debugging less frustrating, Bob shows how you can extend the "assert()" macro so that it becomes a practical debugging tool.


January 01, 1989
URL:http://www.drdobbs.com/cpp/debugging-c-programs/184408278

Figure 1


Copyright © 1989, Dr. Dobb's Journal

SP 89: DEBUGGING C PROGRAMS

Don't forget assert( ) and the stack

Bob Edgar

Bob is an experienced C programmer from Britain and can be reached through the DDJ office.


Testing and debugging are probably the most time-consuming and neglected aspects of the programmer's trade, and we programmers need all the help we can get to figure out what our programs are really doing, as opposed to what we think they should be doing.

The well-known assert( ) macro can be extended in a number of ways to make it a useful debugging tool. The basic idea is to have a statement that checks for an illegal condition in the program; the statement can be completely "deactivated" by defining the NDEBUG macro, presumably on the compiler command line. A simple version is in Example 1. If a function do_str( ) takes a single string argument, which must never be NULL, you might use the assertion shown in Example 2. (Notice that a semicolon is not required: The assert macro defines a complete statement.)

The definition in Example 1 assumes that the C macro processor will replace cond inside the string argument to printf with the condition given as the argument to assert. Implementations of the macro processor differ in their rules for argument substitution inside strings, so you should check the documentation for your compiler. ANSI C has defined new operators (#, ##) for strings in macros, which would be the preferred solution.

The first improvement is to make an arbitrary printf( ) call if a given condition was (or was not) satisfied. The problem is that the macro processor does not allow macros with a variable number of arguments. The solution is to use parentheses, so that the entire argument list to printf is a single argument to the macro, as shown in Example 3. (In this and the following macro definitions, it is assumed that the macro is defined to be empty when NDEBUG is turned on.) A typical use of this technique is shown in Example 4.

Example 1: Defining the assert macro

  #ifndef NDEBUG
  /* Inner braces keep exit(1) inside the if; without them it always runs */
  #define assert(cond) {if (!(cond)) { \
      printf("ASSERT cond, file %s line %d\n", \
      __FILE__, __LINE__); exit(1);}}
  #else
  #define assert(cond) {;}  /* Empty block */
  #endif

Example 2: Using an assertion

  do_str(s)
  char *s;
      {
      assert(s != NULL)
      /* rest of do_str... */

Example 3: Using parentheses so that the entire argument list to printf is a single argument

  #define assertp(cond, args)  {if (!(cond)) { \
      printf("ASSERT cond, file %s, line %d\n", \
      __FILE__, __LINE__); \
      printf args; exit(1);}}

Example 4: A typical use of assertp

  assertp(n <= 3, ("Surprise! n > 3, is %d\n", n))
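The text above names the ANSI C # operator as the preferred way to get the condition into the message. A minimal sketch of that approach follows; the names xassert, xassert_fail, xassert_failures, and xassert_fatal are hypothetical and are not part of the article's package. Unlike Example 1, this version is wrapped in do/while(0), so it does need a trailing semicolon.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers, introduced only for this sketch. */
static int xassert_failures = 0;   /* number of failed assertions so far */
static int xassert_fatal = 1;      /* nonzero: exit on the first failure */

static void xassert_fail(const char *expr, const char *file, int line)
{
    xassert_failures++;
    fprintf(stderr, "ASSERT %s, file %s, line %d\n", expr, file, line);
    if (xassert_fatal)
        exit(1);
}

#ifndef NDEBUG
/* #cond turns the source text of the condition into a string literal,
   so no substitution inside a quoted string is required. */
#define xassert(cond) \
    do { if (!(cond)) xassert_fail(#cond, __FILE__, __LINE__); } while (0)
#else
#define xassert(cond) ((void)0)    /* compiles to nothing */
#endif
```

A call such as xassert(s != NULL); reports the text "s != NULL" together with the file and line. Clearing xassert_fatal lets a test harness count failures instead of exiting.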

In a major, complicated program it is often an advantage when analyzing problems to have statements that print out critical variables and data structures when required, but are normally suppressed so that the output from the program is not cluttered with a lot of irrelevant information. You can extend the idea of the assert macro to provide a convenient mechanism for making such output.

I introduce the idea of a "trace level," which determines how much debugging output is produced. Permissible values might range from 0, meaning that no debugging output is printed, to 9, the maximum amount of output. A macro that unconditionally prints when the trace level is high enough is shown in Example 5.

Example 5: A macro that unconditionally prints when the trace level is high enough

  #define assertl(level, args)  {if (tlevel >= (level)) { \
      printf("file %s, line %d\n", \
      __FILE__, __LINE__); \
      printf args;}}

An external integer tlevel should be visible and should contain the trace level. The macro call in Example 6 prints the values of i and j when the value of tlevel is two or more. A convenient way of setting tlevel is to provide a command-line argument to the executable program, such as -Ln, where n is a digit from 0 to 9.

Example 6: A typical use of Example 5

  assertl(2, ("i=%d j=%d\n", i, j))
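The doubled parentheses are needed because the macro processor described here cannot take a variable number of arguments. Compilers that support C99 variadic macros remove that restriction. The sketch below assumes such a compiler; TRACEL, trace_out, and trace_emitted are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

short tlevel = 0;              /* trace level, 0 (quiet) to 9 */
static int trace_emitted = 0;  /* hypothetical counter of messages printed */

/* Print the location, then the caller's printf-style message. */
static void trace_out(const char *file, int line, const char *fmt, ...)
{
    va_list ap;

    trace_emitted++;
    fprintf(stderr, "file %s, line %d: ", file, line);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* C99 variadic macro: the format string is the first of the "..." arguments. */
#define TRACEL(level, ...) \
    do { if (tlevel >= (level)) \
             trace_out(__FILE__, __LINE__, __VA_ARGS__); } while (0)
```

With this form the call from Example 6 would be written TRACEL(2, "i=%d j=%d\n", i, j); with a single pair of parentheses.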

To avoid having to check for this command-line argument in all programs using these macros, write a standard version of main( ) that checks for the argument, performs any other initializations required by your libraries, and calls the user's main, which has (transparently to the user) been renamed. Your version of the assert.h header file could include code similar to that shown in Example 7.

Example 7: An assert.h header file

  #ifndef NDEBUG
  extern short tlevel;
  #define main    mymain
  #endif

The standard version of main( ) would be in a library file, which would then be included at link time only if the user's version of main( ) had been compiled without NDEBUG. An elegant solution would remove the -L command-line option from the argument list so that the user program need not check for it. An example of this can be seen in Listing One.

A trace level alone is probably too crude for controlling debugging output from a large program. To improve it, you could identify major program modules and data structures that might be traced, and provide, in addition to tlevel, an integer considered as a bit vector, tbits. Each module or data structure in the program is assigned a bit in tbits; if the bit is set, output related to that module or data is printed. Imagine, for example, a compiler. You might define the bits as shown in Example 8.

Example 8: Typical bit definitions

  #define TBIT_LEXER   0x0001 /* Lexical analysis functions */
  #define TBIT_PARSER  0x0002 /* Parser */
  #define TBIT_EXPTREE 0x0004 /* Expression tree */
  #define TBIT_CODEGEN 0x0008 /* Code generator */
  #define TBIT_OPTIM   0x0010 /* Optimizer */

You can now define assert-like macros with a new argument that selects bits in tbits; the information is printed if any of the selected bits is set. The argument is a bit-wise OR of one or more of the bits that the user assigns to parts of the program, as shown in Example 9. A general macro, say assertbl, would include both bits and level arguments. The -L command-line argument could be extended to the form -Lnxxxx, where n is the trace level and xxxx is four hex digits specifying the bits to set in tbits.

Example 9: Defining assert-like macros

  #define assertb(bits, args)    {if ((bits) & tbits) \
      printf args;}

which can be used in this way:

  assertb(TBIT_PARSER | TBIT_EXPTREE,
      ("Nodes in expression tree = %d\n", nnode))
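The extended -Lnxxxx option is described only in outline, so here is one way it might be parsed. This is a sketch under assumptions: parse_trace_option and trace_wanted are hypothetical names, the hex digits are treated as optional, and trace_wanted assumes that an assertbl-style macro would require both the level test and the bit test to pass.

```c
#include <ctype.h>
#include <stdlib.h>

short tlevel = 0;           /* trace level, 0 to 9 */
unsigned int tbits = 0;     /* one bit per traced module or data structure */

/* Returns 1 if arg is a well-formed -Ln or -Lnxxxx option (globals updated),
   0 if arg is not a -L option at all, and -1 if it is malformed. */
int parse_trace_option(const char *arg)
{
    const char *hex;
    unsigned long bits = 0;
    int i;

    if (arg[0] != '-' || arg[1] != 'L')
        return 0;
    if (!isdigit((unsigned char)arg[2]))
        return -1;
    hex = arg + 3;
    if (*hex != '\0') {
        /* Exactly four hex digits must follow the level digit. */
        for (i = 0; i < 4; i++)
            if (!isxdigit((unsigned char)hex[i]))
                return -1;
        if (hex[4] != '\0')
            return -1;
        bits = strtoul(hex, NULL, 16);
    }
    tlevel = (short)(arg[2] - '0');
    tbits = (unsigned int)bits;
    return 1;
}

/* Assumed semantics for a combined level-and-bits check. */
int trace_wanted(int level, unsigned int bits)
{
    return tlevel >= level && (tbits & bits) != 0;
}
```

For example, -L20006 would set the trace level to 2 and turn on the parser and expression-tree bits from Example 8.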

Putting all these ideas together provides an easy-to-use package for the programmer to add selective debugging print (or other) statements into code without any cost in object code in a production version. Because the overhead of keeping the debugging statements is usually low, you might consider keeping the statements in a "final" version of a program; a large program is rarely completely bug free, and it's nice to be able to turn on selected debugging statements when a user of the program rings and complains about an apparent error.

A beneficial side effect to using such a method to facilitate testing and debugging is that it encourages the programmer to think about the legal and illegal states of the data structures in the program, and to find easily understood ways of displaying the program state. This discipline in itself tends to produce tighter, better thought-out code.

Don't Forget the Stack

Consider the following problem. You have a big program that makes many calls to a low-level function called do_str( ). The program fails, and you suspect that do_str( ) is called with a NULL pointer, so what do you do? Try adding a statement similar to that in Example 10 to trap the error.

Example 10: Adding a statement to trap an error

  do_str(s)
  char *s;
      {
      if (s == NULL)
           printf("do_str(NULL)!\n");

OK, so you get the error message, but where was do_str( ) called? Wouldn't it be wonderful to have a standard function in your library, say caller( ), which would give you the name (as a C string) of the calling function? You could then write the code shown in Example 11.

Example 11: Using caller

  do_str(s)
  char *s;
      {
      char *caller();
      if (s == NULL)
          printf("%s() calls do_str(NULL)\n", caller());
      }

An impossible dream? Not so, as I will show. By accessing the stack you can reconstruct the entire calling sequence from main( ) down to the current function, and you don't need a debugger -- you use the same techniques that a debugger uses, but you turn an introspective eye on the program as it is running. The details are dependent on the operating system and compiler that you are using, but the ideas can be applied to almost all implementations of C (and to many other procedural languages), although a couple of assembler subroutines might be needed. To begin with, you have to understand how your compiler uses the stack to make function calls.

Looking at the Stack

The program showstack (see Listing Two), perhaps with some simple modifications, will reveal the secrets of the stack on most computers.

The program makes a couple of function calls with easily recognized values for arguments, and prints out the contents of the stack. I use SCO Xenix on an 80386 computer with the Microsoft C compiler, and the output on my machine is shown in Figure 1 (with annotations).

The stack contains local variables ("automatic" variables in official C jargon, although no one seems to use the term), function arguments, and data required to manage the call/return sequences. Under Xenix, the stack grows toward lower addresses, so the macro STACKLOW is defined. To see whether the stack grows toward higher or lower addresses on your box, print out &argv and &local; if &argv < &local, then your stack grows toward higher addresses.

Each call to a function has its own area on the stack, sometimes called the "frame" for that function. While that function is being executed, a register is typically reserved to point at the frame on the stack. This register is often referred to as the base pointer (BP), and will typically contain the value that the stack pointer (SP) had when the function was called. To call a function, the computer does something like that in Example 12.

Example 12: Calling a function

  push n'th argument
  ...
  push 2nd argument
  push 1st argument
  push current instruction pointer (i.e., return address)
  jump to start of function
  push BP (saves the caller's base pointer)
  copy stack pointer (SP) to base pointer (BP)

Your computer may do other things -- push some registers to save their values, for example -- but will almost certainly perform these operations, possibly in a different order. When this sequence is completed, the top of the stack will look like that in Example 13. (The stack grows down the page, so the "top" of the stack is the last line -- confusing, but conventional.) The base pointer is now used as a base to locate function arguments and local variables. Notice that the arguments are pushed in reverse order; the first argument is then a known offset from BP, allowing function calls with variable numbers of arguments. Other schemes are occasionally used, such as pushing the number of arguments onto the stack. Local variables will come after the caller's BP on the stack.

Example 13: The top of the stack after functions have been called

        2nd arg
        1st arg
        Return addr.
  BP--> Caller's BP

To return from a function, the caller's BP is restored (in pseudo C: BP = *BP), and the return address is then a known offset away.

Frames for each function make a linked list on the stack, with the current BP as the head of the list. It is easy to follow the links: g's BP --> f's BP --> main's BP in the output from showstack, as indicated by the arrows. The frame for the call to main( ) points back to the C startup function, which performs chores such as setting argc and argv before calling main( ). A debugger uses this linked list of function invocations to trace the call sequence in a program, and you can exploit it to create some powerful debugging tools. The function ctrace( ) shown in Listing Three follows the list; it may need adjustment for other compilers.

The stack frame for the call to ctrace( ) itself is found by taking the address of its first argument: The output from showstack shows that the caller's BP is at an offset of 2 from the first argument, at address (&arg_1 - 2). The return address is at an offset of 1 from the BP. The output in Example 14 was produced by modifying showstack so that g( ) does nothing except call ctrace( ).

Example 14: Modifying showstack so that g( ) calls only ctrace( ) results in this output

  &main=00000094 &f=000000db &g=000000f9
  BP=0187ed20 RET. ADDR=0000010a
  BP=0187ed38 RET. ADDR=000000f1
  BP=0187ed50 RET. ADDR=000000d3
  BP=0187ed6c RET. ADDR=0000057d
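Because the real frame layout depends on the compiler, the list-following idea can be tried safely on a simulated stack first. The sketch below is an illustration only: the array sim_stack, the use of array indices in place of addresses, and the functions build_demo_stack and walk_frames are all assumptions made for the example. It stores the four return addresses from Example 14 and the argument values from showstack, and follows the saved-BP links in the same way as ctrace( ).

```c
#include <stdio.h>

#define STACK_WORDS  16
#define SIM_STACKTOP 15   /* index standing in for the startup code's frame */

/* Simulated stack growing toward lower indices. At each frame,
   word [bp] holds the caller's BP and word [bp + 1] the return address. */
static long sim_stack[STACK_WORDS];

/* Lay out four nested frames; returns the innermost BP. */
int build_demo_stack(void)
{
    sim_stack[12] = SIM_STACKTOP; sim_stack[13] = 0x57d;  /* main's frame   */
    sim_stack[8]  = 12;           sim_stack[9]  = 0xd3;   /* f's frame      */
    sim_stack[10] = 0x11112222;   sim_stack[11] = 0x33334444;
    sim_stack[5]  = 8;            sim_stack[6]  = 0xf1;   /* g's frame      */
    sim_stack[7]  = 0x55556666;
    sim_stack[2]  = 5;            sim_stack[3]  = 0x10a;  /* ctrace's frame */
    return 2;
}

/* Follow the chain of saved BPs, recording each return address in out[].
   Returns the number of frames visited. */
int walk_frames(const long *stack, int bp, int stacktop, long *out, int max)
{
    int n = 0;

    while (bp < stacktop && n < max) {   /* "<" because the stack grows down */
        printf("BP=%d  RET. ADDR=%08lx\n", bp, (unsigned long)stack[bp + 1]);
        out[n++] = stack[bp + 1];
        bp = (int)stack[bp];             /* next link in the list */
    }
    return n;
}
```

Walking from the BP returned by build_demo_stack( ) visits four frames and stops when the link reaches SIM_STACKTOP, which mirrors the test against stacktop in Listing Three.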

The values of &main, &f, and &g show the address ranges of the code for these functions in memory (see Table 1).

Table 1: Values of &main, &f, and &g

  Function  Addresses
  -----------------------

  main()    0x94 to 0xda
  f()       0xdb to 0xf8
  g()       0xf9 to ???

By knowing these address ranges, it is possible to work out where a given return address points. To convert an address in the code area of a program to the name of a function, you need an array that pairs each function's address with its name. Listing Four shows how this is done.

The function atoname( ) finds the function closest to, but starting before, the address given as its argument. If the printf( ) statement in ctrace( ) is changed to that shown in Example 15, the output will appear as the example then shows. The C startup function is erroneously identified as g( ), because g( ) is the closest function that atoname( ) knows.

Example 15: The function atoname( ) finds the function closest to, but starting before, the address given as its argument

     printf("BP=%08lx FUNCTION=%s\n", bp, atoname(*(bp + 1)));

  then the output looks like this:

     &main=00000094 &f=000000db &g=000000f9
     BP=0187ed28 FUNCTION=g
     BP=0187ed40 FUNCTION=f
     BP=0187ed58 FUNCTION=main
     BP=0187ed74 FUNCTION=g
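The lookup can be checked by hand against Table 1. The following sketch uses plain integers for addresses so that it runs anywhere; the names sym, symtab, and addr_to_name are hypothetical, and the start addresses are the ones printed by showstack. Unlike Listing Four, it treats an address equal to a function's start as belonging to that function.

```c
#include <stddef.h>

struct sym {
    unsigned long addr;     /* start address of the function */
    const char *name;
};

/* Start addresses taken from the showstack output (see Table 1). */
static const struct sym symtab[] = {
    { 0x94, "main" },
    { 0xdb, "f" },
    { 0xf9, "g" },
};

/* Name of the function that starts closest to, but not after, address a.
   Returns "?" if a lies below every known function. */
const char *addr_to_name(unsigned long a)
{
    const char *best = "?";
    unsigned long best_addr = 0;
    size_t i;

    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (symtab[i].addr <= a && symtab[i].addr >= best_addr) {
            best = symtab[i].name;
            best_addr = symtab[i].addr;
        }
    }
    return best;
}
```

Feeding it the return addresses 0x10a, 0xf1, 0xd3, and 0x57d from Example 14 gives g, f, main, and g, including the misidentified startup function, because the table has no entry above g( ).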

Function arguments are also accessible at a known offset from the BP. With my compiler there is no way of determining the number of arguments passed in a call, so I assumed that there are two arguments of type int, and changed ctrace( ) again, as shown in Example 16. If the symbol table used by atoname( ) were extended with the number and types of the arguments to each function, the output could be further improved.

Example 16: Changing ctrace( ) again

      printf("%s(%x, %x)\n", atoname(*(bp + 1)), *(newbp + 2),
      *(newbp + 3));

After this change, the output of the test program became:

      &main=00000094  &f=000000db  &g=000000f9
      g(55556666, 187eda8)
      f(11112222, 33334444)
      main(1, 187eda8)
      g(1, 187ee20)

Most software development environments provide a symbol table of the type required by atoname( ), which can be read at run time. This might be the "memory map" provided by a linker or binder, or, as in the Unix or Xenix environment, a table included in the executable run-time file itself. This table may provide little more than the address of each function entry point, or may include considerable details such as the source file name, addresses of statements identified by source file line number, numbers and types of function parameters, and other information that could be exploited by tracing functions such as those described here.
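On current compilers the return address is often available without hand-computed stack offsets. The sketch below assumes GCC or Clang, whose __builtin_return_address(0) extension yields the address the current function will return to; it is not portable C and is not the mechanism used in the listings. The functions leaf and middle and the variable last_caller are hypothetical, and the lookup follows the same closest-preceding-function rule as atoname( ).

```c
#include <stddef.h>
#include <stdint.h>

int leaf(void);
int middle(void);

struct func {
    int (*addr)(void);
    const char *name;
};

static const struct func funcs[] = {    /* hand-built symbol table */
    { leaf, "leaf" },
    { middle, "middle" },
};

const char *last_caller = "?";          /* set by leaf() on each call */

/* Closest function that starts at or before address a. */
static const char *atoname(uintptr_t a)
{
    const char *s = "?";
    uintptr_t maxa = 0;
    size_t n;

    for (n = 0; n < sizeof funcs / sizeof funcs[0]; n++) {
        uintptr_t start = (uintptr_t)funcs[n].addr;
        if (start <= a && start >= maxa) {
            s = funcs[n].name;
            maxa = start;
        }
    }
    return s;
}

__attribute__((noinline)) int leaf(void)
{
    /* GCC/Clang extension: where this call will return to */
    last_caller = atoname((uintptr_t)__builtin_return_address(0));
    return 1;
}

__attribute__((noinline)) int middle(void)
{
    volatile int r = leaf();    /* work after the call avoids a tail call */
    return r + 1;
}
```

After a call to middle( ), last_caller should name middle, which is the behavior wished for caller( ) in Example 11. As in the article, the result is only as good as the symbol table: a caller that is missing from funcs is reported as its nearest known neighbor.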

_DEBUGGING C PROGRAMS_ by Bob Edgar

[LISTING ONE]



short tlevel = 0;

main(argc, argv, envp)
char *argv[];
char **envp;
    {
    int n;

    for (n = 0; n < argc; n++)
        {
        if (argv[n][0] == '-' && argv[n][1] == 'L')
            {
            int i;
            char digit;

            /*  Found -L argument - process it  */

            digit = argv[n][2];
            if (digit < '0' || digit > '9')
                    {
                    printf("Bad -L option\n");
                    exit(1);
                    }
            tlevel = digit - '0';

            /*  Delete this element from argv[].  */
            /*  We assume that argv[] has argc+1  */
            /*  elements (most systems set argv[argc]  */
            /*  to zero).  */

            argc--;
            for (i = n; i < argc; i++)
                    argv[i] = argv[i+1];
            }
        }
    mymain(argc, argv, envp);
    }






[LISTING TWO]


/* showstack:  Show layout of host machine's stack */

#define STACKLOW        1
int *stacktop, *ip;
int f(), g();

main(argc, argv)
char **argv;
    {
    stacktop = (int *) &argv;
    printf("&argc=%08lx  &argv=%08lx\n", &argc, &argv);
    printf("&main=%08lx  &f=%08lx  &g=%08lx\n", main, f, g);
    f(0x11112222, 0x33334444);
    }

f(arg_1, arg_2)
    {
    g(0x55556666);
    }

g(arg_2)
    {
    int local;

    local = 0x77778888;
#ifdef  STACKLOW        /*  Stack grows towards LOWER addresses  */
    for (ip = stacktop; ip >= &local; ip--)
#else                   /*  Stack grows towards HIGHER addresses  */
    for (ip = stacktop; ip <= &local; ip++)
#endif
         printf("%08lx\t%08x\n", ip, *ip);
    }







[LISTING THREE]


ctrace(arg_1)                   /*  Trace calling sequence  */
    {
    int *bp, *newbp;

    bp = (int *) (&arg_1 - 2);

    while (bp < stacktop)   /*  "<" because STACKLOW  */
        {
        newbp = (int *) *bp;    /*  next link in list  */
        printf("BP=%08lx  RET. ADDR=%08lx\n", bp, *(bp + 1));
        bp = newbp;
        }
    }







[LISTING FOUR]



struct func
    {
    int (*addr)();
    char *name;
    };

int main(), f(), g();

struct func funcs[] =           /*  symbol table  */
    {
    main, "main",
    f, "f",
    g, "g",
    };

int nfuncs = sizeof(funcs)/sizeof(funcs[0]);

char *atoname(a)        /*  convert address to function name  */
int (*a)();             /*  address  */
    {
    char *s;
    int (*maxa)();
    int n;

    maxa = 0;
    s = "?";
    for (n = 0; n < nfuncs; n++)
            if (funcs[n].addr < a && funcs[n].addr > maxa)
                    s = funcs[n].name, maxa = funcs[n].addr;
    return s;
    }

