C Dynamic Memory Use

Randy shares some proven techniques to help you detect, identify, and cope with C dynamic memory problems.

August 01, 1989
URL:http://www.drdobbs.com/cpp/c-dynamic-memory-use/184408186

Figure 1

AUG89: C DYNAMIC MEMORY USE

C DYNAMIC MEMORY USE

Finding and exterminating C's dynamic memory errors

Randall Merilatt

Randall is co-founder and chief scientist at Raima Corporation and can be contacted at 3245 146th Place SE, Ste. 230, Bellevue, WA 98007.

A gas is defined as a substance capable of expanding to completely fill a container and take on the shape of the container. When you consider software and a computer's memory in light of this definition, it is only reasonable to conclude that software is a gas. The situation has always been that no matter how large a container you make (that is, how much memory your computer has), software expands to fill it completely. We've been increasing the size of our containers for more than 40 years now, and it still seems as though we never have enough. Because of this, we have had to be quite miserly in our use of memory just to get the most out of what has always been and, apparently, always will be a precious commodity.

Memory is used to store a program's executable code and the data that is manipulated by that code. Overlays are one method used to limit the amount of memory used for storage of executable code. A program can store its data in static or dynamic memory. Static memory is allocated at the time the program is first loaded into memory by the operating system, and the amount of static memory used by a program is fixed. Dynamic memory, on the other hand, is allocated as needed at the request of the program during execution. Dynamic memory is assigned from a pool of available memory managed by the operating system that is often referred to as the heap. When the program finishes using a block of dynamic memory, it can return the memory to the heap. Note that static and dynamic memory are strictly software differentiations based on how memory is used. The kind of physical memory, be it standard RAM, extended, or expanded, is immaterial.

Prudent use of memory is enhanced by minimizing the software's reliance on static memory and maximizing its use of dynamic memory. The choice of programming language can have an important impact on the ability to do this, and C is particularly well suited for extensive use of dynamic memory. "Gas" is also defined as "any substance that produces a poisonous, irritating, or asphyxiating atmosphere." Software does not match this definition because it actually seems to create an atmosphere in which bugs thrive. Which takes me to the primary subject of this article -- bugs, specifically the kind that result from the use of C's dynamic memory features, and how to find and exterminate them.

C's Dynamic Memory Capabilities

The standard C library provides four functions for managing dynamic memory: malloc, calloc, realloc, and free. A summary of the use of these functions is provided in Table 1. These functions are roughly equivalent to those provided in other languages (for example, Pascal/Modula-2's new and dispose). The real power comes from C's pointer and array concepts.

Table 1: C's dynamic memory allocation functions

  C Declaration         Function Description
  -------------------------------------------------------------------------

  char *malloc (size)  Allocates a block of size bytes of memory returning
  unsigned int size;   a pointer to the start of the block or NULL if
                       there is not enough memory available.

  char *calloc         Allocates and clears num contiguous blocks of memory
  (num, size)          each size bytes in length returning a pointer to the
  unsigned int num;    start of the block or NULL if there is not enough
  unsigned int size;   memory available.

  char *realloc        Adjusts size of allocated block.  Returns a pointer
  (ptr, size)          to the block of size bytes or NULL if there is not
  char *ptr;           enough memory available.  The original block may
  unsigned int size;   need to be moved to accommodate the new size.

  free (ptr)           Frees a block of dynamic memory pointed to by ptr
  char *ptr;           and allocated by a previous call to malloc or
                       calloc.

In C, arrays can be either static or dynamic. Static arrays are declared with a constant size; dynamic arrays are declared as pointer variables. The size of the array is specified at run time when the memory for the array is allocated (using either malloc or calloc). In C, both static and dynamic arrays can be referenced using exactly the same syntax, so there is no requirement to statically define array sizes. The specification of the size of a needed array can be deferred until program execution time, when the program has suitable information regarding the intended use.

Languages without this capability force you to impose arbitrary limits on the user because of the need to fix array sizes at compile time. The end result is a large program with table sizes set up to accommodate extreme usage situations.

Types of Dynamic Memory Misuse in C

The danger inherent in C's dynamic memory and pointer manipulation features comes from the possibility that pointers can be assigned errant values without detection by the compiler or run-time system. Errant pointers often result in an address reference outside the limits of the memory space assigned to the program. In protected environments, such as Unix and OS/2, this results in an error called a segmentation violation, which is detected by the use of a debugger that can easily reveal the location of the offending statement. Both MS-DOS and the Macintosh's OS, however, are unprotected environments that allow programs to address any part of memory. In these environments, an errant pointer could do anything from causing no damage at all to destroying the entire file system! Debugging in these environments can be a most painstaking experience, even with the use of a debugger.

The insidious nature of another type of problem that can occur is alluded to in the following warnings:

". . . any program that changes memory that is not allocated to it stands a chance of destroying a DOS memory management control block. This causes unpredictable results that don't show up until an activity is performed where DOS uses its chain of control blocks (the normal result is a memory allocation error, for which the only corrective action is to restart the system). -- " From page 11-3 of the DOS Version 3.30 Technical Reference.

"Attempting to free an invalid pointer may affect subsequent allocation and cause errors." -- From page 290 of the Microsoft C 5.1 Run-Time Library Reference.

These comments refer to a common manifestation of an errant pointer bug that results in the system "hanging" on an ensuing call to malloc. The specific call that reveals the problem may actually occur after many other malloc calls have succeeded, compounding the debugging problem.

In my experience, the list in Table 2 identifies some of the most common programming errors that lead to these dynamic memory corruption problems.

Table 2: Common dynamic memory programming errors in C

  1. Writing beyond the end of an allocated block of memory.
     A typical example is calling function strcpy with a source
     string that is missing the sentinel NULL byte.  It can also occur
     from an "off by one" subscripting error.

  2. Freeing a pointer to unallocated memory.
     Have you ever tried freeing static memory?  Sometimes, it is not
     that unusual to assume a character pointer points to a dynamic string
     when, in fact, it points to a string constant.

  3. Freeing a pointer to a previously freed block of memory.
     Often, a system has a variety of dynamic data structures, such as
     inverted lists, containing pointers to the same dynamic memory.
     This error happens when an attempt is made to free the common
     memory pointer from multiple locations.

  4. Continued use of freed dynamic memory.
     Some dynamic structures are allocated and freed as needed during
     execution.  The need for allocation is often determined by whether
     or not a particular pointer is NULL.  This error occurs when you
     forget to assign a NULL to that pointer when the structure is freed
     so that a later allocation that should occur, in fact, doesn't.

Management and Debugging of Dynamic Memory

The need for the use of dynamic memory in Raima's database software, db_VISTA and db_QUERY, is particularly acute. Both packages are C linkable libraries providing full-featured database management and query engines. As such, they need to use as little memory as possible while still providing the flexibility needed to support a wide variety of applications. Yet, because of the dangers inherent in C's dynamic memory capabilities, and the critical need for reliability in a DBMS, a disciplined approach to dynamic memory usage had to be developed so that this insidious class of bugs could be easily detected and corrected.

Because most C implementations of malloc and free provide little, if any, error checking, it was necessary to build our own extended dynamic memory control module, which we've dubbed Xmem and which is shown in Listing One. Xmem keeps track of all pointers to dynamically allocated memory and performs checks for the four errors delineated in Table 2. Defined in the module are functions x_malloc, X_calloc, and x_free, which are to be called instead of malloc, calloc, and free. One additional function, called X_chkfree, checks to ensure that all blocks allocated using X_malloc and x_calloc have been freed and, reports and frees any that have not been freed.

A global int variable, memtrace, can be set to determine the type of dynamic memory control to be used. If memtrace is 0, no error checking is performed -- that is, x_malloc, X_calloc, and x_free simply call malloc, calloc, and free. A memtrace value of 1 enables pointer tracking and checking for errors 1, 2, and 3. Setting memtrace to 2 additionally checks for error 4.

Xmem maintains two global long variables that can be referenced by the application program when memtrace is nonzero. Variable tot_memory contains the total amount of dynamic memory that is currently allocated; variable tot_alloc contains the total number of calls to x_malloc and x_calloc.

A table of all allocated pointers is maintained by Xmem. When a request for a block of memory is made through a call to x_malloc, a gap of extra space is allocated at the end of the block to be used to check for overwrites. The extra space is filled with some predefined character (FILLCHAR) and checked for changes when the block is freed (error 1). The size of the allocated block and the pointer are stored in the table.

The memory allocation tracking table simplifies checking for an invalid pointer. When x_ free is called, if the pointer is in the table, the space is freed and the pointer's entry is removed. If the pointer is not in the table, then it is either invalid or it has already been freed (errors 2 and 3).

Setting memtrace to 2 enables checking for changes to a freed block of memory (error 4). Instead of actually freeing the block, x_ free allocates a copy of the block (if the copy already exists, then we know that the block has been previously freed -- error 3). Function x_chkfree compares the copy with the original and reports any changes. Note that this option does not free any memory but uses virtually twice as much. The errors detected by this method are particularly treacherous, however. Note that it is not really necessary to maintain a copy of the block in order to detect changes. A cyclic redundancy check (CRC) value (for example, check-sum) could be used instead. This would indeed identify that a change has occurred. The disadvantage of using a CRC, however, is that you do not know what was changed. With a copy, you can know the exact location of the changes and use a debugger (for example, Microsoft's CodeView, placing a trace point on the changed location) to find the offending code.

The memory allocation tracking table is constructed as a hash table. A hash table is a structure that stores an item in a location in the table that is computed based on the value of the item. In our case, a hash index is computed from the value of the pointer by casting the pointer to a long and computing its modulo based on the size of the hash table array. A hash table was chosen because of the amount and frequency of needed allocations in db_QUERY. Depending on the complexity of the query, as few as ten but up to as many as several hundred allocations can occur during the setup and processing of a query. Thus, it is important to be able to locate a pointer's position in the table quickly.

The hash table consists of an array of pointers to buckets. A bucket stores information about each allocated pointer. Lines 39 through 55 in Listing One contain the hash table declarations. Field alloc in struct bucket is a dynamic array of struct alloc_entry. This array will store up to bucketsize (line 28) allocated pointer entries. Each alloc entry contains the size of the allocated block, the pointer (ptr) to the allocated block, and a pointer to the copy of the block allocated when the block is freed. The size of the hash table is specified by a global int variable hashsize (line 27). This value must be odd (preferably prime) or the odd-numbered buckets will never be used.

The hash table is a dynamic array of bucket pointers called ptrhash (line 55). Because more than bucketsize pointers could have the same hash value, the buckets contain a next pointer to allow additional buckets to be allocated and chained to the same ptrhash entry. Figure 1 illustrates the structure of the memory allocation tracking table.

Xmem defines two local functions that manage the hash table and report errors. Function sto_ptr stores the specified pointer and the size in the hash table. This function allocates the dynamic memory for ptrhash the first time it is called. Line 75 computes the hash table index, which is used in line 78 to subscript the ptrhash array. Any buckets that exist for the computed ptrhash entry are searched in lines 77 - 80 for an available alloc array entry. If there is no available space, a new bucket is allocated and connected to the linked list (lines 82 - 102). The pointer information is then stored into the next available alloc array slot in the bucket (lines 103 - 107). Function del_ptrdeletes the specified pointer from the hash and frees the dynamic memory referenced by it. When the pointer is located (lines 131 - 134), the gap is checked and, if changed, the error is reported (line 139). If memtrace is 1, the pointer is removed from the bucket and the dynamic memory is freed (lines 145 - 161). If memtrace is 2, a copy of the allocated memory is made, unless one already exists, for comparison by function x_chkfree (lines 164 - 171).

When memtrace is nonzero, function x_malloc calls malloc to allocate the requested memory, augmented by the extra gap space. It then calls sto_ptr to store the pointer and size in the hash. Function x_calloc calls x_malloc to allocate the requested memory and then calls the standard C function memset to clear the allocated block.

Function x_free checks for a NULL pointer (something that, for some unknown reason, many implementations of free do not do), and if memtrace is nonzero, it calls del_ptr to remove the pointer from the hash and free the space. If memtrace is 0, x_free simply calls free.

Function x_chkfree searches the hash table, reporting and freeing any pointers that remain. If memtrace is 2, it checks for any changes to freed blocks. The hash table itself is also freed by x_chkfree.

When errors are reported, a debugger is needed to track down the specific source of the problem. For the case in which the gap has been over-written, you can record the value of the pointer and rerun the exact test scenario, setting a breakpoint, for example, at line 211 in x_malloc to find out when and where that particular allocation occurred. You can discover how the memory is to be used from the function calls stack. This will often be sufficient to reveal the problem.

You could also use the debugger to break whenever the gap is modified, which would reveal the specific statement that is causing the problem. Unfortunately, this kind of operation is very slow in most debuggers, primarily because of the lack of good hardware support for debugging.

Setting a breakpoint at line 184 in function del_ptr and then inspecting the function calls stack when the breakpoint occurs will often explain why a pointer to be freed is not in the table. Perhaps it has already been freed, or you see that the space being freed is actually static.

Enhancements

Many enhancements could be made to the functionality presented here. For example, it is often valuable to be able to free not only a single pointer but also all other dynamic memory that was since allocated. This could be accomplished by adding an allocation number to the alloc_entry struct that is assigned the current tot_alloc value when stored in the table. A new function called x_release could be added that would free all pointers in the table with an allocation number greater than that associated with the specified pointer.

The CRC method of detecting changes to allocated memory described earlier could form the basis of a change-tracking system whereby those dynamic variables that had been modified by a given function could be reported. This could be implemented by adding a string argument to x_malloc and x_calloc that contained the name of the pointer variable being allocated. The alloc_entry-struct would store this string as well as the current CRC value. A function that would be called as desired (for example, on exit from each function in the application program) would report the names of all variables that had different CRC values. If the application's function name were printed first, then the list could be inspected to ensure that the proper variables were modified.

Conclusion

Because memory has always been and still is a precious commodity, we have had to be frugal in our use of memory in the development of our programs. The ability in C to maximize the use of dynamic memory has allowed us to develop sophisticated software systems to run on small computers. C has often been criticized, however, for the ease with which catastrophic errors can occur resulting from the very use of these powerful features. Some people have even written off the use of C because of these problems.

This article has presented proven techniques to detect and identify errors in the use of dynamic memory in C: As C and its supporting cast of development tools, particularly debuggers, continue to evolve, there is no doubt in my mind that these problems will be much more easily avoided.

Availability

All source code for articles in this issue is available on a single disk. To order, send $14.95 (Calif. residents add sales tax) to Dr. Dobb's Journal, 501 Galveston Dr., Redwood City, CA 94063, or call 800-356-2002 (from inside Calif.) or 800-533-4372 (from outside Calif.). Please specify the issue number and format (MS-DOS, Macintosh, Kaypro).

C DYNAMIC MEMORY USE by Randall Merilatt

[LISTING ONE]



/*---------------------------------------------------------------
   xmem.c -- Extended Dynamic Memory Control Module.
---------------------------------------------------------------*/
/* ********************** INCLUDE FILES ********************** */
#include <stdio.h>

/* ******************** EXTERNAL FUNCTIONS ******************* */
extern char *malloc(unsigned int);
extern char *calloc(unsigned int, unsigned int);
extern void  free(char *);

/* ********************* GLOBAL FUNCTIONS ******************** */
char *x_malloc(unsigned int);
char *x_calloc(unsigned int, unsigned int);
void  x_free(char *);
void  x_chkfree();

/* ********************* GLOBAL VARIABLES ******************** */
/* memtrace usage:
      = 0 => simple calls to malloc & free
      = 1 => tracking of all allocations using hash table
      = 2 => checking for changes to previously freed blocks
*/
int  memtrace   = 1;     /* memory tracing control variable */
long tot_memory = 0L;    /* total amount of allocated memory */
long tot_alloc  = 0L;    /* total # of allocations */
int  hashsize   = 47;    /* size of hash table */
int  bucketsize = 10;    /* number of entries per hash bucket */

/* ********************* LOCAL VARIABLES ********************* */
/* memory allocation tracking table */

/* amount of extra allocation for overhead */
#define OVHDSIZE 2

/* fill character for overhead gap */
#define FILLCHAR '\377'

/* allocated entry information */
typedef struct alloc_entry {
   int   size;           /* size of allocated area */
   char *ptr;            /* pointer to allocated area */
   char *freed;          /* pointer to copy of allocated area */
} ALLOCATION;

typedef struct bucket {
   struct bucket *next;  /* pointer to next bucket when filled */
   int entries;          /* number of used entries */
   ALLOCATION *alloc;    /* allocated entry array */
} BUCKET;

#define NUL_BUCKET ((BUCKET *)0)

/* dynamic pointer hash table */
static BUCKET **ptrhash = (BUCKET **)0;

/* ==============================================================
   Store pointer in hash table
*/
static char *sto_ptr(p, b)
char *p;     /* pointer to be stored */
unsigned b;  /* size of area */
{
   register BUCKET *bp, *bq;    /* bucket pointers */
   register int bno;            /* bucket/entry number */

   if ( ! ptrhash ) {
      /* allocate pointer hash table */
      ptrhash = (BUCKET **)calloc(hashsize, sizeof(BUCKET *));
      if ( ! ptrhash )
         return(NULL);
      tot_memory = hashsize * sizeof(BUCKET *);
   }
   /* compute hash table index */
   bno = (int)((unsigned long)p % hashsize);

   /* find first bucket with available entries */
   for (bq = bp = ptrhash[bno]; bp && bp->entries == bucketsize;
      bp = bp->next)
      bq = bp;

   /* allocate new bucket if necessary */
   if ( bp == NUL_BUCKET ) {
      if ( ! (bp = (BUCKET *)malloc(sizeof(BUCKET))) )
         return(NULL);
      bp->next = NUL_BUCKET;
      bp->entries = 0;
      if ( bq )
         /* connect to end of bucket chain */
         bq->next = bp;
      else
         /* initial bucket for this hash entry */
         ptrhash[bno] = bp;

      /* allocate bucket's allocation entry array */
      bp->alloc = (ALLOCATION *)calloc(bucketsize,
                     sizeof(ALLOCATION));

      /* memory total includes space used by hash table */
      tot_memory += sizeof(BUCKET) +
                       bucketsize*sizeof(ALLOCATION);
   }
   /* store pointer to allocated block */
   bno = bp->entries++;
   bp->alloc[bno].ptr   = p;
   bp->alloc[bno].freed = NULL;
   bp->alloc[bno].size  = b;

   /* update total allocation */
   tot_memory += b;

   /* increment total number of allocations */
   ++tot_alloc;

   return(p);
}

/* ==============================================================
   Delete pointer from hash table
*/
static void del_ptr(p)
char *p;  /* pointer to be freed */
{
   int gap;                     /* index into overhead space */
   register BUCKET *bp, *bq;    /* bucket pointers */
   register int bno, i;         /* bucket/entry number */

   /* compute hash table index */
   bno = (int)((unsigned long)p % hashsize);

   /* search bucket(s) for pointer */
   for (bq = NUL_BUCKET, bp = ptrhash[bno]; bp; bp = bp->next) {
      for ( i = 0; i < bp->entries; ++i ) {
         if ( bp->alloc[i].ptr == p ) {
            /* check integrity of gap */
            for (gap=bp->alloc[i].size-OVHDSIZE;
               gap<bp->alloc[i].size; ++gap ) {
               if ( p[gap] != FILLCHAR ) {
                  printf("WARNING overwrite, addr: %lx\n",
                     (long)p);
                  break;
               }
            }
            if ( memtrace == 1 ) {
               /* remove entry from bucket */
               if ( --bp->entries == 0 ) {
                  /* free this bucket */
                  if ( bq )
                     bq->next = bp->next;
                  else
                     ptrhash[bno] = bp->next;
                  free((char *)bp->alloc);
                  free((char *)bp);
                  tot_memory -= (sizeof(BUCKET) +
                                  bucketsize*sizeof(ALLOCATION));
               }
               else if ( i < bp->entries ) {
                  /* move last entry into current spot */
                  bp->alloc[i] = bp->alloc[bp->entries];
               }
               free(p);
            }
            else {
               /* memtrace == 2
                  => save copy to check for bad mods */
               if ( bp->alloc[i].freed )
                  printf("WARNING freeing free ptr, addr: %lx\n",
                     (long)p);
               else if (bp->alloc[i].freed = malloc(bp->alloc[i].size))
                  memcpy(bp->alloc[i].freed, bp->alloc[i].ptr,
                     bp->alloc[i].size);

            }
            /* update total allocated memory count */
            tot_memory -= bp->alloc[i].size;

            /* normal return */
            return;
         }
      }
      bq = bp;
   }
   if ( ! bp )
      printf("WARNING freeing bad pointer, addr: %lx\n", (long)p);
}

/* ==============================================================
   Allocate b bytes of memory
*/
char *x_malloc( b )
unsigned int b;  /* number of bytes to allocate */
{
   register char *mptr;

   if ( memtrace ) {
      /* add gap space */
      b += OVHDSIZE;

      /* allocate memory */
      if ( mptr = malloc(b) ) {
         /* fill gap */
         memset(mptr+b-OVHDSIZE, FILLCHAR, OVHDSIZE);

         /* store mptr in ptrhash */
         mptr = sto_ptr(mptr, b);
      }
   }
   else
      mptr = malloc(b);

   return(mptr);
}

/* ==============================================================
   Allocate and clear i*s bytes of memory
*/
char *x_calloc( i, s )
unsigned int i; /* number of blocks to be allocated */
unsigned int s; /* size (in bytes) of each block */
{
   register unsigned int amt;
   register char *mptr;

   /* allocate requested space */
   if ( mptr = x_malloc(amt = i*s) ) {
      /* clear requested space */
      memset(mptr, '\0', amt);
   }
   return (mptr);
}

/* ==============================================================
   Free allocated memory
*/
void x_free( p )
char *p; /* pointer to block to be freed */
{
   if ( p == NULL )
      printf("WARNING freed a null pointer\n");
   else if ( memtrace )
      del_ptr(p);
   else
      free((char *)p);
}

/* ==============================================================
   Check to ensure all blocks have been freed
*/
void x_chkfree()
{
   ALLOCATION *ap;              /* allocation entry pointer */
   register int bno, i;         /* bucket/entry number */
   register BUCKET *bp, *bq;    /* bucket pointers */

   if ( memtrace ) {
      /* check for unfreed variables */
      for ( bno = 0; bno < hashsize; ++bno ) {
         for ( bp = ptrhash[bno]; bp; bp = bq ) {
            for (i = 0; i < bp->entries; ++i) {
               ap = &bp->alloc[i];
               if ( memtrace == 2 && ap->freed ) {
                  /* check for changes to freed blocks */
                  if ( memcmp(ap->ptr, ap->freed, ap->size) )
                     printf("WARNING block chgd after free, addr: %lx\n",
                        (long)ap->ptr);
               }
               /* free unfreed block */
               printf("WARNING freeing unfreed block, addr: %lx\n",
                  (long)ap->ptr);
               free(ap->ptr);
            }
            bq = bp->next;

            /* free bucket */
            free((char *)bp->alloc);
            free((char *)bp);
         }
      }
      /* free pointer hash pointer array */
      free((char *)ptrhash);
      ptrhash = (BUCKET **)0;

      tot_memory = 0L;
   }
}