The Mtlib Memory-Tracking Library


Oct03: The Mtlib Memory-Tracking Library

A memory-leak-tracking system for C developers

Marco is a technology consultant based in Toronto, Canada who specializes in the integration of open-source software. He can be reached at [email protected].


Memory leaks rank high on any developer's list of most-hated bugs because they're both difficult to track down and dangerous to any software application. A memory leak occurs when a block of memory is allocated on the heap (for example, through a call to malloc() or calloc()) but never deallocated by calling free(). Because C does not have a garbage-collection mechanism, the memory block continues to be marked as "used" and is never reclaimed by the heap manager, whose pool of free memory therefore shrinks by the same amount. If the Standard C Library on which the offending program is based taps into a global heap shared by all applications in the operating system, and the OS itself does not implement any garbage-collection scheme, the RAM consumed by a memory leak is lost until the next reboot. In some high-availability applications (such as server software), memory leaks that occur in a frequently executed section of code can rapidly exhaust the RAM available to the server process and bring it down or significantly impair its performance.

In Example 1, a memory leak occurs when the value of i in useme() is greater than 99, because the test() function returns a NULL pointer after having allocated a block of memory and assigned it to the tmp pointer. This code shows why it's a good idea to always allocate memory as close as possible to the block of code that deallocates it, which makes leaks easier to avoid.
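Example 1 is not reproduced in this text. The following is only a hypothetical reconstruction that matches the description above: the names useme(), test(), i, and tmp come from the article, while the buffer size and the rest of the logic are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reconstruction: allocates a buffer, then bails out
   without freeing it when i is greater than 99. */
char *test(int i)
{
    char *tmp = malloc(32);

    if (tmp == NULL)
        return NULL;
    if (i > 99)
        return NULL;    /* leak: tmp is not freed and its address is lost */
    snprintf(tmp, 32, "value %d", i);
    return tmp;
}

/* Returns 0 on success, -1 if test() reported a failure. */
int useme(int i)
{
    char *s = test(i);

    if (s == NULL)
        return -1;      /* the caller cannot free what it never received */
    puts(s);
    free(s);
    return 0;
}
```

Calling useme(5) prints the string and frees the block; calling useme(100) returns -1 and leaves 32 bytes allocated with no pointer to them. Freeing tmp before the early return in test() would close the leak.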

Tracking Memory Leaks

The really tough-to-spot memory leaks are those that occur only under certain conditions, or those that are the result of complex program execution flow—and there is no software capable of tracking them down by just scanning your source code.

I used to resolve memory leaks by applying the "bang your head against the wall" technique: Once a leak had been identified and documented—which in itself was a feat usually limited to, "This test case ran out of memory, hence there has to be a memory leak somewhere. Fix it!"—I carefully traced the program's execution until I managed to find where the problem was. This was tedious and sometimes took days of painstaking debugging sessions.

Consequently, I developed Mtlib (short for "memory-tracking library"), the set of tools I present here that automates the process of tracking and identifying memory leaks. Although Mtlib is not a cure-all for the memory-leak problem, my performance in the memory plumbing department has increased significantly ever since I started consistently using it.

Introducing Mtlib

My initial goal was to track memory leaks in a transparent way, so that I could include my library in the debug versions of programs without affecting their behavior in any way, then remove the library without changes to the code.

Therefore, the first step in the creation of Mtlib was to find a way to "trap" the execution of the Standard C Library's memory-manipulation functions, which are declared in stdlib.h: malloc() allocates a new block of memory, and calloc() does the same but also initializes the block to zero. The realloc() function takes an existing memory block and resizes it to a specified size. Finally, free() releases a block of memory and returns it to the heap.
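As a quick refresher on the four functions just listed, here is a minimal sketch that uses only Standard Library calls. The function name grow_and_sum() and the array sizes are made up for this illustration.

```c
#include <stdlib.h>

/* Builds a zeroed array of 4 ints, grows it to 8, fills the new tail
   with ones, and returns the sum of all elements (-1 on failure). */
int grow_and_sum(void)
{
    size_t i;
    int sum = 0;
    int *bigger;
    int *v = calloc(4, sizeof *v);          /* 4 ints, all set to zero */

    if (v == NULL)
        return -1;

    bigger = realloc(v, 8 * sizeof *v);     /* the block may move */
    if (bigger == NULL) {
        free(v);                            /* the old block is still valid */
        return -1;
    }
    v = bigger;

    for (i = 4; i < 8; i++)
        v[i] = 1;                           /* realloc does not zero the new tail */
    for (i = 0; i < 8; i++)
        sum += v[i];

    free(v);                                /* hand the block back to the heap */
    return sum;
}
```

Note that the result of realloc() goes into a second pointer first: assigning it straight back to v would lose the original block if the call failed, which is itself a classic source of leaks.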

It would be impossible to simply redefine these functions by introducing clones with the same function names, since this would cause naming conflicts with the ones that already exist. An alternative, therefore, is to rewrite the Standard Library and force the compiler to link your applications against that instead of the one it provides. Even if this were possible, it certainly isn't practical.

The solution is to redefine the memory-allocation functions as macros, so that any reference to them is rewritten to use a set of functions defined inside the leak-tracking library. This is easily accomplished because macros are processed before the code is parsed, so they do not introduce any naming conflicts; see Example 2. For this trick to work, all the mt_ functions must follow the same declarations as their Standard Library counterparts, or attempts at compiling the code that uses them fail.
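The macro trick can be sketched in a single file as follows. The mt_malloc() and mt_free() bodies here are simplified stand-ins that only count live blocks (an assumption for illustration, not the library's code), and mt_live_blocks and demo_live_blocks_after_leak() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical counter of blocks allocated but not yet freed. */
static unsigned long mt_live_blocks = 0;

/* The wrappers are defined BEFORE the macros below, so the calls
   inside them still reach the real Standard Library functions. */
void *mt_malloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        mt_live_blocks++;
    return p;
}

void mt_free(void *ptr)
{
    if (ptr != NULL)
        mt_live_blocks--;
    free(ptr);
}

/* From here on, every use of malloc/free is rewritten by the preprocessor. */
#define malloc mt_malloc
#define free   mt_free

/* Unmodified "client" code: allocates two blocks, frees only one,
   and returns the number of blocks still outstanding. */
unsigned long demo_live_blocks_after_leak(void)
{
    char *a = malloc(16);   /* becomes mt_malloc(16) */
    char *b = malloc(16);   /* becomes mt_malloc(16) */

    free(a);                /* becomes mt_free(a) */
    (void)b;                /* b is deliberately never freed */
    return mt_live_blocks;
}
```

The client function compiles unchanged with or without the two #define lines, which is what makes the approach a drop-in one.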

Dropping these definitions in an include file (mtlib.h), as in Listing One, then including that file in your source makes it possible to create a drop-in leak-tracking system that does not affect your existing code, while providing you with a valuable debugging tool.

Tracking Memory Allocation

Once it has taken control of the memory functions, Mtlib must track memory allocation throughout a program's execution so that, at the end, it can output a list of memory blocks that have not been deallocated.

This is done by creating a simple singly linked list that contains the addresses of all the memory blocks that the program allocates, together with their length and a unique ID that the library assigns to them as they are created. This information is stored in the MEMDATA structure, defined in the implementation file mtlib.c (available electronically; see "Resource Center," page 5). The linked list is manipulated through the insert() and delete() functions, which add a node to it and remove a node from it, respectively. When a new memory block is allocated, mt_malloc() and mt_calloc() call the standard malloc() and calloc() functions and add the addresses returned by them to the list. Similarly, when free() is called, the corresponding memory-block information is removed from the list. Additional functionality is provided directly inside mt_realloc(): Because there is no guarantee that a resized memory block will reside at the same address, mt_realloc() locates that block within the linked list and ensures that the correct address is stored in the corresponding MEMDATA instance so that the memory block can be correctly tracked throughout its existence.

At the end of the program's execution, an explicit call to mt_terminate() causes the library to walk through the contents of the linked list (if any) and output as much information as possible about each block that has not been freed, which at this point can be assumed to be a memory leak.
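The bookkeeping described in this section can be sketched as follows. This is not the code in mtlib.c: the structure and function names (mem_node, track_insert(), track_delete(), track_update(), track_report()) are hypothetical, and the sketch assumes the simplest possible list with new nodes pushed at the head.

```c
#include <stdio.h>
#include <stdlib.h>

/* One node per live block: address, length, and a unique ID. */
typedef struct mem_node {
    void *addr;
    size_t size;
    unsigned long id;
    struct mem_node *next;
} mem_node;

static mem_node *head = NULL;
static unsigned long next_id = 1;

/* Record a new block; returns its ID, or 0 if no node could be allocated. */
unsigned long track_insert(void *addr, size_t size)
{
    mem_node *n = malloc(sizeof *n);

    if (n == NULL)
        return 0;
    n->addr = addr;
    n->size = size;
    n->id = next_id++;
    n->next = head;
    head = n;
    return n->id;
}

/* Forget a block that has been freed; returns 1 if it was being tracked. */
int track_delete(void *addr)
{
    mem_node **pp;

    for (pp = &head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->addr == addr) {
            mem_node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* After a realloc: keep the ID, store the (possibly new) address and size. */
int track_update(void *old_addr, void *new_addr, size_t new_size)
{
    mem_node *n;

    for (n = head; n != NULL; n = n->next) {
        if (n->addr == old_addr) {
            n->addr = new_addr;
            n->size = new_size;
            return 1;
        }
    }
    return 0;
}

/* Print every block still on the list; returns how many there were. */
size_t track_report(FILE *out)
{
    size_t leaks = 0;
    const mem_node *n;

    for (n = head; n != NULL; n = n->next) {
        fprintf(out, "leak: block %lu, %lu bytes at %p\n",
                n->id, (unsigned long)n->size, n->addr);
        leaks++;
    }
    return leaks;
}
```

In this arrangement a malloc wrapper would call track_insert() on success, a free wrapper would call track_delete(), a realloc wrapper would call track_update(), and the end-of-program report would call track_report(stderr). Lookups are linear in the number of live blocks, which fits a tool that favors simplicity over speed.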

Mtlib.c does not include mtlib.h because there is no direct relation between the information defined in the include file and the code in mtlib.c beyond the declarations of the functions. More important, however, including mtlib.h would cause all the memory-allocation functions to be redefined, so that each of the mt_ functions would essentially be calling itself, creating infinite recursion (and, soon thereafter, a stack overflow) as soon as the first block of memory is allocated.

I designed Mtlib to be a simple drop-in leak detector for the debug and testing phase of a product. As such, the library is geared toward simplicity, rather than efficiency. Still, the code does perform as much checking on the memory-allocation functions as is necessary to maximize the chances that it will be able to run. This way, if your heap becomes corrupted but code execution can make it to the point where mt_terminate() is called, there is still a chance of finding out whether a memory leak caused the corruption.

Identifying Leaks

If you run the program in Listing Two (which fails to release one of the memory blocks it allocates), Mtlib produces output similar to Figure 1, indicating that there are memory leaks in your program.

Unfortunately, knowing there is a problem isn't that useful if you can't fix it. This is why each memory block allocated through Mtlib is assigned a unique identifier incremented with each call to insert(). Mtlib provides a mechanism to set breakpoints when a memory block that is assigned a particular ID is allocated. All you have to do is define the constant MT_BREAKPOINT_ID inside mtlib.c and set a breakpoint on the appropriate line of the insert() function. From there on, you can use your favorite debugging tool to trace where the memory block was allocated and determine why it is not deallocated.
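A minimal sketch of how such an ID-based breakpoint hook might look. MT_BREAKPOINT_ID is the constant named above, but the value 3, the function assign_block_id(), and the hit counter are assumptions made for this illustration rather than the contents of mtlib.c.

```c
/* Hypothetical value: stop when the third block is allocated. */
#define MT_BREAKPOINT_ID 3

static unsigned long next_block_id = 0;
static unsigned long breakpoint_hits = 0;

/* Called once per allocation; returns the ID given to the new block. */
unsigned long assign_block_id(void)
{
    unsigned long id = ++next_block_id;

#ifdef MT_BREAKPOINT_ID
    if (id == MT_BREAKPOINT_ID) {
        breakpoint_hits++;      /* set the debugger breakpoint on this line */
    }
#endif
    return id;
}

/* Lets a caller check how many times the hook fired. */
unsigned long breakpoint_hit_count(void)
{
    return breakpoint_hits;
}
```

Because IDs are handed out in allocation order, a leak reported with a given ID in one run can be caught at its allocation point in the next run, provided the program allocates in the same order both times.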

Alas, this is not a practical way of finding memory leaks. I initially wrote Mtlib for the debugging phase of a computer game I was working on, out of frustration with all the memory problems that the team was experiencing. The library did its job well, but we were working on a project that had some 50,000 lines of code, and it was soon evident that setting one breakpoint at a time was not the quickest way to find and fix all the leaks. While tracing the program's execution was sometimes necessary (for example, because a leak was the result of some extraordinary conditions that we had not planned for in the code), most of the time our mistakes were obvious. Once we found where the memory was allocated (which could only be done by setting MT_BREAKPOINT_ID to a particular value), we had to recompile the entire project and run everything from scratch.

I, therefore, made a couple of modifications to Mtlib and ended up with a version that takes advantage of the __FILE__ and __LINE__ constants to print out the exact location in which a memory block that causes leakage is allocated. In Listing Three, the include file requires a few changes to ensure that the file and line information is passed to the library. The changes to the library's code are minimal as well, mainly consisting of declarations and additional code for handling the new data. This version of the library maintains the same "noninterference" principle of its predecessor—it is still a drop-in replacement that requires no changes to the code being tested—but the output (Figure 2) is different and more meaningful. I have left Mtlib and Mtlib2 separate because the latter requires significantly more memory than the former, and may therefore be unsuitable for some testing scenarios in which RAM comes at a premium or a large number of allocated blocks can be found at any moment in time.
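The __FILE__ and __LINE__ technique can be sketched as follows. The mt_malloc() shown here is a stand-in that only remembers the most recent call site (the real library stores this per block), the const qualifier on filename is this sketch's choice, and last_file, last_line, and call_site_recorded() are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of the most recent allocation's call site. */
static const char *last_file = NULL;
static long last_line = 0;

/* Stand-in wrapper: notes who asked, then calls the real malloc().
   It is defined before the macro below, so this malloc is the
   Standard Library's. */
void *mt_malloc(size_t size, const char *filename, long line)
{
    last_file = filename;
    last_line = line;
    return malloc(size);
}

/* The preprocessor expands __FILE__ and __LINE__ at the point of use,
   so each call reports its own location, not the wrapper's. */
#define malloc(size) mt_malloc(size, __FILE__, __LINE__)

/* Returns 1 if the wrapper recorded the line of the malloc call below. */
int call_site_recorded(void)
{
    long expected = __LINE__ + 1;
    char *p = malloc(16);
    int ok = (last_line == expected) && (last_file != NULL);

    fprintf(stderr, "block allocated at %s:%ld\n", last_file, last_line);
    free(p);
    return ok;
}
```

One consequence of using function-like macros is that only direct calls such as malloc(16) are rewritten; code that takes the address of malloc or passes it as a function pointer still reaches the Standard Library and is not tracked.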

Using Mtlib In a Multithreaded Environment

Mtlib is not thread-safe, because there is no locking mechanism to protect the manipulation of the linked list. An obvious solution is to simply ensure that the list is accessed in an exclusive fashion through the locking mechanism that is best suited to your platform. However, this is not a good idea for a number of reasons. First, there is no way to write a simple portable library that would work on any platform because the C Standard does not provide any locking mechanism (and does not, in fact, know anything about threads at all). What's more, because memory allocation is such a fundamental operation, serializing the library's execution would significantly alter the behavior of your program, essentially causing it to behave as a single-threaded application when memory blocks are being manipulated. This gives you the illusion that your application is working fine as long as you're in debug mode, but then causes problems once you take Mtlib out of the picture. Finally, memory allocation in a multithreaded application takes place in a nonreproducible way if observed from a global perspective. Because each task runs on its own terms, tracking a memory leak based on its ID may well be impossible. During one test run, a block might be assigned an ID that is completely different from the one it received in the preceding run, simply because the routine that creates that memory block is executed as part of a different sequence of events.

Largely because threads are not part of the C Standard, there is no easy way to create a drop-in replacement for its memory-manipulation routines. I use a modified version of Mtlib that makes it possible to create a series of contexts, each essentially a self-contained, leak-tracking environment. This is useful because it can be used inside individual threads without requiring any locking (assuming, of course, that you create a separate context for each thread that runs) and, therefore, makes it possible to easily track memory defects without interfering with the program's execution.

As a result of these complexities, the multithreaded version of Mtlib is not a drop-in companion to the standard memory-management functions anymore. Instead, it defines a series of macros that can be used to create a tracking context, then maintain that context across calls to multiple functions, even when those functions are accessed by different threads.

The macros defined in mtfree.h, the include file for this library (available electronically), can be used as in Listing Four. If you do not define the DEBUG symbol before including mtfree.h in your project, then all memory-allocation operations take place directly through the Standard Library. In Listing Four (which is in the test-mtfree directory of the code provided electronically), the main function of the thread must use the MT_DECLARE_DATA and MT_INITIALIZE macros to declare and create a context. MT_DECLARE_DATA should be part of the declaration block of your function.

When defining a function that is called from a thread and that needs to use the memory-allocation functions, the MT_CONTEXT_DATA macro must be added just before its list of parameters. If your function has more than one parameter (that is, parameters of its own in addition to the context), then use MT_CONTEXT_DATA_C instead, which also adds a comma after the declaration of Mtfree's context. You should not add a comma manually to the declaration, or you will encounter compilation errors when compiling without the DEBUG symbol. Similarly, when calling a function that requires a memory-allocation context, use the MT_CALL_DATA macro at the beginning of your list of parameters. The library also defines MT_CALL_DATA_C, which you should use if the function requires more than one parameter. Finally, you should insert the MT_TERMINATE macro at the end of your thread's main function, so that its memory-allocation context can be destroyed properly and any memory leaks that have occurred can be output to stderr.
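To see why the comma has to live inside the macro, consider one plausible way such context macros could be defined. This is a sketch under assumptions, not the contents of mtfree.h: the CTX_ names, track_ctx, ctx_malloc(), ctx_free(), and thread_body() are all hypothetical, and the context here only counts live blocks.

```c
#include <stdlib.h>

#define DEBUG   /* remove this line to compile the tracking out */

/* A per-thread tracking context; no locking is needed because
   each thread owns its own instance. */
typedef struct { unsigned long live; } track_ctx;

void *ctx_malloc(track_ctx *ctx, size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        ctx->live++;
    return p;
}

void ctx_free(track_ctx *ctx, void *p)
{
    if (p != NULL)
        ctx->live--;
    free(p);
}

#ifdef DEBUG
#define CTX_PARAM      track_ctx *ctx
#define CTX_PARAM_C    track_ctx *ctx,     /* comma supplied by the macro */
#define CTX_ARG        ctx
#define CTX_ARG_C      ctx,
#define CTX_MALLOC(n)  ctx_malloc(ctx, (n))
#define CTX_FREE(p)    ctx_free(ctx, (p))
#else
#define CTX_PARAM      void
#define CTX_PARAM_C                        /* expands to nothing */
#define CTX_ARG
#define CTX_ARG_C
#define CTX_MALLOC(n)  malloc(n)
#define CTX_FREE(p)    free(p)
#endif

/* Takes a parameter of its own, so it uses the _C variants. */
static void fill(CTX_PARAM_C size_t n)
{
    char *buf = CTX_MALLOC(n);

    (void)buf;          /* deliberately leaked */
}

/* Takes no parameters of its own. */
static void tidy(CTX_PARAM)
{
    char *buf = CTX_MALLOC(8);

    CTX_FREE(buf);
}

/* Stand-in for a thread's main function; returns the blocks it leaked. */
unsigned long thread_body(void)
{
    track_ctx local = { 0 };
    track_ctx *ctx = &local;

    tidy(CTX_ARG);
    fill(CTX_ARG_C 32);
    return ctx->live;
}
```

With DEBUG defined, fill(CTX_PARAM_C size_t n) expands to fill(track_ctx *ctx, size_t n). Without it, the macro vanishes and the declaration becomes fill(size_t n); a hand-written comma would leave fill(, size_t n), which does not compile.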

Conclusion

If you use Mtfree, it's a good idea to use it from the start because it changes how you code (although, once the program is compiled in release mode, the end product still performs straight calls to the Standard Library). When using Mtfree, keep in mind that I designed it to work under the assumption that each thread is self-contained and does not use the Standard Library to allocate memory blocks that are then shared with (or freed by) another thread. This should not be a major concern, since interthread and interprocess communication usually takes place through some platform-dependent mechanism (such as named pipes or shared memory) and not through the heap.

DDJ

Listing One

// These definitions override the stdlib functions. stdlib.h should
// still be included before this file.

#define malloc mt_malloc
#define calloc mt_calloc
#define realloc mt_realloc
#define free mt_free

// Each mt_ function mirrors the declaration of its stdlib counterpart
void *mt_malloc (size_t size);
void *mt_calloc (size_t items, size_t size);
void *mt_realloc (void *ptr, size_t size);
void mt_free (void *ptr);
// Reports every block that is still allocated
void mt_terminate (void);


Listing Two

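#include <stdio.h>
#include <stdlib.h>
#include "../mtlib.h"

int main (void)
{
    char *a;
    a = malloc (100);
    sprintf (a, "This is a test\n");
    printf ("%s", a);
    // The memory leak occurs here: overwriting a loses the only
    // pointer to the first block, which can no longer be freed
    a = malloc (100);
    sprintf (a, "This is another test\n");
    printf ("%s", a);
    free (a);
    // Report any blocks that are still allocated
    mt_terminate();
    return 0;
}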


Listing Three

// These definitions override the stdlib functions. Note that stdlib.h should
// still be included before this file.

// Each allocation passes along the file and line of the call site
#define malloc(size) mt_malloc(size,__FILE__,__LINE__)
#define calloc(items,size) mt_calloc(items,size,__FILE__,__LINE__)
#define realloc(ptr,size) mt_realloc(ptr,size,__FILE__,__LINE__)
#define free mt_free

void *mt_malloc (size_t size, char *filename, long line);
void *mt_calloc (size_t items, size_t size, char *filename, long line);
void *mt_realloc (void *ptr, size_t size, char *filename, long line);
void mt_free (void *ptr);

void mt_terminate (void);


Listing Four

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DEBUG
#include "../mtfree.h"
// This function is where the memory leak occurs: t is never freed
void testme(MT_CONTEXT_DATA)
{
    char *t = MT_MALLOC (10);
    strcpy (t, "Test");
    printf ("%s", t);
}
// This is the thread's main entry point
int thread_main (void)
{
    MT_DECLARE_DATA
    MT_INITIALIZE("My Thread")
    testme (MT_CALL_DATA);
    MT_TERMINATE
    return 0;
}
// This function calls the thread main function three times--you should adapt
// it to your platform's threading functionality
int main (void)
{
    thread_main();
    thread_main();
    thread_main();
    return 0;
}
