Rethinking Memory Management

Portable techniques for high performance

Arthur D. Applegate

Arthur is the author of SmartHeap, the portable memory-management library published by MicroQuill Software Publishing. He can be contacted at [email protected].


Applications written in C and (especially) C++ spend much of their execution time allocating and de-allocating memory. Moreover, when applications run in virtual-memory environments, references to objects stored in dynamically allocated memory are the main cause of excessive swapping--a condition that can bring application performance to unacceptable lows.

In this article, I explain why the traditional approaches to dynamic-memory management--malloc and operator new--are inadequate for memory-intensive applications. I suggest alternative techniques that you can use on all platforms to speed up allocation and minimize swapping. Finally, I present an empirical case study that shows how applying these techniques can result in very dramatic improvements in overall application performance.

Why Memory Management Matters

Today, most applications use a great deal of dynamic memory. Operating systems and hardware platforms make ever-larger address spaces available, and applications are growing in size to take advantage of this available memory.

All modern operating systems now provide virtual memory. While virtual memory has real benefits, its presence can affect application performance more than any other factor--a single memory reference can be 10,000 times slower if it causes a page fault! The rate of page faulting is determined by only two factors: available physical memory and how you manage memory in your application.

Two other features of modern operating systems--multitasking and multithreading--also increase the demands on memory management. Every time there is a context switch, the data your application is referencing may be swapped out to make room for the other thread or process. Unless each thread's data is contained in a very tight working set, every context switch will cause a lot of unnecessary swapping.

The increasing popularity of C++ also magnifies the importance of efficient heap management. By their nature, C++ programs use dynamic memory much more heavily than their C counterparts--often for small, short-lived allocations.

malloc Woes

The malloc API specified by ANSI is a general-purpose allocator: each call searches for an arbitrarily sized free area of memory. (Note that compilers generally implement C++ operator new directly in terms of malloc.) malloc must handle all block sizes and allocation patterns. Since it does not take a parameter to specify a heap, all allocations through malloc must share the same heap.

The traditional implementation of malloc is simply a list of free blocks that is searched linearly until a suitably sized block is found. With a steady-state pattern of allocations and frees of random sizes, the number of free blocks will, on average, be about half the number of blocks in use; for proof, see Donald Knuth's The Art of Computer Programming Volume 1, second edition (Addison-Wesley, 1973). Consequently, the performance of malloc degrades in proportion to the total number of allocation requests.
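
To see why, consider the bare-bones first-fit core sketched below. It is only my illustration (FreeBlock and firstFit are invented names); a production malloc must also split oversized blocks, coalesce free neighbors, and align the pointer it returns. What the sketch does show is that every allocation walks a linked list whose nodes are scattered across the heap.

// Hypothetical first-fit core, for illustration only. A production
// malloc also splits blocks, coalesces free neighbors, and aligns
// the pointer it returns.

#include <stddef.h>

struct FreeBlock
{
   size_t     size;       // usable bytes in this free block
   FreeBlock *next;       // next block on the free list
};

static FreeBlock *freeList = NULL;

void *firstFit(size_t size)
{
   // Linear search: cost grows with the number of free blocks, and
   // each node touched may live on a different (swapped-out) page.
   FreeBlock **prev = &freeList;
   for (FreeBlock *b = freeList;  b != NULL;  prev = &b->next, b = b->next)
   {
      if (b->size >= size)
      {
         *prev = b->next;            // unlink the block
         return (void *)(b + 1);     // user memory follows the header
      }
   }
   return NULL;                      // caller must grow the heap
}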

In virtual-memory environments, most malloc implementations scale pathologically when the total heap size exceeds currently free physical memory. When malloc's free list does not fit in memory, malloc can thrash, with every call causing page faults. In some implementations, individual calls to malloc can then cause hundreds of page faults, with a single call taking seconds rather than microseconds.

Applications generally reference data items more than once. Therefore, efficiency of references to your own data can be even more important than references by malloc to its own free list. Good locality is the key to fast references to your data in virtual-memory environments.

Because all callers of malloc in a given process share the same heap, individual blocks allocated by malloc tend to be scattered arbitrarily throughout the process's address space. Consequently, a single data structure's elements are strewn across many pages, together with unrelated data--the data has poor locality. The working set required to hold the entire data structure is thus much larger than necessary, since the virtual-memory manager swaps in units of pages. Traversing the data structure therefore causes many expensive page faults. The problem is compounded even further if there are context switches during the data-structure traversal.

Even in non-virtual-memory environments, malloc has another problem: It wastes memory. Because malloc must handle blocks of variable size, it is prone to fragmentation. This, of course, is the condition in which many free blocks too small to be useful are trapped between larger, in-use blocks.

Another instance of memory waste involves granularity and overhead. The size that your program passes to malloc gets rounded up to the nearest granularity multiple--often between 8 and 32 bytes. Overhead consists of the internal header associated with every memory block, which can be as much as 8 or 16 bytes per allocation. Memory waste is significant even if address space is plentiful, because you pay for any increase in working-set size with page faults.

In multithreaded applications, malloc incurs significant extra overhead in serializing access to the heap. Having only one heap also increases contention and prevents processors from being concurrently active in heap operations on multiprocessor platforms.

Solutions

The single most important improvement on malloc is the provision of multiple heaps, or memory pools. With multiple memory pools, you can partition your program's allocations into groups that correspond to your program's data structures.

Many benefits result from this change. First, allocation speed improves, because heap sizes are smaller. Second, there is much less swapping, because locality is vastly improved, which reduces working-set sizes. Third, there is less fragmentation, because per-pool allocations vary less in size and extent. Fourth, there is faster freeing, because you can free an entire memory pool rather than individually freeing thousands of blocks. Finally, you can concurrently allocate in separate heaps without expensive synchronization in multithreaded applications.
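
To make the memory-pool idea concrete, here is a bare-bones pool of my own devising; it is not SmartHeap's implementation, it rounds only to pointer-sized alignment, and it returns memory to the operating system only when the whole pool is destroyed. Allocation bumps a pointer within chunks dedicated to one data structure, and the destructor releases every block at once.

// Hypothetical bump-pointer pool, for illustration only. All blocks
// for one data structure come from the same chunks, and the entire
// pool is released in a single pass by the destructor.

#include <stdlib.h>

class Pool
{
public:
   Pool() : pages(NULL), avail(0), cur(NULL) { }
   ~Pool()                                      // free the whole pool at once
   {
      while (pages)
      {
         Page *next = pages->next;
         free(pages);
         pages = next;
      }
   }
   void *Alloc(size_t size)
   {
      size = (size + 7) & ~(size_t)7;           // keep blocks aligned
      if (size > avail)                         // start another chunk
      {
         size_t chunk = size > PageSize ? size : (size_t)PageSize;
         Page *p = (Page *)malloc(sizeof(Page) + chunk);
         if (!p)  return NULL;
         p->next = pages;
         pages   = p;
         cur     = (char *)(p + 1);
         avail   = chunk;
      }
      void *result = cur;
      cur   += size;
      avail -= size;
      return result;
   }
private:
   enum { PageSize = 4096 };
   struct Page { Page *next; };
   Page   *pages;                               // chain of underlying chunks
   size_t  avail;                               // bytes left in current chunk
   char   *cur;                                 // next free byte in current chunk
};

This is exactly the shape of the savings in the linked-list example later in this article, where List's destructor discards its pool instead of walking thousands of Links.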

Fixed-Size Allocators

Another approach is to change the allocator's algorithm. Remember that malloc degrades as the heap grows: it is roughly O(n) in the number of free blocks, and worse still once the free list no longer fits in physical memory. What we'd really like is an allocator closer to O(c)--constant time per call.

Remember that in most applications, especially those written in C++, the allocations are usually quite small. In fact, most allocations are for small, fixed-size class or structure objects. You could allocate these very efficiently by maintaining free lists of blocks of just the right size for each major class or structure. Allocation is then just a matter of popping off the head of the free list, and deallocation is pushing on a new head.

The benefits of this method include: greater speed than a variable-size allocator; better scaling, on the order of O(c) rather than O(n); no internal fragmentation, because all free blocks are exactly the right size; good locality, if implemented correctly; and, finally, potential elimination of overhead and granularity waste, because there is no need to store a size with each block (all blocks are the same size).
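
The sketch below is one way to build such an allocator; FixedAllocator is an invented name, and unlike a production allocator it never returns its chunks to the system. Note that the free list costs no extra space: while a block is free, its first few bytes hold the link to the next free block.

// Hypothetical fixed-size allocator, for illustration only. Every block
// is blockSize bytes, so Alloc is a pop and Free is a push -- O(c) no
// matter how many blocks are outstanding.

#include <stdlib.h>

class FixedAllocator
{
public:
   FixedAllocator(size_t size) : blockSize(size), freeHead(NULL)
   {
      // each free block must hold a next pointer and stay aligned
      if (blockSize < sizeof(void *))  blockSize = sizeof(void *);
      blockSize = (blockSize + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
   }
   void *Alloc()
   {
      if (!freeHead && !Refill())  return NULL;
      void *block = freeHead;                   // pop the head of the free list
      freeHead = *(void **)freeHead;
      return block;
   }
   void Free(void *block)
   {
      *(void **)block = freeHead;               // push onto the free list
      freeHead = block;
   }
private:
   bool Refill()                                // carve a chunk into blocks
   {
      const int count = 64;
      char *chunk = (char *)malloc(blockSize * count);
      if (!chunk)  return false;
      for (int i = 0;  i < count;  i++)
         Free(chunk + i * blockSize);
      return true;                              // chunks are never released
   }
   size_t  blockSize;
   void   *freeHead;                            // head of singly linked free list
};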

Overloading operator new in C++

In C++, you can overload operator new for individual classes. This allows you to customize memory management on a per-class basis without users of the class even knowing about it (they just call operator new in the usual way).

You can establish a fixed-size allocator for a class, and allocate objects from fixed-size memory pools according to the data structure to which the class object will belong. This elegantly combines the benefits of multiple memory pools and fixed-size allocators.
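
As a concrete (and again hypothetical) example, the class below routes its allocations through the FixedAllocator sketched in the previous section; code that writes new Node and delete p is untouched. Listing Two, later in this article, applies the same pattern using SmartHeap's fixed-size pools.

// Hypothetical per-class overload, for illustration only. It assumes
// the FixedAllocator sketched above is declared in a header such as
// fixalloc.h (an invented name).

#include <stddef.h>
#include "fixalloc.h"

class Node
{
public:
   Node(int k) : key(k), next(NULL) { }

   void *operator new(size_t)                   // every Node is the same size
   {
      return allocator.Alloc();                 // pop from the free list
   }
   void operator delete(void *mem)
   {
      if (mem)  allocator.Free(mem);            // push onto the free list
   }
private:
   static FixedAllocator allocator;             // one pool shared by all Nodes
   int   key;
   Node *next;
};

// Single definition of the class-wide allocator, sized for Node objects.
FixedAllocator Node::allocator(sizeof(Node));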

A Linked-List Example

I'll illustrate the techniques I've discussed with an example that directly applies to many applications: linked-list manipulation. The sample application randomly inserts and deletes randomly sized strings in five linked lists. It does a full linear traversal of each list and then deletes each list. The list size is an input parameter.

The first implementation (Listing One, page 96) uses the default global operator new. Element insertion demonstrates allocation speed. All list links and values are allocated from the same heap. If you run the program with a size such that total list data exceeds available physical memory, operator new thrashes before insertion is complete.

List traversal demonstrates how poor locality impacts the speed of memory references. If the total size of list data exceeds physical memory, each list search thrashes. This is because the lists' memory is all allocated from the same heap. Each page in the heap contains data from each of the lists, so the total number of pages needed for one list (the working-set size) is five times more than it needs to be.

When the program terminates, the destructors for each list are invoked. The list destructor frees each element, demonstrating deallocation speed.

Improving Memory Management

A second version of the program is identical except for memory-management calls. This version uses SmartHeap APIs to demonstrate multiple memory pools, a fixed-size allocator, and memory-pool de-allocation.

Listing Two (page 97) shows the changed definitions in the new version. Each List object has its own memory pool, from which Link objects and string values of that list are allocated.

List's constructor calls MemPoolInitFS to initialize the memory pool with a fixed-size threshold equal to the size of a Link. This allows all Link objects to be allocated with a fixed-size algorithm.

List::Insert allocates a Link from List's pool and passes the pool to Link's constructor. Note that the memory pool is passed to operator new with the placement syntax.

Link defines its own operators new and delete. Link::operator new calls MemAllocFS, the SmartHeap fixed-size allocator. Link's constructor also allocates string values from the owning list's memory pool. This is reasonable, since Link objects and their associated string values tend to be referenced and freed together.

List's destructor simply calls MemPoolFree to deallocate the entire memory pool. This frees all Links and all string values. There is no need to free each Link object or string individually. Moreover, the pages in which these objects reside need not even be touched (they might not be resident).

Note that main, List::RandomOp(), and List::Find are not changed at all, yet the times for each of these functions are dramatically different in the two implementations.

Performance Results

Table 1(a) shows the times, in seconds, printed by both versions of the program when run with a count of 20,000 in Windows 3.1 on a 25-MHz 386 with 4 Mbytes of RAM. Table 1(b) shows the results with a count of 40,000 in Windows NT running on a 33-MHz 486 with 16 Mbytes of RAM. Table 1(c) shows the results with a count of 150,000 in HP-UX 9.0 on an HP 710 with 32 Mbytes of RAM.

In each case, changing only the memory-management calls results in an order of magnitude or greater overall application performance improvement. This example is admittedly somewhat contrived in that the program allocates more data than will fit in available physical memory. However, it illustrates the impact memory management can have on operations that applications commonly and frequently perform. It also serves to alert you to the fact that memory management cannot be ignored if your application is to perform well in a virtual-memory environment.

Alternative Implementations

The techniques I've discussed are illustrated here with SmartHeap, a commercial memory manager available for most platforms. However, alternatives are available on some platforms.

For example, in Windows 3.x, you can use LocalAlloc to suballocate in GlobalAlloced blocks to create multiple heaps. Paul Yao describes this technique in detail in his Windows 3.0 Power Programming Techniques (Bantam Books, 1990). The Win32 API provides a facility for creating multiple heaps; see HeapAlloc in the Win32 API Reference.
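
A Win32 sketch of the multiple-heap idea needs nothing beyond these documented calls. The code below is illustrative only: Customer is an invented structure, and error handling is minimal.

// Sketch of per-data-structure heaps using the Win32 heap API;
// Customer is an invented structure for illustration.

#include <windows.h>

struct Customer { char name[64];  Customer *next; };

int main()
{
   // A private, growable heap reserved for Customer records.
   HANDLE customerHeap = HeapCreate(0, 0x10000, 0);
   if (customerHeap == NULL)  return 1;

   // Blocks from this heap share pages only with other Customers,
   // so traversing the list touches a small working set.
   Customer *c = (Customer *)HeapAlloc(customerHeap, HEAP_ZERO_MEMORY,
                                       sizeof(Customer));

   // Individual blocks can be freed...
   if (c)  HeapFree(customerHeap, 0, c);

   // ...or the entire data structure discarded in one call.
   HeapDestroy(customerHeap);
   return 0;
}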

For a fixed-sized allocator in C++, a good place to start is Bjarne Stroustrup's The C++ Programming Language (Addison-Wesley, 1991), which includes simple, fixed-size allocator implementations.

If you want to embark on your own implementation of a multiple-pool memory manager, Knuth's The Art of Computer Programming, Volume 1, second edition, is required reading.


Table 1: (a) Windows 3.1 benchmarks (20,000 iterations); (b) Windows NT benchmarks (40,000 iterations); (c) HP-UX benchmarks (150,000 iterations). (Note: All times are in seconds.)

          Operation    Program 1    Program 2
     ----------------------------------------
  (a)
          Insertion     963.01        24.17
          Search         26.15        15.92
          Deletion       91.42         2.09
          Overall App  1086.65        45.15

  (b)
          Insertion      22.65         9.12
          Search        175.79         0.50
          Deletion      159.29         0.14
          Overall App   359.76        10.04

  (c)
          Insertion     319.57        19.97
          Search        171.15         2.82
          Deletion      320.04         0.35
          Overall App   813.21        23.15

[LISTING ONE]



//--------------------------------------------------------------------
// Memory management exerciser.                   By Arthur Applegate.
// Program 1: Creates, traverses, and destroys linked lists, to show
// the effect of memory management on an application's performance.
//--------------------------------------------------------------------

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <stdlib.h>

//--------------------------------------------------------------------------
// class Timer: Instances of this class, when declared as automatic variables,
// record and print the time taken to execute the block in which they are declared.
//--------------------------------------------------------------------------
class Timer
{
private:    const char *m;
            clock_t startTime;
public:     Timer(const char *msg) : m(msg), startTime(clock()) { }
           ~Timer()
            {     printf("\n%s: %.2f seconds.", m,
                  (double)(clock()-startTime) / (double)CLOCKS_PER_SEC);
            }
};
//-------------------------------------------------------------------
class Link                          // Link objects are list elements
{            friend class List;
public:      Link (const char *val, Link *p, Link *n);
            ~Link ();
private:     char *value;
             Link *prev;
             Link *next;
};
//-------------------------------------------------------------------
class List                    // List objects are doubly-linked lists
{
public:      List ();
            ~List ();
             const Link *Insert (Link *prev, const char *val);
             void Delete (Link *remove);
             void RandomOp ();
             const Link *Find (const char *val) const;
private:     Link *head;
             Link *tail;
};
//------------------------------------------------------------------
                                            // create a list element
Link::Link (const char *val, Link *p, Link *n)
{
   value = new char[strlen(val)+1];
   strcpy(value, val);
   if ((prev = p) != NULL)      prev->next = this;

   if ((next = n) != NULL)      next->prev = this;
}
//-----------------------------------------------------------------
Link::~Link ()                              // destroy a list element
{
   delete [] value;                 // array form: value came from new char[]
   if (prev)      prev->next = next;
   if (next)      next->prev = prev;
}
//------------------------------------------------------------------
List::List ()                  // create a new list: clear head, tail
{
   head = tail = NULL;
}
//------------------------------------------------------------------
List::~List ()         // destroy list: delete each Link individually
{
   while (head)
      Delete(head);
}
//-----------------------------------------------------------------
                          // add a new link after specified element
const Link *List::Insert (Link *prev, const char *val)
{
   Link *next = prev ? prev->next : head;
   Link *l = new Link (val, prev, next);
   if (!prev)      head = l;
   if (!next)      tail = l;
   return l;
}
//-----------------------------------------------------------------
void List::Delete (Link *l)     // remove a link; no-op if link NULL
{
   if (!l) return;
   if (head == l)      head = l->next;
   if (tail == l)      tail = l->prev;
   delete l;
}
//-----------------------------------------------------------------
void List::RandomOp ()     // insert (80%) or delete (20%) an element
{
   if (rand() % 5 != 0)
   {
      const int maxStr = 64;
      char buf[maxStr+1];

      // generate a string of random length between
      // 1 and 64 bytes, with random fill value
      int len = rand() % maxStr + 1;
      memset(buf, rand() % 128, len);
      buf[len] = '\0';
      Insert(tail, buf);
   }
   else
      Delete(head);

}
//-----------------------------------------------------------------
                       // find the first element with a given value
const Link *List::Find (const char *val) const
{
   for (Link *l = head;  l;  l = l->next)
      if (strcmp(l->value, val) == 0)
         return l;
   return NULL;
}
//-----------------------------------------------------------------
int main(int argc, char *argv[])
{
   if (argc < 2)
   {
      printf("\nusage: lists <element-count>");
      return 1;
   }
   long count = atol(argv[1]);

   Timer t("Overall Application");
   List *list1 = new List, *list2 = new List, *list3 = new List,
      *list4 = new List, *list5 = new List;

   // test allocation speed
   // note that to properly benchmark allocations, we need
   // interspersed de-allocations since all allocators will
   // be fast with an empty free-list!
   {
      Timer t("Insertion");

      // generate and insert `count' strings of random lengths
      // and contents; occasionally delete to simulate a
      // dynamically shrinking and growing linked list
      while (count--)
      {
         // intersperse each operation so that elements are
         // chaotically distributed in memory if we do not
         // have multiple memory pools
         list1->RandomOp();
         list2->RandomOp();
         list3->RandomOp();
         list4->RandomOp();
         list5->RandomOp();
      }
   }
   // test locality: each list will touch five times as many
   // pages if allocations were from a single heap
   {
      Timer t("Search");

      list1->Find("not present");
      list2->Find("not present");
      list3->Find("not present");
      list4->Find("not present");
      list5->Find("not present");
   }
   // destructors for lists 1-5: test freeing speed
   {
      Timer t("Deletion");

      delete list1;
      delete list2;
      delete list3;
      delete list4;
      delete list5;
   }
}


[LISTING TWO]



//--------------------------------------------------------------------
// Memory management exerciser, version 2.        By Arthur Applegate.
// This version has changes for improved performance. Only ten lines
// of code needed to be changed from version shown in Listing 1. These
// changes are indicated with "***" in the comment line.
//--------------------------------------------------------------------

#include <smrtheap.hpp>       // *** include SmartHeap C++ header file

//--------------------------------------------------------------------
class Link
{           friend class List;
public:     // *** constructor receives memory pool parameter
            Link (MEM_POOL pool, const char *val, Link *p, Link *n);
           ~Link ();
private:    // *** overload new/delete to use fixed-size algorithm,
            // *** allocating from the owning List's memory pool
            void *operator new(size_t, MEM_POOL pool)
                                        { return MemAllocFS(pool); }
            void operator delete(void *mem)    { MemFreeFS(mem); }
            char *value;
            Link *prev;
            Link *next;
};
//--------------------------------------------------------------------
class List
{
public:     List();
           ~List();
            const Link *Insert(Link *prev, const char *val);
            void Delete(Link *remove);
            void RandomOp();

            const Link *Find(const char *val) const;
private:    Link *head;
            Link *tail;
            MEM_POOL pool;    // *** per-List memory pool data member
};
//--------------------------------------------------------------------
// *** create a list element
Link::Link(MEM_POOL pool, const char *val, Link *p, Link *n)
{
   // *** use placement syntax to allocate string
   // *** from owning List's memory pool
   value = new (pool) char[strlen(val)+1];

   strcpy(value, val);
   if ((prev = p) != NULL)      prev->next = this;
   if ((next = n) != NULL)      next->prev = this;
}
//--------------------------------------------------------------------
List::List()                                      // create a new list
{
   // *** initialize memory pool for Link's and strings
   pool = MemPoolInitFS(sizeof(Link), 0, 0);
   head = tail = NULL;
}
//--------------------------------------------------------------------
List::~List()                                        // destroy a list
{
   // *** no need to free individual links or values,
   // *** as freeing the pool frees all Links and string values
   // note that if properly implemented, freeing the memory pool
   // will not even _touch_ the pages that contain the individual
   // blocks -- so there is no swapping regardless of heap size
   MemPoolFree(pool);
}
//--------------------------------------------------------------------
                             // add a new link after specified element
const Link *List::Insert(Link *prev, const char *val)
{
   Link *next = prev ? prev->next : head;

   // *** use placement syntax to allocate Link
   // *** from current List's memory pool
   Link *l = new (pool) Link(pool, val, prev, next);

   if (!prev)      head = l;
   if (!next)      tail = l;
   return l;
}


Copyright © 1994, Dr. Dobb's Journal

