Jan99: Algorithm Alley

Sasha Gontmakher and Ilan Horn, January 01, 1999

Sasha can be contacted at [email protected], and Ilan at [email protected].


Memory management is often taken for granted, but it can be critically important. Memory management affects performance: I've seen program speed literally double just by changing how memory was allocated. Memory management affects productivity: Many years ago, I switched from Pascal to C largely because C provided more convenient memory management. Memory management affects customer satisfaction: One reason for the recent upsurge of interest in garbage collection is that it reduces the severity of memory mismanagement by programmers. In addition to presenting an efficient memory-management scheme, Sasha and Ilan's contribution is a good example of how to build a fairly complex data structure tailored for a specific task.

-- Tim Kientzle


There are many algorithms that satisfy sequences of requests from a client. Some are online: They process the requests one by one, without any knowledge of the future requests. Others are offline: They know the entire sequence of requests in advance, and can use this information to achieve better results. Obviously, offline algorithms can perform at least as well as online algorithms. The accompanying text box entitled "Online versus Offline Algorithms" examines the question of just how much better an offline algorithm can perform.

Efficient memory allocation lies somewhere in the middle. Although you do not know the complete sequence of future memory requests, you do know certain patterns that are common in real-world programs. In this article, we'll present a memory-management technique that works well for programs that follow these common patterns of memory usage.

Memory Usage Patterns

No single memory-allocation strategy is optimal for every program. The strategy we'll present assumes the following:

  • The programs free and reallocate objects frequently. A ray-tracing engine, for instance, casts a ray through a pixel, which then produces several more rays, each of which produces several more, and so on, recursively, until the rays become sufficiently weak. The result is that many ray objects get allocated, freed, and reallocated again for the next pixel.
  • The programs allocate lots of small objects. For example, 3D modelers represent polygons as linked lists of vertices, each of which is a 3D point. There will be many of these point objects, and they may be as small as 16 bytes each.
  • The programs use only a few distinct sizes of objects. This is most obvious in object-oriented programs, where each class dictates the size of its instances, and so the number of possible sizes is not more than the number of classes in the program.
  • The programs use objects in phases. It is not uncommon for a program to make heavy use of objects of one size in one phase, then free them all and start using objects of a different size in the next.

Single-Size Allocators

A simple way to deal with programs that frequently free and reallocate objects is to keep a "free list" for each different size of object. You need an array in which each element points to a list of free blocks of the corresponding size.

When you go to allocate an object of size n, you look at the array element FreeObjects[n]; if it's not null, you return the first object on that list. When you free an object, simply add it to the corresponding list. This efficiently handles object reuse. Of course, if the free list is empty, you need to request more objects of that size from the system.

Most systems require that certain types be aligned at a 4- or even 8-byte boundary. For example, if the program requests a 3-byte object, you can allocate 4 bytes, since the last byte would be wasted anyway. This also helps improve the efficiency of the single-size allocator, because the free list of 4-byte objects can be used for any request of 1, 2, 3, or 4 bytes.
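As a concrete illustration of the free-list scheme just described, here is a minimal sketch (our code, not the article's listing): the FreeObjects name comes from the text, while the granularity constant, the size limit, and the helper names are ours. It assumes every block is at least pointer-sized, so a freed block can store the list link in its own first bytes.

#include <stddef.h>
#include <stdlib.h>

// One free list per object size, indexed by size class (assumes 0 < n <= max_small).
const size_t granularity = sizeof(void*);      // blocks must be able to hold a pointer
const size_t max_small   = 256;                // largest size handled by this sketch
static void* FreeObjects[max_small / granularity + 1];

inline size_t size_class(size_t n)             // round n up to the granularity
{
    return (n + granularity - 1) / granularity;
}
void* alloc_object(size_t n)
{
    size_t i = size_class(n);
    if (FreeObjects[i]) {                      // reuse a previously freed block
        void* obj = FreeObjects[i];
        FreeObjects[i] = *(void**)obj;         // pop it off its list
        return obj;
    }
    return malloc(i * granularity);            // list empty: ask the system for more
}
void free_object(void* obj, size_t n)
{
    size_t i = size_class(n);
    *(void**)obj = FreeObjects[i];             // the freed block itself stores the link
    FreeObjects[i] = obj;
}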

Pages

A typical implementation of malloc() adds a header to each block of allocated memory with information about that block. If your program is allocating lots of small objects, this overhead can significantly increase your memory requirements. One way to reduce this overhead is to allocate a single large page and split it into many smaller objects. Then you need only one header for the entire page to indicate the size of the objects in that page. When the program asks for more objects of that size, you return the next unallocated block of memory from the appropriate page.

Using a page as the basis for memory allocation gives you the organization in Figure 1. Here, each page has its own list of free objects, as well as a pointer to the next unallocated object. Using this scheme, here's how you allocate an object of size n:

  • Get the pointer from Pages[n], which points to a linked list of pages that all hold objects of this size.
  • Grab the first page from the list.
  • If the free list on that page isn't null, return the first item in the free list.
  • If the free list is null, return the next unallocated object.

If there aren't any pages with free objects of this size (for example, if Pages[n] is null), then you need to grab a new page for this size of object, initialize it, and add it to the list of pages.

Because there are only a few different object sizes, allocating pages for each size of object is efficient. The pages will tend to be almost full, providing a high degree of memory utilization. This approach also prevents memory fragmentation, the other major source (besides per-block headers) of wasted memory in traditional allocators.
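Here is a compilable sketch of that allocation path, reduced to its essentials. The struct layout, the Pages array, and the helper names are illustrative stand-ins for the structures in Listing One, and only the allocation side is shown; freeing needs the page-alignment trick described in the next section, and full pages are simply dropped from the list here rather than tracked.

#include <stddef.h>
#include <stdlib.h>

const size_t GRAN      = sizeof(void*);
const size_t PAGE_SIZE = 4096;

struct SimplePage {
    SimplePage* next;         // pages currently serving this object size
    size_t      obj_size;     // size of every object on this page
    void*       free_list;    // objects freed back to this page
    char*       unallocated;  // first never-used byte on this page
};
static SimplePage* Pages[PAGE_SIZE / GRAN];    // one page list per object size

// Assumes 0 < n <= PAGE_SIZE - sizeof(SimplePage); larger requests would be
// delegated to the system allocator (see "Large Objects").
void* page_alloc(size_t n)
{
    n = (n + GRAN - 1) & ~(GRAN - 1);          // round up to the granularity
    SimplePage*& list = Pages[n / GRAN];
    if (list == 0) {                           // no page for this size yet
        SimplePage* p = (SimplePage*)malloc(PAGE_SIZE);
        if (p == 0) return 0;                  // out of memory
        p->next = 0;
        p->obj_size = n;
        p->free_list = 0;
        p->unallocated = (char*)p + sizeof(SimplePage);
        list = p;
    }
    SimplePage* p = list;
    if (p->free_list) {                        // prefer the page's own free list
        void* obj = p->free_list;
        p->free_list = *(void**)obj;
        return obj;
    }
    if (p->unallocated + n <= (char*)p + PAGE_SIZE) {
        void* obj = p->unallocated;            // otherwise hand out the next
        p->unallocated += n;                   // never-used slot on the page
        return obj;
    }
    list = p->next;                            // this page is full; move on
    return page_alloc(n);                      // and retry (or start a new page)
}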

Mapping Objects to Pages

To free an object, you must be able to convert a pointer to an object into the address of the page holding that object. If all of your pages are the same size, you can allocate pages so that their starting address is a multiple of that size. Then, given the address of an object, you round it down to the nearest page boundary, and that's where the size information resides. If your page size is a power of two, this rounding is just a fast bitwise AND.
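For example, assuming 4-KB pages that were allocated on 4-KB boundaries, the mapping is a one-line function (the cast through ptrdiff_t is the same trick discussed later in "The C++ Implementation"):

#include <stddef.h>

const size_t page_size = 4096;    // must be a power of two

// Round an object pointer down to the start of the page that holds it.
inline void* page_of(void* obj)
{
    return (void*)((ptrdiff_t)obj & ~(ptrdiff_t)(page_size - 1));
}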

Some operating systems provide a means of allocating memory exactly on page boundaries, but this isn't portable. On any system, however, you can allocate several consecutive pages at once in one metapage, round the resulting address up to a page boundary and throw out the remains of the pages on the edges of the metapage. If the number of pages in a metapage is sufficiently high, the overhead of throwing away one page is negligible.
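The following sketch shows that idea in isolation; it mirrors PagePool::allocate_metapage in Listing One, but the constants and the out-parameter are ours.

#include <stddef.h>
#include <stdlib.h>

const size_t page_size    = 4096;    // power of two
const size_t pages_per_mp = 256;     // pages requested per metapage

// Allocate one metapage and return its first page-aligned address; the number
// of fully usable pages inside it comes back through usable_pages.
void* grab_metapage(size_t* usable_pages)
{
    void* raw = malloc(pages_per_mp * page_size);
    if (raw == 0) { *usable_pages = 0; return 0; }
    ptrdiff_t aligned = ((ptrdiff_t)raw + page_size - 1)
                            & ~(ptrdiff_t)(page_size - 1);
    // If malloc's block wasn't already aligned, the partial pieces at the two
    // edges together cost at most one page out of the whole metapage.
    *usable_pages = pages_per_mp - ((void*)aligned != raw ? 1 : 0);
    return (void*)aligned;
}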

Reusing Pages

The algorithm just outlined addresses the first three usage patterns listed at the beginning of this article. That leaves the last one -- programs can allocate lots of objects of one size, then free them all and start using objects of a different size that cannot be allocated in the slots left over from the previous phase.

If you knew when a page became completely free, you could reuse it for allocating objects of another size. You can detect this by storing a count of allocated objects with each page, updating it each time an object is allocated or freed.

Using this count, you can identify three kinds of pages -- completely full ones, partially full ones, and empty ones. For each object size, we keep a doubly linked list of partially full pages. When an object is freed from a full page, that page is added to the list. Similarly, when a page becomes full, it's removed from this list.
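The bookkeeping reads roughly as follows. This is a sketch: the capacity field and the function names are ours, while Listing One infers the same three states from its alloc_count, free list, and unallocated fields.

#include <stddef.h>

struct CountedPage {
    size_t capacity;       // how many objects fit on this page
    size_t alloc_count;    // how many of them are currently live

    bool is_full()  const { return alloc_count == capacity; }
    bool is_empty() const { return alloc_count == 0; }
};

// Called after an object belonging to this page has been handed back.
void note_free(CountedPage* page)
{
    bool was_full = page->is_full();
    page->alloc_count--;
    if (was_full) {
        // full -> partially full: re-link the page into the doubly linked
        // list of partially full pages for its object size.
    } else if (page->is_empty()) {
        // partially full -> empty: unlink it from that list and return it
        // to the global pool, where it can serve a different object size.
    }
}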

Important Optimizations

If you keep a linked list of free objects of each size, then that list will contain objects from many different pages. To change the size of objects stored on a page, you would first need to remove all of its objects from the free list, which would be time-consuming. Instead, by keeping a linked list for each page, you can change the object size of a page by simply editing that page.

There is a single global list of empty pages, which introduces a problem. Suppose a program allocates a single object of a particular size, then frees it, allocates it again, and so on. Each time that object is allocated, a page will be taken from the empty page list and initialized. When that object is freed, the page becomes empty and is put back on the empty page list.

This is reasonable if pages can be quickly initialized. A simple implementation might build a complete list of free objects when the page was initialized, but that can be time-consuming for large pages holding small objects. Instead, we keep a pointer to the first unallocated object on the page and a separate free list pointer.
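In other words, designating a page for a size only resets two fields, and the free list grows lazily as objects come back. The sketch below shows the idea; the field names and the hard-coded page size are illustrative, and Listing One's Page::designate and Page::allocate do the same work in a slightly different layout.

#include <stddef.h>

struct LazyPage {
    size_t obj_size;      // size served by this page
    void*  free_list;     // freed objects, linked through their own bytes
    char*  unallocated;   // first slot that has never been handed out
    char   payload[4096 - sizeof(size_t) - sizeof(void*) - sizeof(char*)];

    void designate(size_t size) {       // O(1): no per-object list is built
        obj_size = size;
        free_list = 0;
        unallocated = payload;
    }
    bool is_full() const {              // no freed slot and no untouched space left
        return free_list == 0 && unallocated + obj_size > payload + sizeof(payload);
    }
    void* allocate() {                  // caller checks is_full() first
        if (free_list) {                // reuse a freed slot first
            void* obj = free_list;
            free_list = *(void**)obj;
            return obj;
        }
        void* obj = unallocated;        // otherwise bump the pointer
        unallocated += obj_size;
        return obj;
    }
    void free_obj(void* obj) {          // freeing builds the list one node at a time
        *(void**)obj = free_list;
        free_list = obj;
    }
};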

Large Objects

The algorithm just described assumes that all objects will be smaller than the page size. Although large objects are rare in most programs, they do occur.

To handle such allocations, you can delegate the work to the system's memory allocator. When an object is freed, you need to know whether the object is a small object (add it to the page free list) or a large object (return it to the system memory allocator). If you allocate large objects so they start on a page boundary, you can store this information in the page header. That way, regardless of the type of object, the first step is to find the page start and read the page header.

This requires allocating the requested size plus a page size (and room for the page header), rounding the result up to a page boundary, and storing the pointer to the original memory in the page header. This pointer will be used when the block is freed.
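A sketch of that big-object path follows, with a stripped-down header standing in for Listing One's Page (which marks big allocations with alloc_size == 0 and stashes the original pointer in its unallocated field).

#include <stddef.h>
#include <stdlib.h>

const size_t page_size = 4096;          // power of two

struct BigHeader {
    size_t alloc_size;                  // 0 marks a "big" allocation
    void*  original;                    // what malloc really returned
};

void* big_new(size_t size)
{
    // Over-allocate so a page-aligned header plus the object still fit.
    void* raw = malloc(size + page_size + sizeof(BigHeader));
    if (raw == 0) return 0;
    ptrdiff_t page = ((ptrdiff_t)raw + page_size - 1)
                         & ~(ptrdiff_t)(page_size - 1);
    BigHeader* h = (BigHeader*)page;
    h->alloc_size = 0;
    h->original = raw;
    return (void*)(page + (ptrdiff_t)sizeof(BigHeader));   // object follows the header
}
void big_delete(void* obj)
{
    // Same first step as for small objects: round down to the page start.
    ptrdiff_t page = (ptrdiff_t)obj & ~(ptrdiff_t)(page_size - 1);
    free(((BigHeader*)page)->original);
}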

The C++ Implementation

Listing One is the C++ implementation of these concepts. It redefines the global operators new and delete. Operator new is given a size and returns a pointer to raw memory. Operator delete receives a pointer to raw memory, one that was previously returned by new.

Since you're redefining the global new operator, you need some other way to request memory from the system. The malloc() and free() functions from the Standard C Library are appropriate. Because C++ doesn't let you mix new with free() or malloc() with delete, you can be certain that your redefined new and delete will be used consistently.

The code has minimal checking of malloc() success. To be fully compliant with C++, you must write more elaborate handling (calling the new_handler and throwing bad_alloc). C++ restricts arithmetic on pointers; in particular, you can't do a bitwise AND on a pointer. The workaround is to cast the pointer to an integral type such as ptrdiff_t, which is the same size as a pointer on the platforms we care about, perform the calculations, and cast the result back to a pointer. This may seem like bad style, but there's no real alternative.
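For reference, the more elaborate handling alluded to above might look like the sketch below (our code, not Listing One's). On failure it invokes the installed new_handler, if any, and otherwise throws bad_alloc; the set-and-restore dance is needed because standard C++ of this vintage offers no direct way to read the current handler.

#include <new>
#include <stdlib.h>

// Allocate raw memory the way a fully conforming operator new must: keep
// retrying as long as a new_handler is installed, otherwise throw bad_alloc.
void* conforming_alloc(size_t size)
{
    for (;;) {
        void* p = malloc(size);
        if (p) return p;
        std::new_handler handler = std::set_new_handler(0);   // read the handler
        std::set_new_handler(handler);                         // and put it back
        if (handler == 0) throw std::bad_alloc();
        handler();                       // give it a chance to release some memory
    }
}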

To store singly linked lists, we define a structure called Header. If you have a pointer to memory where you know a Header is stored, you cast it to Header* to access the information stored there. When you unlink a header from its list, you just cast it back to void* and hand it to the user as raw memory. Similarly, the Page structure gives access to the headers of pages.

C++ cannot statically initialize doubly linked lists. Initially, the headers should have their next and previous pointers point back at the headers, but this can't be done with C++ static initializers. Instead, we initialize these pointers to 0 at compile time and include checks in strategic locations to fix these pointers at run time.

Optimization

Optimization is important for this kind of code. For example, we store the heads of linked lists not as pointers, but as Header and Page structures. This eliminates the ifs in the linked-list code that would otherwise have to deal with null pointers.

The code has several parameters that you can alter to tune the algorithm for your application. The granularity is usually four on 32-bit machines or eight on 64-bit machines; all requests are rounded up to a multiple of this value. The other two parameters are the page size and the metapage size.

Usually, the page size is a power of two that is several times larger than the largest object that will be heavily allocated. Don't worry about very large objects that are only created occasionally. Generally, 4-KB or 8-KB pages work well.

The larger the metapage, the smaller the fraction of memory lost to the wasted pages at the metapage boundary. We suggest setting the metapage size to several hundred pages, which wastes no more than 1 percent of memory. It may seem that this would be a problem for programs that don't normally use much memory. However, we recently experimented with these techniques and obtained good results, even for programs with small footprints.
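Putting those suggestions together, here is one plausible tuning for a 64-bit machine, written in the same style as the constants at the top of Listing One; these particular values are our suggestion, not the listing's defaults.

#include <stddef.h>

const size_t granularity_bits   = 3;                        // 8-byte granularity
const size_t granularity        = 1 << granularity_bits;
const size_t page_size_bits     = 13;                       // 8-KB pages
const size_t page_size          = 1 << page_size_bits;
const size_t metapage_size_bits = 8;                        // 256 pages (2 MB) per metapage:
const size_t metapage_size      = 1 << metapage_size_bits;  // at most one page (~0.4%) wasted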

The complete code is available electronically (see "Resource Center," page 5). We've also included a test program that you can use to compare our allocator to your system allocator. To use the code, simply link it into your C++ program.

Acknowledgment

We would like to thank Vitaly Surazhsky for the initial inspiration for this technique.

DDJ

Listing One

// smartnew.cpp

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

//  Policy configuration
const size_t granularity_bits = 2;
const size_t granularity = 1 << granularity_bits;
const size_t page_size_bits = 12;
const size_t page_size = 1 << page_size_bits;
const size_t metapage_size_bits = 8;
const size_t metapage_size = 1 << metapage_size_bits;

size_t handled_obj_size(size_t page_free_size)
{
    return page_free_size;
}
//  Utility methods
template<class T>
inline T align_up(T val, size_t alignment)
{
    return (val + alignment-1) & ~(alignment-1);
}
template<class T>
inline T align_down(T ptr, size_t alignment)
{
    return ptr & ~(alignment-1);
}
struct Header {
    Header* next;
    bool is_empty() const { return next == 0; }
    Header* deque() {
        Header* result = next;
        next = result->next;
        return result;
    }
    void enque(Header* obj) {
        obj->next = next;
        next = obj;
    }
};
struct Page {
    // Linked list handling
    Page* prev;
    Page* next;
    bool is_empty() { return next == this; }
    void link(Page* page) {
        page->next = next;
        page->prev = this;
        next->prev = page;
        this->next = page;
    }
    void unlink() {
        next->prev = prev;
        prev->next = next;
    }
    void check_init() {
        if (prev == 0) prev = next = this;
    }
    // Object allocation handling
    size_t alloc_size;
    size_t alloc_count;
    Header free_list;
    void* unallocated;
    bool is_page_full() const {
        return free_list.is_empty() && unallocated == 0;
    }
    bool is_page_empty() const {
        return alloc_count == 0;
    }
    void* allocate() {
        alloc_count++;
        if (!free_list.is_empty())
            return (void*)free_list.deque();
        void* obj = unallocated;
        unallocated = get_next_obj(unallocated);
        if ((size_t)unallocated > (size_t)get_last_obj())
            unallocated = 0;
        return obj;
    }
    void free(void* obj) {
        alloc_count--;
        free_list.enque((Header*)obj);
    }
    void designate(size_t size) {
        alloc_size = size;
        alloc_count = 0;
        free_list.next = 0;
        unallocated = get_first_obj();
    }
    //  Big allocation
    bool is_big_alloc() {
        return alloc_size == 0;
    }
    void* big_alloc(void* place) {
        alloc_size = 0;
        unallocated = place;
        return get_first_obj();
    }
    void* big_free() {
        return unallocated;
    }
private:
    void* get_first_obj() const {
        return (void*)align_up(sizeof(Page) + (ptrdiff_t)this, granularity);
    }
    void* get_last_obj() const {
        return (void*)align_down(page_size - alloc_size + 
                                               (ptrdiff_t)this, granularity);
    }
    void* get_next_obj(void* obj) const {
        return (void*)(alloc_size + (size_t)obj);
    }
};
struct PagePool {
    static Header free_list;
public:
    static Page* allocate() {
        if (free_list.is_empty()) allocate_metapage();
        Header* page = free_list.deque();
        return (Page*)page;
    }
    static void free(Page* page) {
        free_list.enque((Header*)page);
    }
private:
    static void allocate_metapage() {
        size_t num_pages = metapage_size;
        void* metapage = malloc(num_pages * page_size);
        assert(metapage != 0);
        void* aligned_metapage = 
                          (void*)align_up((ptrdiff_t)metapage, page_size);
        if (metapage != aligned_metapage)
            num_pages -= 1;        
        void* curr_page = aligned_metapage;
        for (size_t i=0; i<num_pages; ++i) {
            free_list.enque((Header*)curr_page);
            curr_page = (void*)((size_t)curr_page + page_size);
        }
    }
};
Header PagePool::free_list = {0};

const size_t num_sizes = page_size / granularity;
const size_t page_free_space = page_size - sizeof(Page);

Page page_lists[num_sizes];
inline size_t free_list_index(size_t val, size_t alignment)
{
    return val / alignment;
}
void* operator new(size_t size)
{
    if (size == 0) size = 1;
    size = align_up(size, granularity);

    if (size >= handled_obj_size(page_free_space)) {
        // This is a "big" allocation
        void* place = malloc(size+page_size+sizeof(Page));
        assert(place != 0);
        Page* page = (Page*)align_up((ptrdiff_t)place, page_size);
        return page->big_alloc(place);
    }
    Page& header = page_lists[ free_list_index(size, granularity) ];
    header.check_init();
    if (header.is_empty()) {
        Page* free_page = PagePool::allocate();
        header.link(free_page);
        free_page->designate(size);
    }
    Page* free_page = header.next;
    void* obj = free_page->allocate();
    if (free_page->is_page_full()) {
        free_page->unlink();
    }
    return obj;
}
void operator delete(void* ptr)
{
    if (ptr == 0) return;
    Page* page = (Page*)align_down((ptrdiff_t)ptr, page_size);
    if (page->is_big_alloc()) {
        void* place = page->big_free();
        free(place);
        return;
    }
    if (page->is_page_full()) {
        page->free(ptr);
        if (page->is_page_empty())
            PagePool::free(page);
        else {
            Page& header = 
                 page_lists[ free_list_index(page->alloc_size,granularity) ];
            header.link(page);
        }
        return;
    }
    else {
        page->free(ptr);
        if (page->is_page_empty()) {
            page->unlink();
            PagePool::free(page);
        }
    }
}


Copyright © 1999, Dr. Dobb's Journal
