Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Range Tracking & Comparison Algorithms


February, 2006: Range Tracking & Comparison Algorithms

Kirk is a software developer working in the Rational Software division of IBM. He can be contacted at [email protected].


Some information is best viewed as a list of ranges. Most software developers are familiar with the concept of virtual address ranges occupied by memory heaps, stacks, executable code modules, and the like. Application programs that need awareness of such ranges might store a range list that gets updated over time; for example, when a heap shrinks or grows. Similar range lists could also be used in other applications to represent sections of an image in a motion picture, or age groups of participants in a long-term study, or layers in a collection of sediment samples, or candidate choices at exit polls across the country, and so on. Range-based data is commonly used by accounting, vision, storage, operating-system software, and elsewhere.

In any of these examples, a range might represent a contiguous set of items, such as the depth of certain strata in a sediment sample, or opinions of voters between 25 and 35 years old. Each range in the list might be represented as a pair of numbers—one number that represents the base of the range and another number that represents that range's size or limit. To keep a list of ranges up to date and useful, you need algorithms for the creation, modification, comparison, and deletion of the numeral pairs that represent the ranges in the list.

Range-list comparison comes in handy when the ranges can change in size. For example, suppose a range list is used to represent heap memory in a virtual address space, where ranges of heap memory are listed as a series of base/limit pairs. Each base represents the bottom address of a range of contiguous virtual address space that the operating system has committed for use by that heap. Each limit represents the top of such a range. As the heap grows and/or shrinks, a new list of base/limit pairs is generated for comparison with the existing list. A diff comparison of an existing list and a new, updated one then selects the areas of heap growth and shrinkage, possibly for use in memory-state tracking or similar purposes.

Building and Managing Lists

Range-list creation and destruction are implemented via a pair of algorithms (Listings One and Two) that assume that the ranges to be tracked reside in the virtual address space for a process. You might need to make some minor implementation changes to apply these algorithms to other specific types of range data. To use these algorithms, you should provide code to pinpoint the appropriate beginning and ending points of an item's range, at lines in the pseudocode that are identified via relevant text. You should also provide code that allocates, initializes, and deallocates the range-tracking elements themselves, also at points shown as text in the listings.

A notable feature of the range-list creation algorithm is the coalescing of ranges for contiguous items. Multiple objects that occupy contiguous virtual address space are represented by a single range-tracking element. This is done to achieve optimal performance when these range lists are accessed in situations such as heap-range tracking where the boundaries between contiguous items aren't important. If you need to keep track of boundaries between separate ranges even when they are contiguous, you can simplify the algorithm so that it doesn't coalesce contiguous ranges.

A part of a range list can be replaced by splicing in a new set of ranges using the Splice Range List algorithm (Listing Three, available electronically; see "Resource Center, page 6). This saves you from having to replace an entire list when updates are needed for just part of the list. This pseudocode assumes that the ranges to be tracked reside in the virtual address space for a process. With some minor implementation changes, the algorithm could be applied to other types of range data. You should provide code that accesses the appropriate list elements wherever this pseudocode contains statements such as "look up the next range on the new list." You should also provide code that allocates, initializes, and deallocates the range-tracking elements themselves, at the element insertion and deletion points specified in this pseudocode. The startAddr and stopAddr input parameters denote virtual addresses in the context of this pseudocode. More generically, these values correspond to beginning and ending delimiters that apply to both the old and new range lists. Old list elements corresponding to ranges that do not reside between these delimiters are left in place. New list elements effectively replace the old ones between these delimiters. In some conditions, where an old range spans one of these delimiters, an old list element may be modified to reflect the changes represented via the new list. The pseudocode provides special case logic to address these conditions.

Comparing lists of base/limit pairs requires a more complex algorithm than one might first suspect. Before I developed such an algorithm, a coworker and I first searched for any comparable base/limit, list-wise diff routine. We searched through textbooks and web pages showing algorithms used in computer vision and DNA sequencing. Because we could find no such algorithm, I wrote my own. This algorithm is suitable for any situation where lists of ranges represented by base/limit pairs need to be compared. In the pseudocode for the range-list comparison algorithm (Listing Four; available electronically), I've chosen to represent the two lists as "old" and "new" because diff algorithms are probably most often used to determine which elements of a dataset have been updated between two points in time.

Figure 1 shows the kind of information supplied by the range-list comparison algorithm. Given an old and new list, this algorithm indicates which subsets of the list have changed, including whether the changes represent data specific to the old list or the new list. A notable aspect of the algorithm is the number of Boolean variables needed to control the passage through the old and new lists. It is coded this way largely for "debugability" because the state information at any point in the algorithm is rather complex. There is also a performance consideration: This algorithm frequently needs to know whether there is a current and next element on each list, and to repeatedly dig into the list elements, as that information could be costly. Another performance optimization is that the number of comparisons of list elements themselves has been minimized. For each pair of list elements, the algorithm checks for range overlap, for growth or shrinkage at the bottom of the range, and for growth or shrinkage at the top of the range—and only then does it bump forward to do the same thing with the next element(s) of one or both lists.

As with the splicing algorithm, to implement range-list comparison, you should provide code that accesses the appropriate list elements wherever this pseudocode contains statements such as "look up the next new range." The diff results are collected at points designated "tell the outside world <whatever>." You could add function calls at these points to update appropriate tracking elements in other lists, and so on. Alternatively, you could collect elements of a third list at these points and return a reference to those results when the diff is complete.

A potentially time-saving feature of the range-list comparison algorithm is that, like the splicing algorithm, it lets you specify the boundaries applied to the beginning and ending points of the list. Suppose you're interested in comparing only the ranges of objects that reside within a portion of a heap occupying a subset of the virtual address space. You can specify starting and stopping addresses to obtain a list applicable to the interesting portion of virtual memory. The algorithm ignores any objects that lie outside of these specified boundaries, which you specify as input parameters. No matter what kind of ranges you want to compare, if you want to limit the comparison to part of a larger list, you can apply these parameters to limit the set of ranges that will be compared, to save time.

DDJ



Listing One

/* The Create Range List algorithm                                   */
/* Tracks a current set of address ranges for the objects that meet    */
/* specific criteria. Coalesces the ranges of a contiguous set of      */
/* these objects into a single range list element, minimizing the      */
/* list's size and optimizing subsequent range list diff performance   */
/* Returns the list of ranges that are identified and tracked here.    */

/* Input parameters */
ADDRESS startAddr
ADDRESS stopAddr

/* Local variables */
LIST    a new list of ranges
BOOL    bListRange = FALSE

/* All other variables are references to address range tracking      */
/* list  elements.  At minimum, there will be a references to the    */
/* "current" element in the new list being created.                 */

Allocate and initialize the header for the new list of ranges

IF (startAddr) DO
    Determine the first object that resides at or above startAddr
    Make that object "current"
    END
ELSE DO
    Determine the first object that can be found
    Make that object "current"
    END

/* Walk through the address space, building the list of address      */
/* that meet a user-defined set of selection criteria                */
WHILE ((the address of the "current" object) < stopAddr) DO
    IF (the "current" object meets range listability selection criteria)
        bListRange = TRUE
      
    IF (bListRange) DO
        IF (there is a "current" range tracking element) DO
            /* Add up the sizes of any contiguous objects that       */
            /* meet the selection criteria for our list.             */
            Add the "current" object's size to the size of the 
                  "current" range tracking element
            END
        ELSE DO
            /* This is the first listworthy object we've             */
            /* encountered lately.                                  */
            /*** A GOOD BREAKPOINT ***/
            Allocate a range tracking element and make it "current"
            Set the range base to the base of the "current" object
            Set the range size to the size of the "current" object
            END
        END
    ELSE
        bListRange = FALSE
    
IF (there is a "current" range tracking element) DO
        /* Did we just walk past the "upper" end of a set of         */
        /* contiguous objects to be listed?                         */
        IF (!bListRange) DO
            /*** A GOOD BREAKPOINT ***/
            Add the "current" range element to the range list
            Update local state so that there is no "current" range 
                  element
            END
        END

        find the object at the next "higher" address
        make it "current"
END /* WHILE loop */

IF (there is a "current" range tracking element) DO
    /* Clean up loose ends. */
    Add the "current" range element to the range list
END

RETURN the new list of ranges
Back to article


Listing Two
/* The Destroy Range List algorithm                                  */
/* Frees all the memory associated with an outdated list of           */
/* contiguous address ranges.                                       */

/* Input parameter */
LIST    a list of ranges

/* All other variables are references to address range tracking       */
/* list  elements.  These include references to the "current" and     */
/* "next" elements in the list being destroyed.                      */

look up the first range on the old list and make it "current"

WHILE (there is a "current" list element) DO
    look up the next range on the old list and make it "next"
    delete the "current" list element
    make the "next" element "current"
    END

free the list header
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.