Beyond B-Trees

Of all the indexes that can order records in database-management systems, only B-Trees indexes are offered universally.


November 07, 2008
URL:http://www.drdobbs.com/database/beyond-b-trees/212001281

Konstantin is a software engineer at McObject. He can be reached at [email protected].


Of all the indexes that can order records in database-management systems, only B-Tree indexes are offered universally. Often, they are the only index offered. Of course, there's no denying B-Trees' efficiency for basic database search operations such as exact match, prefix, and range searches. But these "vanilla" indexes are a poor fit for certain data and access patterns. For purposes as varied as IP routing, geospatial searching, and soundex algorithm development, less-common indexes can be dramatically more efficient. The scenarios I present here focus on a pair of distinctive index types—the R-Tree and a "user-defined" index using the extremeDB database from McObject (www.mcobject.com).

Spatial Index

Today, many applications and services need efficient algorithms to perform spatial searches—to locate the nearest object to a current location, find all objects in the user's vicinity, and so on. But B-Tree indexes are one-dimensional and cannot deal with the R2 or R3 coordinates inherent in spatial searches.

The R-Tree algorithm (sometimes called "Guttman's R-Tree") does the job well by mapping objects in space using a "wrapping rectangle." If an object is represented by point coordinates (X,Y), then the wrapping rectangle is a degenerated rectangle in which the width and height are zero. For all other geographical objects (lines, polygons, arbitrary shapes), the wrapping rectangle is such that the coordinates of the top-left corner are smaller than or equal to the coordinates of any point of the object, and the coordinates of the bottom-right corner are greater than or equal to the coordinates of any point of the object. In other words, a wrapping rectangle is the smallest rectangle that fully contains the specified object.

A spatial object could be defined as follows in an eXtremeDB schema definition file:

/* structure representing point on the map */
struct Point
{
   double latitude;
   double longitude;
};
class Street {
   /* vector of points specifying the street */
   vector<Point> points;
   /* Street wrapping rectangle */
   rect<double>  wrap_rect;
   rtree<wrap_rect> streets_idx;
};

Adding a new street requires the application to store the street's coordinates and calculate its wrapping rectangle:


Street street;
mco_rect_t wr;
wr.l.x = DOUBLE_MAX;
wr.l.y = DOUBLE_MAX;
wr.r.x = DOUBLE_MIN;
wr.r.y = DOUBLE_MIN;
Street_new(trans, &street);
Street_points_alloc(&street, n_points);
for (i = 0; i < n_points; i++) 
{
    if (points[i].latitude < wr.l.latitude) {
        wr.l.latitude = points[i].latitude;
    }
    if (points[i].longitude < wr.l.longitude) {
        wr.l.longitude = points[i].longitude;
    }
    if (points[i].latitude > wr.r.latitude) {
        wr.r.latitude = points[i].latitude;
    }
    if (points[i].longitude > wr.r.longitude) {
        wr.r.longitude = points[i].longitude;
    }
    Street_points_put(&street, i, &points[i]);
}
Street_wrap_rect_put(&street, &wr);


For instance, assume a user is searching for a location using mapping software. The application must present the result (streets) in a window that corresponds to a map rectangle with coordinates min_longitude, min_latitude, max_longitude, max_latitude.


mco_rect_t r;
mco_cursor_t c;
MCO_RET rc;
r.l.x = min_longitude;
r.l.y = min_latitude;
r.r.x = max_longitude;
r.r.y = max_latitude;
if (Street_streets_idx_search(trans, MCO_EQ, &c, (double*)&r) == MCO_S_OK)
{
    for (; rc == MCO_S_OK; rc = mco_cursor_next(trans, &c))
    {
        Street street;
        Street_from_cursor(trans, &c, &street);
        // display it
    }
}


User-Defined Indexes

eXtremeDB's user-defined index is not an index in its own right; rather, it extends a B-Tree to accept the user's specified comparison function. This is needed because applications use different rules for sorting string values. For example, an application might want to ignore a character's case, or compare values using the rules of a particular language.

By default, eXtremeDB uses "raw comparison" by comparing strings as sequences of bytes. Consider the class Person containing first and last names of the person:


class Person {
    string first_name;    
    string last_name;
};


If a case-insensitive index for Person's name is desired, it can be declared in the eXtremeDB schema description file using the userdef keyword:


class Person {
    string first_name;    
    string last_name;
    userdef tree <last_name,first_name> name_index;
};


Next, the comparison function is defined in a C file. This actually requires two functions—one to compare two objects, and another to compare a given key value to the index field value(s) of an object. The eXtremeDB schema compiler generates function prototypes—programmers must only provide function bodies:


/* object-to-object user-defined compare function */
int2 Person_name_index_compare_obj(Person* handle1, Person* handle2)
{
    char buf[2][MAX_NAME_LEN];
    uint2 len;
    int diff;
    /* extract last name component from objects*/
    Person_last_name_get(handle1, buf[0], MAX_NAME_LEN, &len);
    Person_last_name_get(handle2, buf[1], 
      MAX_NAME_LEN, &len);
    /* compare last names */
    diff = stricmp(buf[0], buf[1]);
    if (diff != 0) { /* if them not equal, return 
                       difference */
        return diff;
    }
    /* extract first names */
    Person_first_name_get(handle1, buf[0], 
      MAX_NAME_LEN, &len);
    Person_first_name_get(handle2, buf[1], 
      MAX_NAME_LEN, &len);
    /* compare first names */
    return stricmp(buf[0], buf[1]);
}
/* object-to-key user-defined compare function */
int2  Person_name_index__compare_ext(Person* handle, 
      void** key)
{
    char buf[MAX_NAME_LEN];
    uint2 len;
    int diff;
    /* extract last name component from the object */
    Person_last_name_get(handle, buf, MAX_NAME_LEN, 
      &len);
    /* compare last names */
    diff = stricmp(buf[0], 
      Person_name_index_extkey_last_name(key));
    if (diff != 0) { 
        return diff;
    }
    /* extract first names */
    Person_first_name_get(handle, buf, MAX_NAME_LEN, 
      &len);
    /* compare first names */
    return stricmp(buf, 
      Person_name_index_extkey_first_name(key));
}

The user-defined function must be registered in the database before use. Another valuable application of user-defined functions is an implementation of the soundex algorithm to search based on the sound of a word rather than its exact spelling—"wear," "where," and "ware," for instance.

Conclusion

Knowing something about specialized indexes enables faster development, more efficient code, and the ability to work with more complex data structures. Other "nontraditional" indexes worth exploring include the Patrice Trie (www.ddj.com/architect/208800854), KD-Trees (another spatial index), T-Trees for in-memory data access and storage, and Hash tables for quickly locating a single unique index entry.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.