Channels ▼
RSS

Using a Typelist to Create a Flexible Compound Sorting Object


Using a Typelist to Create a Flexible Compound Sorting Object

This paper describes the class order_by that performs a compound sort on a container of objects according to one or more attributes. To provide flexibility, I'll use a typelist to represent the desired sort order. This technique allows the user to supply multiple comparison-function objects to specify the compound sort order while retaining compile-time type checking.

Justification for The Class order_by

The class is called order_by because it provides the same functionality as the SQL ORDER BY clause. For example, a PERSON table may exist that contains the columns "FIRST," "LAST," "MIDDLE," "STATE," "ZIP," and "ID." A user can return the contents of the table sorted in a number of different ways by using the ORDER BY clause.

SELECT * FROM PERSON 
  ORDER BY ID
SELECT * FROM PERSON 
  ORDER BY LAST, FIRST, ID
SELECT * FROM PERSON 
  ORDER BY STATE, ZIP, LAST, FIRST, ID
This last example retrieves the data such that it is sorted by state, then each record with the same state is sorted by zip, each with the same state and zip is sorted by last name, and so on. This is known as a "compound sort."

Retrieving the data set presorted from the database works fine in many cases, but not when the same data needs to be resorted multiple times. In the case where many clients need to receive the same data set in multiple unique formats, it becomes necessary to perform the same query multiple times just to get the various sort orders.

The class order_by provides this functionality for containers of objects where the concept of "less than" applies to multiple attributes. For example, a person object exists that contains the data members first, last, middle, state, zip, and id. A set of less_than function objects also exists, each taking two arguments of type person const & and each examining one particular data attribute. (Refer to Listing 1 for a contrived example.)

The order_by constructor can receive two random access iterators into a container of person objects and sort the collection in any desired order (see examples above).Using this class, a data set can be retrieved in only one call to the database and then presented in many different ways.

An alternate approach to using the order_by object is to use a set of function objects that perform the less_than operation on multiple attributes at once. For example, each of the following function objects may be written for a class with three attributes: first, last, id.

less_first    less_first_last    less_first_last_id
less_last     less_last_first    less_first_id_last
less_ssn      less_first_id      less_last_first_id
              less_id_first      less_last_id_first
              less_last_id       less_id_first_last
              less_id_last       less_id_last_first
Any of these compound-comparison function objects can be passed as a third argument to the standard sort function, resulting in the desired order. (Listing 2 shows an implementation of struct less_last_first_id and its usage.) However, there are two problems with this approach:

  • The number of function objects required to provide all compound sorting permutations;
  • The dependencies of the various components in the project.

Function Object Permutations

In order to cover all possible compound-sorting permutations in this class, 15 function objects (3! + (3!/1!) + (3!/2!)) are required. A class with four attributes requires 64 function objects (4! + (4!/1!) + (4!/2!) + (4!/3!)). The number of comparison types becomes unmanageable very quickly.

Component Dependencies

Function objects that compare the attributes of a certain class belong to the interface of that class. As part of the interface, they belong in the same translation unit as the class definition. (The function object definitions are in the same header as the class definition.)

One obvious motivation for this rule is that the comparison function objects can take advantage of a friend declaration to save at least two calls to public member access functions (see the text box titled "Friendship"). But more important than the efficiency issue is the presentation of a single coherent interface to the class. A function that compares two objects of a class by one specific attribute must model a unique concept. By not placing the comparison function objects in the same translation unit, you are allowing a user to define their own varieties and possibly cause the class to behave in unpredictable ways.

Since the comparison function objects must reside in the same header as the class definition, any modification to this file causes all translation units that depend on this class to be recompiled and retested. It is unlikely that the designer of the class is going to know all of the sorting permutations that future users are ever going to require, and it is equally unrealistic for the class designer to include all possible compound-sorting permutations in the file. Therefore, the project using standard sort with compound function objects (Listing 2) has established a fragile dependency by placing a component that is likely to change above more stable components in the dependency graph. [2]

The order_by object allows a class with multiple comparable attributes to define only the comparison function objects that correspond to these attributes. This decouples the class from the sorting requirements of future users.

Implementation

Typelist [3] is key to the design of class order_by because there is simply no other mechanism to cope with an unknown number of unknown types at compile time. Just as an ellipsis is used in a function declaration to indicate that you do not know the number or even the type of all of the arguments, a typelist is used as a template parameter when you do not know the number or type of the template parameters. The distinction between these two methods is that a Typelist still allows for compile-time type checking.

A typelist is a template struct containing two typedefs: Head and Tail (Listing 3). A typelist does not have any data members or member functions. Its only purpose is to provide the alternate type names for the two template parameters. Since a typelist is itself a type, the Tail type of a typelist can be another typelist (Figures 1a, 1b, and 1c). In this way, a typelist is chained together like a compile-time singly linked list of types. An empty struct NullType is used to indicate the end of a list.

The class order_by provides a mechanism to apply the comparison function objects in a typelist to subsequent subsets of the data to achieve the desired compound sort. (Refer to the order_by.h in Listing 4.)

The order_by object gets the type of the container holding the objects and the typelist of comparison-function objects as template parameters. The length of the typelist is determined using the Typelist::Length [3] template construct and is stored in the enum n. Since this occurs at compile time, n can be used in turn as a template parameter for the next instantiation of class order_by. The type of the comparison function at the current offset in the typelist is typedef'd to comp_type using the Typelist::TypeAt [3] template construct. This is used to call the standard sort function on the range specified by the constructor arguments. The comp_type is also used with the standard equal_range function to determine the subsets of the container (differentiated by the attribute associated with comp_type).

(Again in English.) The order_by function first sorts the entire container using the first comparison function object. The code then enters a while loop where the container is broken up into subsets of equal items based on this comparison type. A new order_by object is created receiving the subset of items and the index of the next function object. This order_by object knows nothing of the previous object and thinks that it has received the entire container and the first comparison function object. This occurs in classic "recursive style" until there are no more function objects by which to sort, and all of the equal ranges in the outermost container are sorted.

The instantiation of the subsequent order_by objects uses the expression ((i+1==n)?(0xFFFF):(i+1)) as the third integral template parameter. This indicates that the next typelist index is to be used unless it would be past the end of the typelist, in which case, a marker value is substituted. This is possible because i and n are compile-time constants and operator ?:, unlike an if-then-else statement, is also evaluated at compile time. The marker integer is used as the partial specialization of the order_by class template to end the template recursion.

I refer to the cascading function calls as "recursive style" out of convenience only. Although one function calling itself looks like classic recursion, this case is different because it is a constructor. The only true recursion that is taking place is in the compile-time recursive template instantiation of the various versions of the order_by class (one distinct order_by object for each item in the Typelist). The calls appear to be recursive in the code, but in reality, they are simply calls made between the template instantiations of the order_by objects.

The effect of instantiating an order_by object with a typelist containing four types is to create four order_by objects that call each other's constructors in the order determined by the typelist.

Type Checking

Although equal_range only requires forward iterators, sort requires random access iterators. Any attempt to create an order_by object with a container type that does not support random access iterators results in a compile-time error. Any attempt to place a function object into the typelist that does not have the bool operator()(T const &, T const &) const signature also results in a compile error. (This is the advantage of a typelist solution: If it compiles, it works.)

Runtime Implications of Using a Typelist Solution

It is not possible to create a new type at run time. Therefore, all of the order_by objects created for a specific typelist are created at compile time. The implication is that all sorting permutations that need to exist at run time need to exist in the code.

If a separate typelist needs to be created for each compound-sorting permutation, why bother using a typelist solution at all? The answer is in the dependency relationships. The code that defines the various typelists may need to be modified each time a new compound sort is required by a user, but this code does not have to reside in the file containing the comparison function objects and class definition. This code can be isolated from the other modules, and, therefore, the source of instability is isolated from the highly depended-upon components.

Figure 2a shows the dependency graph of a project using a sort_logic component. The person.h in this project contains multiple-compound comparison function objects and uses the standard sort function to perform the sorting (as shown in Listing 2). The sort_logic component collects the runtime information and calls the sort function, depending on which type of sorting is required. (This may occur by reading a list of client options from a database, getting a user's preferences from a cookie, and so on.) The person object is fairly ubiquitous and many components in the project depend on it. In this case, when a user of the sort_logic component needs to account for a new type of sorting, the person.h file must be modified. This necessitates the recompilation and retesting of all dependent components (A, B, and C).

Figure 2b shows the dependency graph of a project that uses the order_by object for compound sorting. A new component person_sel has been added that contains the known sorting requirements (Please refer to Listing 5). In each case block, the construction of object o sorts the container according to the Typelist provided as its second template parameter. The sort_logic component works as it did before: collecting the runtime information and calling the proper sort. However, in this case, the person component only contains one comparison-function object per attribute and has no knowledge of any of the sorting requirements. When a new compound sort is needed, only the person_sel.h file is modified and all of the other components that have included the person.h do not know or care.

In this way, the characteristic of the project that is most likely to change is isolated from the component that needs to be the most stable.

Summary

Is all of this work worth the effort? Of course it is. If you are defining a class you should only be concerned with creating a complete, coherent object, not with trying to predict downstream business rules. The class person in this case only cares that each of its attributes can be used to order a collection of its objects. The class order_by exists so that classes such as person can exist in blissful ignorance of how they are being used. The insulation of highly depended-upon components from highly unstable components helps to create a project that painlessly allows for the inevitable changes.

The use of typelist allows for the best of both worlds: the flexibility of using unknown numbers of unknown types and the desirability of compile-time type checking.

References

[1] John Lakos. Large-Scale C++ Software Design. (Addison-Wesley, 1996).

[2] R.C. Martin, "Large-scale Stability," C++ Report, 9(2), pp. 54-60, February 1997.

[3] Andrei Alexandrescu. Modern C++ Design. (Addison-Wesley, 2001).

About the Author

Thomas Bergin is currently working at the Kansas City, MO office of the National Insurance Producer Registry (Affiliate of the National Association of Insurance Commissioners).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video