Dr. Dobb's | Ada-style Ranged Types in C++

Ada-style Ranged Types in C++

Ranged types, common in Ada, allow the programmer to constrain a numerical type to a certain range. Rich presents a library that brings these types to C++.

July 13, 2007
URL:http://www.drdobbs.com/tools/ada-style-ranged-types-in-c/201001318

Professionally, I develop primarily in Ada. When I do get a chance to program in C++, I miss some of the features that Ada provides. Ranged types is one of these features. Ada allows the programmer to constrain a numerical type to a particular range. Intermediate expression results are allowed to exceed the range, but an exception is raised if an attempt is made to assign an out-of-range value to a variable. An exception is also raised if a result of an expression overflows the bounds of the ranged type's base type. This article details a class template I have created to implement a C++ type with similar behavior.

Design Goals

One requirement I have for the class is that it must be written in strictly conforming Standard C++. This conflicts with the desire that the code be efficient as possible. Checking for overflow in C++ without resorting to inline assembly can be expensive. One of the reasons Ada has efficient ranged checked types is because these types are part of the language. An Ada compiler can analyze the code and remove those checks that are unnecessary. I decided to implement this behavior in my class using templates. What is needed is a method to determine at compile-time if a given mathematical operation can overflow.

The possible occurrence of two conditions needs to be determined at compile-time. These conditions are overflow and out-of-range. Overflow occurs when the result of an operation exceeds the limits of the primitive used in defining the ranged type. An out-of-range condition occurs when an attempt is made to assign a value to a variable that is outside the bounds of the variable's defined range.

Method of Analysis

The simplest condition is out-of-range. This is merely a matter of checking that the constructor is only called with values within the allowed range. The bounds of the argument type dictate whether checks are made. If either the upper or lower bound of the type is within the range, the check is removed. For example, assume the range of type signed char is [-128, 127]. If a variable with a range of [0, 65536] is constructed with a value of type signed char, only the lower bound would be checked. Since the value cannot exceed the variable's upper bound, the check can be removed.

The more thorny issue is how to determine if an overflow is possible. Keeping track of the maximum range resulting from an operation is one possible way. For example, if two variables of a type with the range [-100, 100] are multiplied, then the resulting range is [-10000, 10000]. Any possibility of overflow may be determined by doing this for each sub-expression. This is messy and overly complex, because it requires overflow checking when calculating each new range.

There is an easier way. Instead of keeping track of maximum and minimum range values, the code keeps track of the maximum possible digits. Ignoring the sign bit, a variable with the range [-100, 100] will have a maximum of 7 binary digits. The maximum resulting binary digits may be determined for every mathematical operation. Multiplying two numbers, each with a maximum of 7 binary digits, creates a product with at most 14 binary digits. So, for each sub-expression: 1) calculate the maximum digits required to hold the result, based on the maximum digits of each operand and the type of operation; 2) locate a primitive type with the same signedness as the base type that will hold a value of at least the resulting number of digits; 3) if the implementation has no such primitive type, flag the operation as having a possibility of overflowing; 4) do the operation, checking for overflow if necessary. Obviously, this method only works for integral types. So, currently floating point types are not supported.

The Interface

The ranged type class template is defined approximately as:

template<class T, T MIN, T MAX, class TR 
    = ranged_traits<T, MIN, MAX> >
class ranged_type;

Where T is the base type, MIN and MAX define the range, and TR defines a traits class. The traits class defines various members to control the behavior of various aspects of the class, such as what is to be done upon detecting an overflow condition. The default of which is to throw an exception.

I said the class template is approximately defined as above because there are additional template parameters that are defaulted. These are defined by the implementation and are used for class specialization. These parameters are defined from members of the traits class, and allow classes to be instantiated with or without explicit constructors taking primitive type arguments, and/or cast operators to primitive types. The default is the safest, having explicit constructors and no cast operator. Assignment, arithmetic, relational, and equality operators are defined for the class. These are defined for every combination of the ranged type class and its intermediate types. Intermediate types hold the result of arithmetic operations and keep track of the maximum number of digits.

Mixing constants and variables of primitive types in expressions with ranged types is problematic. At issue is determining at compile-time the number of digits in the value. Obviously, the number of digits in an integral constant expression (ICE) can be calculated at compile-time, but the C++ language does not allow an operator to be overloaded based on that. The very nature of a variable makes it impossible to determine its number of digits at compile-time. Casting the value to the ranged type is one way to use it in an expression. There are two possible problems with this: 1) it can be inefficient because it causes additional range checks to be done, and 2) the value may exceed the range. To compensate for this, two templates are defined, constant and variable, that convert values to the range type's intermediate type. Their use will lead to expressions that are harder to read, but can be more efficient to execute.

Implementation

The C++ Standard allows for three possible representations for integer types (one's complement, two's complement, and signed magnitude) and overflow checking must allow for all three. The only issue that complicates the code is the possibility that an integer type may not be symmetrical around zero. To make the checks as efficient as possible, the code assumes that there is at most one extra value and it is negative. Although this is not explicitly stated in the Standard, I believe implementations failing the assumption are either non-existent or at least exceedingly rare.

Eliminating unnecessary checks is achieved using templates and specialization. An arithmetic operator is specialized on whether overflow is possible and the signedness of its operands. If no overflow is possible, the operation is simply executed. Otherwise, the operation must check for overflow. The operator is specialized on signedness because detecting overflow with unsigned values can be done more efficiently. Unsigned operations wrap back around zero instead of overflowing. For signed values, the overflow must be checked first because allowing the overflow to occur causes undefined behavior. The actual steps taken by the compiler can be best illustrated by an example. Given the following statements:

typedef ranged_type<unsigned, 0, 100> R;
R x, y, z;  // The variables are defaulted to the 
            // minimum of the range which in this
            // case is 0.  Sometime later these
            // variables are initialized.  Their
            // actual values do not matter for
            // this example.
            
const R a = 5, b = 7, c = 1;
R r = ((x + a) * (y + b)) / (z - c);

The constants a, b, and c are initialized to their respective literals. They are ranged checked, and these checks may or may not be removed by the compiler's optimizer. On typical 32-bit machines, char is 8 bits, short is 16 bits, and int and long are both 32 bits. The compiler determines that the maximum number of digits required for the two sums is eight (the maximum number of digits for the two operands, plus one). A base type must be chosen for the intermediate value. unsigned char would be a natural choice, but the traits class defines a static constant member called min_intermediate_bits. This member defines the minimum number of bits to use in choosing the base. It is defaulted to (in this case) the number of digits type unsigned int can hold. The purpose is to limit the amount of casting that would be done as each operation is evaluated. If unsigned char or short were used, then for each operation, the operands would be promoted to unsigned int (if not already) and subsequently cast back to unsigned char or short until the maximum number of digits reached that of unsigned int. So, unsigned int is chosen for the base type used in the intermediate. Since unsigned int can hold an eight-digit value, no overflow can occur and no check is done for either sum.

Next is the multiplication. Given that the operands are each eight-digit values, an intermediate with at least 16 digits is needed. As before, though unsigned short would fit the bill, the compiler chooses unsigned int. The product will fit, so again, no check is done.

Now comes the subtraction. Both operands are no more than seven digits, so the intermediate has to have at least seven also. This is a special case. Even though unsigned int is chosen and can hold an eight-digit value, a check is still necessary because the difference may be negative. If signed int had been used, no check would have been done. To keep the code simple, the current implementation will define all intermediate values using types of the same signedness as the base type of the ranged type.

The last of the arithmetic operations is the division. The divisor has a maximum of sixteen digits and the dividend has a maximum of seven. Thus, the quotient will have a maximum of sixteen digits. This is another special case. A check for divide-by-zero has to be made. Finally, the result is assigned to the variable r. Since the lower bound is zero and the base type is unsigned, no check is made. The upper bound does have to be range checked because the maximum value of unsigned int is greater than 100.

As sub-expressions are evaluated, the number of digits in the intermediate values will grow, and the compiler will use larger integer types. It will even use implementation-specific types such as GCC's long long. This may be undesirable, so the traits type also defined a static constant member called max_intermediate_bits. This puts an upper limit on the number of digits when the compiler chooses the base type for an intermediate. It defaults to the number of digits in signed or unsigned int as appropriate.

Benchmarks

In order to determine what kind of saving could be achieved, I defined a test using the expression:

The values of the constants a, b, c, d and the variables w, x, y, z were all chosen so that the expression would never overflow. Three cases were done. The first used unsigned int. The second used a range type defined similar to one above. The third was like the second, except the code was compiled so that the checks were always done regardless of the possibility of overflow. For each case, the expression was put in an external function which was executed several million times, and an average elapsed time for one execution calculated. For this particular test, using my machine, and my compiler (Visual C++ Toolkit 2003 (13.10.3077), optimization level O2) the results were:

1st case: 0.0246 seconds
2nd case: 0.0297 seconds
3rd case: 0.1119 seconds

The results using GCC (MinGW 3.4.2, optimization level O3) were similar. This was an admittedly contrived test, but I was interested in seeing what the best case savings would be. Your mileage may vary.

There are various ways to reduce the cost of the overflow checking at the cost of portability. One way is to increase the assumptions made by the code. For example, if your implementation treats signed overflow in a benign and predicable way, then you can simplify those checks. Even more time can be saved by using inline assembly, if your compiler has such a feature.

Conclusions

The ranged_type class template provides integral types that are safe from overflowing and constrained to a given range. Using templates allows much of the overhead of the checks to be removed, though it is not possible to remove all. There is always some cost associated with safety. In the future, I'd like to expand the implementation to include floating point types, but only time will tell if that is even possible.

This class is part of a free C++ library of classes I have written over the years and have found useful in my own programs. The latest version can always be found here: http://www.richherrick.com/software/herrick_library.html

I would like to thank my peer reviewer, Victor Bazarov, for his invaluable help and insight.

Rich has thirteen years experience as a software engineer. For the past eight years he has been employed by Lockheed Martin Systems Integration in Owego, NY writing real-time embedded systems oriented towards avionics. He can be contacted at [email protected].