Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

The Case Against int, Part 3: The Advantages and Perils of Unsigned Arithmetic

March 07, 2012

Last week, I noted that integers are often used for counting, and that counting is often better suited to unsigned than to signed arithmetic. To that end, C++ defines an unsigned integer type, size_t, that is well-suited to storing array indices, or otherwise counting items that can fit in the computer's memory. Moreover, each of the standard-library containers, such as vector<T>, defines a type analogous to vector<T>::size_type, which is an unsigned integer with an appropriate capacity to store the number of elements in a vector<T>.

If you're counting objects, the number of objects you're counting can't be negative. Because unsigned integers cannot express negative values, a whole category of errors becomes impossible. This claim isn’t mere sophistry, either. Suppose, for example, you want to check whether n is a suitable index for a vector v. You might think you would have to write

 
             if (n >= 0 && n < v.size()) { /* n is suitable */ }

However, the fact that v.size() yields an unsigned value makes it possible to write

 
             if (n < v.size()) { /* n is suitable */ }

The reason is that, in general, combining signed and unsigned integers in C++ converts the signed quantity to unsigned. Accordingly, if n is negative, it is converted to a huge positive unsigned value, which will appear to be larger than v.size().

Such behavior is often desirable, as it is here; but it can also cause trouble. As an example of such trouble, consider a loop to visit all of the elements of v in ascending order:

 
typedef decltype(v.size()) vsize_t;
for (vsize_t n = 0; n != v.size(); ++n) { /* … */ }

Here, I have started by defining vsize_t as the type yielded by v.size(), and done so in a way that makes it unnecessary for me to know what kind of container v is. I have then written a straightforward loop that steps n through each index of v.

What if I want to make this loop run backward? The obvious way to write it is

 
for (vsize_t n = v.size()-1; n >= 0; --n) { /* Wrong! */ }

but a little examination should convince you that this code is a disaster. In fact, a considerate compiler will warn about the problem: n is unsigned, so n >= 0 is always true. Therefore, the loop will never terminate.

This example exposes one of the difficulties in working with unsigned quantities: Loops often terminate after a variable has passed beyond the range of acceptable values. If that range is bounded by 0, and you're using an unsigned variable, it may be impossible to go beyond that bound.

There's a general rule for working with such bounds: When a loop runs backward, decrement at the beginning of the loop; when it runs forward, increment at the end. Decrementing at the beginning simplifies the code in two ways: It makes it possible to start a loop at the upper off-the-end value, rather than one element earlier, and it ensures that the lower bound will never be transgressed.

This last paragraph is a bit of a mouthful, so let me show the two general forms I have in mind, using while instead of for to reduce the code to its essentials. Let's start with the familiar form of a loop that increases a variable's value each time through:

             vsize_t n = 0;
     while (n != v.size()) {
           // Process v[n]
           ++n;
     }
 

If we are moving backward through v, the form is less familiar:

             vsize_t n = v.size();
     while (n != 0) {
           --n;
           // Process v[n]
     }

Writing the loop in this slightly unusual form allows us to stop the loop before n crosses zero (which it will never do, so we had better not try to test for it). It also makes it unnecessary to write a separate calculation to start n at v.size()-1 instead of v.size(). Moreover, the same technique works for iterators:

             auto iter = v.end();
     while (iter != v.begin()) {
           --iter;
           // Process *iter
             }
 

The similarity between these last two code examples comes from a similarity between unsigned integers and iterators: Because there is an off-the-end value but no off-the-beginning value, both unsigned integers and iterators require special care when a value might run off the beginning of a range (i.e., might underflow).

Summary

  • Unsigned integers are particularly well suited to counting, because the number of items that one is counting cannot be negative.
  • Mixing signed and unsigned integers usually converts the signed integer to unsigned. This conversion can be useful (i.e., it can simplify bounds checking), but it can also cause trouble, particularly around lower bounds of ranges.
  • One rule of thumb for avoiding such trouble is to decrement a control variable at the beginning of a loop. Using this rule requires the variable to start out one step beyond its "real" first value, but that state of affairs is usually easy to achieve.
  • Techniques for writing loops that use unsigned integers can carry over to loops that involve iterators.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video