Channels ▼
RSS

The Standard Librarian: Bitsets and Bit Vectors


May 2001/The Standard Librarian


Anyone who has been programming for any length of time is familiar with Boolean option flags: treating a group of options as a single unit by packing them into one word, using a single bit for each choice. So, for example, to set the permissions on a Unix file, you might write something like the following:

chmod("my_file",
       S_IWUSR | S_IRUSR |
       S_IRGRP | S_IROTH);

Each of those constants corresponds to a single bit; by combining them with the bitwise-or operator, you can specify many options at once.

Packing multiple option bits into a single word is extremely common. This technique is used in many places in the Unix and Win32 APIs, in the ios_base format flags in the C++ Standard library, and some form of it tends to show up in most large programs. Collections of bits are important.

It’s not hard to see why this technique is common: the alternative, an array or a struct with a different field for each option, would be clumsier and would be wasteful of memory. However, there are also times when this technique can be a nuisance. First, some operations can be awkward: setting a named bit is straightforward enough (flags |= S_IRGRP), but clearing a bit (flags &= ~S_IWGRP) is somewhat uglier. You can test whether a bit is set by using it as a mask, as in if (flags & S_IWUSR), but woe to you if you make the mistake of “clarifying” the test by writing it as if ((flags & S_IWUSR) == true), or, still worse, if (flags & S_IWUSR == true). Referring to a numbered bit, as opposed to a named one, is also awkward: you wind up with expressions like flags &= ~(1 << n), usually sprinkled with casts. Finally, this technique is hard to scale to cases where you have many options: once the number of flags exceeds the number of bits in a long, you need to do something else.

Because collections of bits are important, the C++ Standard library provides explicit support for them — in fact, several kinds of support. Sometimes you’ll still want to use low-level bit manipulation operations (and sometimes you’ll have to, if you’re interfacing with a C API), but in most cases the C++ library’s versions are better. They have a few minor quirks, but most of them are easily worked around.

bitset

The class std::bitset appears with “associative containers” in chapter 23 of the C++ Standard. That’s not really the right place for it, since bitset has nothing to do with the associative containers like set and map; it doesn’t satisfy even the most basic requirements for an STL container. It’s better to think of bitset as an integer, each of whose bits you can access individually — except it’s an integer that isn’t restricted to the size of a long. The size of a bitset is determined at compile time (the number of bits is a template parameter), but there is no upper limit: a bitset<32> is 32 bits long, and a bitset<1000> is 1000 bits.

All of the integer bitwise operations that you’re already used to are still available with bitset, and, for the sake of convenience, a few more as well. So, for example, you can perform bitwise-xor by writing b1 ^ b2 (at least if b1 and b2 are the same length). There are two different interfaces for manipulating individual bits: you can set the nth bit with b.set(n), clear it with b.reset(n), and test it with if (b.test(n)); or, almost equivalently, you can think of a bitset as an array and write these same operations as b[n] = true, b[n] = false, and if (b[n]). (“Almost,” because there is one small difference: the array version isn’t range checked, but the set/reset/test version is. If the argument you pass to set, reset, or test is too large, you’ll get an out_of_range exception.)

If your bitset is modestly sized, there’s a very direct sense in which you can think of it as an integer: there’s a constructor that creates a bitset from an unsigned long, and a member function, to_ulong, that extracts an unsigned long from a bitset. Of course, you can’t directly use this constructor to initialize more bits than unsigned long supplies; similarly, you can’t use to_ulong to extract more bits than will fit in an unsigned long. (If you try, and if any of the high-order bits beyond the range of unsigned long are set, then to_ulong will throw an exception.) If necessary, however, you can use shifting and masking to get around those restrictions:

const int N =
   sizeof(unsigned long) * CHAR_BIT;
unsigned long high = 0x7B62;
unsigned long low = 0x1430;
std::bitset<2*N> b
   = (std::bitset<2*N>(high) << N) |
      std::bitset<2*N>(low);
...
const std::bitset<2*N>
   mask((unsigned long)(-1));
low = (b & mask).to_ulong();
high = (b >> N).to_ulong();

The zeroth bit is defined to be the least significant bit, so, for example, if you write:

std::bitset<4> b(0xA);

the bits that are set are b[1] and b[3].

It’s easy enough to use bitset as a replacement for the traditional version of option flags: just declare bitset objects in the header file, instead of integer constants. We’ve already mentioned two advantages of using bitset: you can have more flags than will fit into a long, and you get easier and safer methods of manipulating individual bits. Another is that bitset gives you a conversion mechanism to and from a textual representation.

First, bitset provides the usual I/O operators. This program,

#include <bitset>
#include <iostream>

int main() {
   std::bitset<12> b(3432);
   std::cout << "3432 in binary is "
             <<  b << std::endl;
}

gives the obvious result:

3432 in binary is 110101101000.

The input operator works the same way: it reads a string of ones and zeros, converting them to a bitset.

Second, you can convert bitsets to and from strings: there’s a constructor that takes a string argument, and a bitset<>::to_string member function. Alas, while these conversions are useful, the details turn out to be extremely inconvenient. Both the string constructor and the to_string member function are member templates, because the library class std::basic_string is itself a template; the ordinary string class, std::string, is an alias for basic_string<char>.

The generality of these member templates interacts in unfortunate ways with some of C++’s more obscure language rules. You have to write

std::bitset<6>(std::string("110101"));

as opposed to just

std::bitset<6>("110101");

The obvious version, where you just pass the string literal "110101", will give you a compile-time error because the compiler won’t know which version of the member template to instantiate. Similarly, if b is a bitset, you can’t just write

std::string s = b.to_string();.

You have to use this truly horrible form instead:

std::string s
   = b.template to_string<char,
         std::char_traits<char>,
         std::allocator<char> >();

(Yes, that funny looking template keyword in there really is necessary.)

In practice, of course, you shouldn’t pollute your code with things like that. Unless you really need to work with alternate character types, you can encapsulate these ghastly syntactic details in helper functions:

template <std::size_t N>
std::bitset<N>
from_string(const std::string& s) {
   return std::bitset<N>(s);
}

template <std::size_t N>
std::string
to_string (const std::bitset<N>& b) {
   return b.template to_string<char,
           std::char_traits<char>,
           std::allocator<char> >();
}

vector<bool>

A bitset does have one major limitation: it has a fixed length. You may have a bitset that’s much larger than a long, but you still have to specify its size in advance. That’s fine for something like a set of option flags, but it’s less appropriate for other purposes. Suppose, for example, that you’re processing a large collection of items in a complicated order, and you need to keep track of which ones you’ve already seen. This still calls for an array of Boolean values, and it still makes sense to use a “packed” array that uses a single bit for each element, but bitset is no longer an appropriate choice. The number of items you’re processing may not be known until run time, and items may even be added or removed.

The C++ Standard library’s other mechanism for managing collections of bits is vector<bool>, a specialization of the vector<> template. In some ways vector<bool> is very much like bitset: it uses one bit for each element, and it lets you use array syntax (e.g., v[3] = true) to access individual bits. The difference is that while bitset has special mechanisms of its own, vector<bool> uses the ordinary STL interface. You can change the number of elements in a vector<bool> by using the resize member function, or add new elements by using push_back, just as you can with any other vector<T>.

Although vector<bool> has no special support for bitwise operations, you can still perform those operations using ordinary STL algorithms and function objects. Instead of writing v3 = v1 & v2, for example, you could write:

std::vector<bool> v3(v1.size());
std::transform(v1.begin(), v1.end(),
   v2.begin(), v3.begin(),
   std::logical_and<bool>());

Similarly, to write a vector<bool> in the same format that operator<< uses when writing a bitset, you can again use an STL algorithm:

std::copy(v.rbegin(), v.rend(),
    std::ostream_iterator<char> (std::cout));

(This code relies on the fact that, by default, bool output uses '1' and '0' rather than “true” and “false”. Note also that we’re using rbegin and rend so that we copy the vector<bool> in reverse order. That’s how bitset output works: the leftmost digit in the printed representation of a bitset is b[N-1], not b[0].)

You should always use bitset in preference to vector<bool> when possible: a fixed-size data structure is going to have better performance, both in space and in time, than a data structure that supports the general-purpose vector interface. (In one timing test that I performed, bitset was faster than vector<bool> by a factor of almost five.) You need to use vector<bool> if you’re managing a collection of bits whose size can’t be known in advance.

It may also seem that there’s another time when you need to use vector<bool> instead of bitset: when it’s important to interoperate with STL algorithms. STL algorithms use iterators, and, while vector<bool> provides iterators (we’ve seen v.begin() and v.end()) in the examples above), bitset does not. You can access individual bits of a bitset with array syntax, but bitset does not have begin and end member functions.

However, you shouldn’t let that lack stop you! While bitset doesn’t have the STL container interface, it’s still a perfectly good (fixed-size) container. If it makes sense for you to use a bitset, and if you also need iterators, then you can define a simple “index iterator” adaptor that translates iterator notation like *i into array notation like b[n].

The implementation is straightforward: maintain an index and a pointer to a container. The details, most of which are boilerplate that we would need for any random access iterator, are shown in Listing 1. We also define some nonmember helper functions, begin and end, which take bitset arguments. (The iterator we’ve shown in Listing 1 isn’t quite as general as it might be: if we were willing to accept a slightly more cumbersome interface, we could define a class that worked with arbitrary array-like types. A general purpose index iterator adaptor is often useful when dealing with pre-STL container classes, and sometimes even when dealing with STL containers like vector.)

Using bitset_iterator, bitset can now be made to interoperate with STL components: you can, for example, copy a bitset to a vector<bool> by writing:

std::bitset<10> b;
...
std::vector<bool>
   b(begin(b), end(b));

If you’ve looked closely at Listing 1, however, you may have noticed one problem with bitset_iterator: the name is a lie, because bitset_iterator isn’t really an iterator! If i is an iterator, then *i is supposed to return a reference to the thing that i points to. That’s not what bitset_iterator does: a const bitset_iterator returns bool, not const bool&, and a modifiable bitset_iterator returns a proxy reference of type bitset<>::reference, not bool&.

Since bits aren’t individually addressable, this is the best we can do; in fact, vector<bool>::iterator behaves in exactly the same way — which, again, means that vector<bool> isn’t really an STL container. It’s not quite right to say that bitset_iterator or vector<bool>::iterator is an iterator, but, in both cases, they’re close enough to being iterators so that they can be used in many (not all!) places where iterators are expected.

Conclusion

Arrays of Boolean values are common in large programs, and the C++ Standard library provides several ways to represent such arrays. I haven’t exhausted all of the possibilities: you can use a valarray<bool>, for example; in some cases it might even be appropriate to represent a sparse bit vector as a set<size_t>.

Much of the time, however, the easiest solution is to use std::bitset. If you know at compile time how large your Boolean array is going to be, or if you can at least establish a reasonable upper bound, then bitset will be simpler and more efficient than a complicated dynamic data structure. There are a few annoying quirks in bitset’s interface, but, with a few helper functions, they’re easy to work around.

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee’s library working group. He works at AT&T Labs — Research and can be contacted at austern@research.att.com.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video