C/C++

Implementing Bit Vectors in C

By James Blustein, August 01, 1995

Bit vectors provide an efficient means of implementing arrays of Boolean values. James originally implemented bit vectors to hash document signatures and analyze statistics.

AUG95: Implementing Bit Vectors in C

Implementing Bit Vectors in C

Creating arrays of Boolean values is the key

James Blustein

James is a PhD student in computer science at the University of Western Ontario. He can be contacted at [email protected].

Bit vectors provide an extremely space- and time-efficient means of implementing arrays of Boolean values. In the most common case, bit vectors can represent data in one-eighth the space that a straightforward integer representation uses.

Many programmers are familiar with the use of single integers as collections of bit fields to flag error conditions and the like; see The C Programming Language, Second Edition, by Brian W. Kernighan and Dennis M. Ritchie (Prentice-Hall, 1988). Bit vectors are an extension of that concept. Treating an array of integers as though it were a single integer allows the array to encode even more data. Bit vectors are well suited to representing finite sets and are useful in hashing and signature applications. In Programming Pearls (Addison-Wesley, 1989), Jon Bentley showed that the complexity of many large software problems can be greatly reduced by using bit vectors.

In this article, I'll present a portable bit-vector implementation in C. I developed the code for use in two programs: a Bloom-filter program for hashing on document signatures and a statistical-analysis program in which a number of three-dimensional matrices of data must be analyzed. I use two bit vectors to record which of the matrices have been selected and which matrices' elements to analyze. This makes it simple to copy the submatrices of selected data, which I pass to general matrix-manipulation routines. The functions work well as part of the menu system from which the user selects data.

The Data Structure

At the heart of the implementation are arrays composed of the type bit (see Listing Three). The type must be an unsigned integer--unsigned char, unsigned short, unsigned int, or unsigned long--for the code to be portable; otherwise sign extension of the bitwise operators will produce different results on different machines.

Each element of a bit vector is represented internally by BITS_SZ bits. The definition of BITS_SZ is in Listing One. If the integer type used is unsigned char and CHAR_BIT is defined in the limits.h header file, then BITS_SZ is a preprocessor definition (technically, a macro with no arguments). Otherwise, the bits_size() function is used to count the number of bits in the chosen type. This slight complication is necessary for a portable implementation and requires a negligible amount of run time. To ensure that BITS_SZ has been properly initialized, call ba_init() once before using any of the other functions. In Listing Three, I have defined the type to be unsigned char. On most machines, this means that BITS_SZ will be 8.

The number of elements needed to hold n bits is the number of full elements needed and possibly an extra element for the overflow. The macro NELEM() (Listing Four) is used to compute the exact number in many of the routines in Listing Two.

The bit vectors imitate an array composed of single Boolean values that cannot be subscripted in the usual way. The functions ba_assign() and ba_value() in Listing Two provide ways of setting and reading the values, respectively.

You may find it helpful to think of a bit vector as representing a set containing only elements that have the value 1. The universe from which the set draws its elements has the same size as the bit vector. Any set operation can be performed using a combination of the four set operations given here. If the two bit vectors have the same number of elements, then ba_union() can be simulated using ba_complement and ba_intersection (and vice versa). Figure 1 shows how to compute a type of set difference.

If you prefer, you can think of the bit vectors as an extension of C's bitwise operators. C operators are defined to work with single integers. The routines presented here perform the equivalent operations, but with a synthetic integer composed of an arbitrary number of bits. Although the names of some of the functions come from the set perspective, you can think of them in terms of AND (ba_intersection()), Inclusive OR (ba_union()), XOR (ba_diff()), and NEGATION (ba_complement()).

Listing Two illustrates an implementation of bit vectors using C; Table 1 describes the functions.

How It Works

The bit vector is really a 2-D array of integers. To access the nth element of a bit vector, the code first locates the integer that contains that bit, and then it indexes into that integer to the exact bit. The ba_assign() function masks the bit to be set or cleared.

To be as general as possible, most of the routines require the calling routine to pass in the number of elements in the bit-vector array. This way the routines can be used with dynamically created bit vectors, such as those ba_new() provides.

If the size of the bit vector bv is known at compile time, then sizeof bv / sizeof bv[0] will be a compile-time constant representing the number of elements.

If the bit vector is dynamically allocated, then you can keep track of its size by wrapping it in a struct like that in Example 1(a). Example 1(b) is the struct I use for the menu-driven statistical program mentioned previously.

The set operations, ba_intersection(), ba_union(), and ba_diff() can be used to combine bit vectors in the same ways that the &, |, and ^ bitwise operators are used to combine single integers (see Kernighan and Ritchie). The ba_complement() function can be used to apply the one's complement (~) operator to every element of a bit vector. Figure 1 shows how the set operations can make a new bit vector with only those bits that are 1 in exactly one of the bit vectors A and B. This value can be contrasted with the effects of the set_diff() function. The values shown were created with a program written using the functions in Listing Two.

It may be more efficient for many of the functions in Listing Two to be written as macros, but I prefer functions because they provide increased modularity.

Most of the functions treat bit vectors as 1-D arrays of integers. The naive implementation would be to loop through every bit in every integer. These routines loop through every integer and set all the bits at once--providing a speedup of BITS_SZ times.

Listing Six contains alternative (slower, yet perhaps more intuitive) methods for computing the union and intersection functions. Contrast these with the implementations in Listing Two.

The ba_intersection() function is based on the observation that the intersection of two bit vectors contains as many bits as the shorter bit vector (only those elements that are set in both vectors appear in the intersection). ba_intersection() does a bitwise AND of the corresponding integers in both vectors up to and including the last one for the smallest of the vectors.

A bit-by-bit method of computing the intersection appears in Listing Six. It relies on the numeric values of Boolean expressions in C. The mask select is made up of all 0s except for a single 1 to mask the selected bit.

The union operation assumes that if a bit is set in either of the two bit vectors, then it is also set in their union. The ba_union() function first sets all the bits that appear in the longest of the two vectors. Then it loops through every integer in the shorter one, assigning BITS_SZ bits at a time to the union. The alternative form in Listing Six loops through every bit in the shorter vector.

The ba_count() function is written to be portable but fast. In the general case, it loops through every bit in the vector. If the BITS_SZ is known before the file is compiled, then a lookup table can be included so that every integer can be inspected at once. The code in Listing Two includes a lookup table for the case where BITS_SZ is 8, as it would be for most unsigned char implementations. If BITS_SZ is not 8, then a bit-looping method counts every bit in every integer.

The space cost incurred by the table is well worth the speedup it provides. Obviously, if you choose to use a type with a different number of bits, this table should be replaced with a table for the size of the type you choose. The table was computed using the bit-looping method.

The only advantage to using ba_dotprod() rather than calling ba_assign() and ba_value() in a nested loop is that ba_dotprod() does not incur as much calling overhead because it reads and sets the array elements directly.

If not every element of the array is used, there can be internal fragmentation of as much as BITS_SZ-1 unused bits. The ba_count() function is optimized to examine every element of the array. For this function to work as optimized, all unused bits must be 0, which I call the "canonical form." Bit vectors created using ba_new() and changed only using functions in the listings will always be in this form. The ba_all_assign() and ba_complement() functions change their first argument to force it into canonical or standard form. The size parameter passed to these functions must be exact, or some bits may be changed inappropriately.

It would be more efficient to coerce the bit vector into canonical form in ba_count() instead of many of the other functions. However, I feel it makes more sense to have a function that is meant only to count bits do just that. It would be an unpleasant side effect if the count function changed its argument and ba_complement(), for example, did not create the true complement. The cost is very small, and I feel it is worthwhile.

In Figure 2, the array bv is a bit vector composed of NumInts integers used to represent NumElem elements (bits). The code bv[NumInts-1] &= (~0<<(BITS_SZ-(NumElem%BITS_SZ))); affects the value of the last element of the array bv. As you can see, ~0 is an integer composed entirely of 1 bits, and ~0<<(BITS_SZ-(NumElem%BITS_SZ))) is a value used to mask off the unused bits at the tail end of the array. By shifting the 1s left as many times as there are unused bits, the 1s move left and are replaced by 0s, which clear the rightmost bits and produce the canonical form.

In Figure 2, BITS_SZ is 8 and numbits is 19. Therefore, only three bits of the eight in the last integer are used. Figure 2 shows how the mask is built and what happens when it is applied.

Stylistic Decisions

I chose to implement NELEM (Listing Four) and CANONIZE (Listing Five) as macros rather than as static functions because they are used many times and are very simple. The overhead of calling either of them as a function seemed unjustified. The first_is_largest() function is more complicated, however. It is declared static so that it cannot accidentally be called by functions in any other files. Its name also serves to document its purpose. None of the macros nor the static function can be called by functions outside of the file.

All the nonstatic functions' names are unique within six monocase characters and begin with ba_ so they won't conflict with functions declared in other files.

I chose to declare the type bool (Listing Three) as an enum rather than as a preprocessor symbol or explicit integer type because many debuggers will display symbolic names of enums.

Conclusion

Bit vectors provide a fast, space-efficient method of representing and manipulating arrays of Boolean values. They are great for representing finite sets and as part of other data structures.

Figure 1: Set operations with bit vectors. (a) Union of two bit vectors; (b) intersection of two bit vectors; (c) set difference of two bit vectors; (d) using set operations to create a new set.

Figure 2: Converting 10111111 into canonical form.

Table 1: (a) Creation and initialization; (b) setting and clearing; (c) conversion operations; (d) set operations; (e) miscellaneous function.


Function          Descriptions

(a) <I>ba_init()</I>    Determines word size of current
                   platform. Makes code portable.
    <I>ba_new()</I>     Dynamically allocates and
                   initializes memory to hold a
                                  bit vector of given size.
    <I>ba_copy()</I>    Copies a subset of one bit
                   vector into another bit vector.
(b) <I>ba_assign()</I>  Sets (or clears) an element of a
                   given bit vector.
    <I>ba_value()</I>   Returns the value of one element
                   of a given bit vector.
    <I>ba_toggle()</I>  Flips the state of an individual
                   element in a given bit vector.
    <I>ba_all assign()</I>   Efficiently sets (or
                   clears) all elements of a
                   bit vector at once.
(c) <I>ba_b2str()</I>   Produces a string version of a
                   subset of a given bit vector.
                   Can be used to display bit
                   vector, as <I>ba_print()</I>
                   demonstrates.
     <I>ba_ul2b()</I>   Sets a bit vector to a value
                   that matches an <I>unsigned long</I>.
(d) <I>ba_count()</I>   Returns number of 1 elements in
                   a given bit vector. The number
                   of 0 elements is the difference
                   between the total number of
                   elements and this value.
    <I>ba_intersection()</I>   Efficiently creates the
                   set intersection of two
                   bit vectors. <a href="/showArticle.jhtml?documentID=ddj9508d&pgno=8">Figure 1(a)</A>
                   shows how to filter only
                   certain elements using
                   <I>ba_intersection()</I>.
    <I>ba_union()</I>   Efficiently creates the set
                   union of two bit vectors.
                   <a href="/showArticle.jhtml?documentID=ddj9508d&pgno=9">Figure 1(b)</A> shows how to
                   merge two bit vectors using
                   <I>ba_union()</I>.
    <I>ba_diff()</I>    Creates a new bit vector with
                   bits set only where both others do not.
    <I>ba_complement()</I>   Flips all elements of a
                   given bit vector.
    <I>ba_dotprod()</I> Computes the scalar product of
                   two bit vectors.
(e) <I>ba_print()</I>   Prints a string representation
                   of a bit vector.

Example 1: (a) A struct used to keep track of the size of a dynamically allocated bit vector; (b) a struct used in a statistical-analysis program.

(a) typedef struct {
       elem_t size;   /* how many items in array */
       bit *  array;  /* bit vector recording which elements are selected */
    } bitvector;
(b) typedef struct {
          elem_t  size;     /* how many items in selected */
           bit *   selected; /* bit vector recording which elements are selected */
          elem_t  max;      /* maximum possible size */
          char ** name;     /* array of names of items */
          char *  title;    /* what data is represented by this struct? */

    } chose_t;

Listing One

/* BITS_SZ -- BITS_SZ is the number of bits in a single bits' type.  */
/* Definition of BITS_SZ */
#ifdef CHAR_BIT
   /* assumes typedef unsigned char bits */
   #define BITS_SZ (CHAR_BIT)
#else
   static elem_t bits_size(void);
   elem_t BITS_SZ = 0;  /* until it is initialized by ba_init() */
   static elem_t bits_size(void) {
       /*  Adapted from the wordlength() function on page 54 (Exercise
          2-8) of _The C Answer Book_ (2nd ed.) by Clovis L. Tondo
          and Scott E. Gimpel.  Prentice-Hall, Inc., 1989. */
      elem_t i;
      bits   v = (bits)~0;
      for (i=1; (v = v >> 1) > 0; i++) 
         ; /* EMPTY */
      return (i);
   }
#endif

Listing Two

#include <stdio.h>
#include <malloc.h>
#include "types.h"  /* Listing 3  */
#include "bitarr.h" /* exported prototypes */
/* Listing 1 (definition of BITS_SZ) goes here */
/* Listing 4 (NELEM macro) goes here */
/* Listing 5 (CANONIZE macro) goes here */
typedef struct {elem_t size; bit *vector;} BitVector;
static void first_is_biggest(BitVector bv[2], unsigned *, unsigned *);
/*  ------ Initialization and Creation Code ------ */
elem_t ba_init(void) 
{
/* ba_init()
   PRE:  Must be called before use of any other ba_ functions.  Should
         only be called once.
   POST: Returns the number of values that can be stored in one variable of
         type bit. If <limits.h> does not define CHAR_BIT then the module
         global variable BITS_SZ has been set to the appropriate value. 
*/
   #ifndef BITS_SZ
      if (!BITS_SZ) {
         BITS_SZ = bits_size();
      }
   #endif
   return (BITS_SZ);
} /* ba_init() */
bit *ba_new(const elem_t nelems) 
{
/* ba_new()
   PURPOSE: dynamically allocate space for an array of nelems bits
         and initalize the bits to all be zero.
   PRE:  nelems is the number of Boolean values required in an array
   POST: either a pointer to an initialized (all zero) array of bit or 
         space was not available and NULL was returned
   NOTE: calloc() guarantees that the space has been initialized to 0.
   Used by: ba_ul2b(), ba_intersection() and ba_union().
*/
   size_t howmany =  NELEM(nelems,(BITS_SZ));
   return ((bit *)calloc(howmany, sizeof(bit)));
} /* ba_new() */
void ba_copy(bit dst[], const bit src[], const elem_t size) 
{
/* ba_copy()
   PRE:  dst has been initialized to hold size elements. src 
         is the array of bit to be copied to dst.
   POST: dst is identical to the first size bits of src. src is unchanged.
   Used by: ba_union()
*/
            elem_t nelem  = NELEM(size,(BITS_SZ));
   register elem_t i;
   for (i=0; i < nelem; i++) {
      dst[i] = src[i];
   }
} /* ba_copy() */
/* ------- Assigning and Retrieving Values ------ */
void ba_assign( bit arr[], elem_t elem, const bool value) 
{
/* ba_assign()
   PURPOSE: set or clear the bit in position elem of the array arr
   PRE:     arr[elem] is to be set (assigned to 1) if value is TRUE, 
            otherwise it is to be cleared (assigned to 0).  
   POST:    PRE fulfilled.  All other bits unchanged.
   SEE ALSO: ba_all_assign()
   Used by:  ba_ul2b()
*/
   if (value) {
      arr[elem / BITS_SZ] |= (1 << (elem % BITS_SZ));
   } else {
      arr[elem / BITS_SZ] &= ~(1 << (elem % BITS_SZ));
   }
} /* ba_assign() */
bool ba_value(const bit arr[], const elem_t elem) 
{
/* ba_value()
   PRE:  arr must have at least elem elements
   POST: The value of the elemth element of arr has been returned
         (as though arr was just a 1-dimensional array of bit)
   Used by: ba_b2str() and ba_count()
*/
   return(arr[elem / BITS_SZ] & (1 << (elem % BITS_SZ)));
} /* ba_value() */
void ba_toggle( bit arr[], const elem_t elem) 
{
/* ba_toggle()
   PRE:  arr must have at least elem elements
   POST: The value of the elemth element of arr has been flipped, 
         i.e. if it was 1 it is 0; if it was 0 it is 1.
   SEE ALSO: ba_complement()
*/
 arr[elem / BITS_SZ] ^= (1 << (elem % BITS_SZ));
} /* ba_toggle() */
void ba_all_assign( bit arr[], const elem_t size, const bool value) 
{
/* ba_all_assign()
   PRE:  arr has been initialized to have *exactly* size elements.
   POST: All size elements of arr have been set to value.  
         The array is in canonical form, i.e. trailing elements are  all 0.
   NOTE: The array allocated by ba_new() has all elements 0 and is 
         therefore in canonical form.
   SEE ALSO: ba_assign()
   Used by: ba_ul2b()
*/
            elem_t nelem  = NELEM(size,(BITS_SZ));
            bit    setval = (value) ?~0 :0;
   register elem_t i;
   for (i=0; i < nelem; i++) {
      arr[i] = setval;
   }
   /* force canonical form */
   CANONIZE(arr,nelem,size);
} /* ba_all_assign() */
/* ------- Conversion Routines ------- */
bit * ba_ul2b(unsigned long num, bit * arr, elem_t * size) 
{
/* ba_ul2b()
   PRE:  Either  arr points to space allocated to hold enough bits to
         represent num (namely the ceiling of the base 2 logarithm
         of num).  size points to the number of bit to use.
       OR arr is NULL and the caller is requesting that enough
         space be allocated to hold the representation before the
         translation is made. size points to space allocated to
         hold the count of the number of bit needed for the
         conversion (enough for MAXLONG).
   POST: A pointer to a right-aligned array of bits representing the
         unsigned value num has been returned and size points to
         the number of bits needed to hold the value.  
       OR the request to allocate space for such an array could not be granted
   NOTES: - The first argument is unsigned.  
          - It is bad to pass a size that is too small to hold the
            bit array representation of num [K&R II, p.100].
          - Should the size be the maximum size (if size > 0) even
            if more bits are needed?  The user can always use a filter
            composed of all 1s (see ba_all_assign()) intersected with
            result (see ba_intersection()).
*/
   register elem_t i;
  
   if (NULL != arr) {
      ba_all_assign(arr, *size, 0);
   } else {
      *size = NELEM(sizeof(num),sizeof(bit));
      *size *= BITS_SZ;
      if (NULL == (arr = ba_new(*size))) {
         return (arr);
      } 
   }
   /* usual base conversion algorithm */
   for (i=0; num; num >>= 1, i++) {
      ba_assign(arr, (*size - i - 1), (1 == (num & 01)));
   }
   return (arr);
} /* ba_ul2b() */
char * ba_b2str(const bit arr[], const elem_t size, char * dest)
{
/* ba_b2str()
   PRE: arr is a bit array with at least size elements.  Either
         dest points to enough allocated space to hold size + 1
         characters or dest is NULL and such space is to be
         dynamically allocated.
   POST: Either dest points to a null-terminated string that
         contains a character representation of the first size
         elements of the bit array arr;
      OR dest is NULL and a request to dynamically allocate memory
         for a string to hold a character representation of arr was
         not be granted.
   Used by: ba_print()
*/
   register elem_t i;
   if ((NULL != dest) || \
       (NULL != (dest = (char *)malloc(size + 1)))) {
      for (i=0; i < size; i++) {
         dest[i] = (ba_value(arr,i) ?'1' :'0');
      }
      dest[size] = '\0';
   }
   return (dest);
} /* ba_b2str() */
/* ------- Mathematical Applications -------- */
unsigned long ba_count(const bit arr[], const elem_t size) 
{
/* ba_count()
   PRE:  arr is an allocated bit array with at least size elements
   POST: The number of 1 bits in the first size elements of arr
         have been returned.
   NOTE: if arr is not in canonical form, i.e. if some unused bits
         are 1, then an unexpected value may be returned.
*/
  register     unsigned long count; 
  register     elem_t        i;     
               elem_t        nelem = NELEM(size,(BITS_SZ));
  static const unsigned bitcount[256] = {0, 1, 1, 2, 1, 2, 2, 3, 1, \
        2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, \
        4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, \
        3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, \
        3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, \
        4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, \
        5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, \
        2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, \
        4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, \
        4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, \
        6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, \
        4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, \
        5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, \
        6, 6, 7, 6, 7, 7, 8};
   if (8 == BITS_SZ) {
     /* lookup table will speed this up a lot */
     for (count = 0L, i = 0; i < nelem; i++) {
        count += bitcount[arr[i]];
     }      
   } else {
     for (count = 0L; i < size; i++) {
        if (ba_value(arr, i)) {
           count++;
        }
     }
   }
   return (count);
} /* ba_count() */
bool ba_intersection( bit first[], bit second[], bit * result[],
                     const elem_t size_first, const elem_t size_second) 
{
 /* ba_intersection()
   PRE:  first  is a bit array of at least size_first  elements.
         second is a bit array of at least size_second elements.
         result points to enough space to hold the as many elements
         as the smallest of size_first and size_second;
       OR result points to NULL and such space is to be dynamically allocated.
   POST: TRUE has been returned and 
         result points to a bit array containing the intersection
         of the two arrays up to the smallest of the two sizes;
       OR FALSE has been returned and
         result pointed to NULL (a request was made to allocate 
         enough memory to store the intersection) but the required 
         memory could not be obtained.
   NOTE: This runs faster if the first array is not smaller than second.
 */
   register elem_t i;
            elem_t numints;   
            unsigned  largest=0, smallest=1;
            BitVector bv[2];
   bv[largest].size   = size_first;
   bv[largest].vector = first;
   bv[smallest].size   = size_second;
   bv[smallest].vector = second;
   first_is_biggest(bv, &largest, &smallest);
   /* allocate space if *result is NULL */
   if ((NULL == *result) && \
      (NULL == (*result = ba_new(bv[largest].size)))) {
      return(FALSE); /* can't get memory, so can't continue */
   } else {
      numints = NELEM(size_second,(BITS_SZ));
      for (i=0; i < numints; i++) {
         (*result)[i] = (bv[smallest].vector[i] & \
                         bv[largest].vector[i]);
      }
      /* bits beyond size_second should be zero -- canonical form */
      CANONIZE(*result, numints, size_second);
      return(TRUE);
   }
} /* ba_intersection() */
bool ba_union( bit first[], bit second[], bit *  result[],
              const elem_t size_first, const elem_t size_second) 
{
 /* ba_union()
   PRE:  first  is a bit array of at least size_first  elements.
         second is a bit array of at least size_second elements.
         result points to enough space to hold the as many elements
         as the largest of size_first and size_second;
       OR result points to NULL and such space is to be dynamically allocated.
   POST: TRUE has been returned and result points to a bit array containing 
        union of two arrays (up to the size of the largest of the two sizes);
       OR FALSE has been returned and result pointed to NULL (a request was 
         made to allocate enough memory to store the union) but the required 
         memory could not be obtained.
   NOTE: This runs faster if the first array is not smaller than second.
 */
    register elem_t    i;
             elem_t    numints;   
             unsigned  largest=0, smallest=1;
             BitVector bv[2];
   bv[largest].size   = size_first;
   bv[largest].vector = first;
   bv[smallest].size   = size_second;
   bv[smallest].vector = second;
   first_is_biggest(bv, &largest, &smallest);
   if ((NULL == *result) && \
      (NULL == (*result = ba_new(bv[largest].size)))) {
      return(FALSE); 
   } else {
      ba_copy(*result, bv[largest].vector, bv[largest].size);
      numints = NELEM(bv[smallest].size,(BITS_SZ));
      for (i=0; i < numints; i++) {
         (*result)[i] |= bv[smallest].vector[i];
      }
      CANONIZE(*result, numints, bv[largest].size);
      return(TRUE);
   }
} /* ba_union() */
bool ba_diff( bit first[], bit second[], bit *  diff[],
             const elem_t size_first, const elem_t size_second) 
{
 /* ba_diff()
   PRE:  first  is a bit array of at least size_first  elements.
         second is a bit array of at least size_second elements.
         diff points to enough space to hold the as many elements
         as the largest of size_first and size_second;
       OR diff points to NULL and such space is to be dynamically allocated. 
   POST: TRUE has been returned and 
         diff points to a bit array containing the union of the
         two arrays (up to the size of the largest of the two sizes);
       OR FALSE has been returned and result pointed to NULL (a request was 
         made to allocate enough memory to store the result) but the required 
         memory could not be obtained.
   NOTE: This runs faster if the first array is not smaller than second.
 */
    register elem_t    i;
             elem_t    numints;   
             unsigned  largest=0, smallest=1;
             BitVector bv[2];
   bv[largest].size   = size_first;
   bv[largest].vector = first;
   bv[smallest].size   = size_second;
   bv[smallest].vector = second;
   first_is_biggest(bv, &largest, &smallest);
   if ((NULL == *diff) && \
      (NULL == (*diff = ba_new(bv[largest].size)))) {
      return(FALSE); 
   } else {
      ba_copy(*diff, bv[largest].vector, bv[largest].size);
      numints = NELEM(bv[smallest].size,(BITS_SZ));
      for (i=0; i < numints; i++) {
         (*diff)[i] ^= bv[smallest].vector[i];
      }
      CANONIZE(*diff, numints, bv[largest].size);
      return(TRUE);
   }
} /* ba_diff() */
void ba_complement( bit arr[], const elem_t size)
{
/* ba_complement()
   PRE:  arr is a bit array composed of *exactly* size elements.
   POST: All bits in arr have been flipped and arr is in canonical form.
   SEE ALSO: ba_toggle()
*/
            elem_t nelem = NELEM(size,(BITS_SZ));
   register elem_t i;
   for (i=0; i < nelem; i++) {
      arr[i] = ~arr[i];
   }
   /* force canonical form */
   CANONIZE(arr, nelem, size); 
} /* ba_complement() */
unsigned long ba_dotprod(const bit first[], const bit second[],
                         const elem_t size_first, const elem_t size_second) 
{
 /* ba_dotprod()
   PRE: first is an array of at least size_first bits.  second
         is an array of at least size_second bits.
   POST: The scalar product of the two vectors represented by the
         first size_first elements of first and the first
         size_second elements of second have been returned.
*/
   register elem_t        i, j;
   register unsigned long sum = 0L;
   for (i=0; i < size_first; i++) {
      for (j=0; j < size_second; j++) {
         sum += (first[i/BITS_SZ]  & (1<<(i % BITS_SZ))) \
               &&                                        \
                (second[j/BITS_SZ] & (1<<(j % BITS_SZ)));
      }
   }
   return (sum);
} /* ba_dotprod() */
/* ------ Miscellaneous ------- */
static
void first_is_biggest(BitVector  bv[2], unsigned * big, unsigned * small)
{
   if (bv[*big].size < bv[*small].size) {
      unsigned temp;
      temp = *big;
      *big = *small;
      *small = temp;
   }
} /* first_is_biggest() */
/*  -------- Miscellaneous --------- */
bool ba_print(const bit arr[], const elem_t size, FILE * dest) 
{
   char * to_print = ba_b2str(arr, size, NULL);
   if (NULL != to_print) {
      bool status = (EOF != fputs(to_print, dest) );
      free(to_print);
      return (status);
   } else {
      return (FALSE);
   }
} /* ba_print() */

Listing Three

/* "types.h" */
#include <stddef.h>
 typedef enum bool {FALSE, TRUE} bool;
 typedef size_t                  elem_t;
 typedef unsigned char           bit;

Listing Four

/*  macro NELEM(). The number of elements, nelem, in an array of N bits can be
 computed using the formula:
   if (0 == (N % BITS_SZ))
      nelem = N/BITS_SZ
   else
      nelem = N/BITS_SZ + 1
   This can be represented in any of these ways:
     nelem = N/(BITS_SZ)  +  1 - (0 == (N %(BITS_SZ)))
     nelem = N/(BITS_SZ)  +  !(0 == (N %(BITS_SZ)))
     nelem = N/(BITS_SZ)  +  (0 != (N %(BITS_SZ)))
   The macro NELEM uses this last form.  
*/
#define NELEM(N,ELEMPER) (N / (ELEMPER) + (0 != (N % (ELEMPER))))

Listing Five

/* macro CANONIZE(). Array is an array of NumInts type bit representing 
 NumElem bits Forces Array into canonical form, i.e. all unused bits are 
  set to 0  */
#define CANONIZE(Array,NumInts,NumElem) \
 (Array)[NumInts - 1] &= (~0 << (BITS_SZ - (NumElem % BITS_SZ)));

Listing Six

void ba_Xintersection(const bits first[], const bits second[], bits * result[],
                                        elem_t size_first, elem_t size_second) 
{
   register elem_t i;
   register bits   select;
            bits * largest  = first;
            bits * smallest = second;
   for (i=0; i < size_smallest; i++) {
      select = (1 << (i % BITS_SZ));
      *result[i/BITS_SZ] =                 \
         (smallest[i/BITS_SZ] & select) && \
         (largest[i/BITS_SZ] & select);
   }
} /* ba_Xintersection() */
void ba_Xunion(const bits first[], const bits second[], bits * result[], 
                                        elem_t size_first, elem_t size_second) 
{
    register elem_t i;
            bits * largest  = first, * smallest = second;
   /* any bits that are set in the longest vector are set in the union */
   /* all other bits are initially zero */
   ba_copy(*result, largest, size_first);
   /* bits that are set in the shortest vector are set in the union */
   for (i=0; i < size_smallest; i++) {
      *result[i/BITS_SZ] |= \
       (smallest[i/BITS_SZ] & (1 << (i % BITS_SZ)));
   }
} /* ba_Xunion() */

Previous 5 6 7 8 9 10

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

Implementing Bit Vectors in C

Implementing Bit Vectors in C

Creating arrays of Boolean values is the key

James Blustein

The Data Structure

How It Works

Stylistic Decisions

Conclusion

Listing One

Listing Two

Listing Three

Listing Four

Listing Five

Listing Six

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

Implementing Bit Vectors in C

Implementing Bit Vectors in C

Creating arrays of Boolean values is the key

James Blustein

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content