Figure 2 shows how the math system now works in the middle of an arbitrary compression run. Even though we are using 32-bit math, the algorithm is now dealing with arbitrarily long versions of high and low:

Figure 2: How the system now works.

As low and high converge, their matching digits are shifted out on the left side, presumably to a file. These digits are never going to change, and are no longer needed as part of the calculation. Likewise, both numbers have an infinite number of binary digits being shifted in from the right: 1s for high and 0s for low.
A Fatal Flaw and the Workaround
In the code just presented, we have a pretty reasonable way of managing the calculation that creates an arbitrarily long number. This implementation is good, but it isn't perfect.
We run into problems with this algorithm when the values of low and high start converging on 0.5 without quite crossing the line that would cause bits to be shifted out. A sequence of calculations that runs into this problem might produce values like these:

low=7C99418B high=81A60145
low=7FF8F3E1 high=8003DFFA
low=7FFFFC6F high=80000DF4
low=7FFFFFF6 high=80000001
low=7FFFFFFF high=80000000

Our algorithm isn't quite doing what we want here. The numbers are getting closer and closer together, but because neither has crossed the 0.5 divider, no bits are getting shifted out. This process eventually leaves the values of the two numbers in a disastrous position.
Why is it disastrous? The initial calculation of range is done by subtracting low from high. In this algorithm, range stands in as a proxy for the number line, and the subsequent calculation finds the proper subsection of that value for a given character. For example, in the earlier sample, we might have wanted to encode a character that has a range of [0.22,0.23). If the value of range is 100, then we can clearly see that the character will subdivide it with values of 22 and 23.
But what happens if low and high are so close that range has a value of just 1? Our algorithm breaks down: it no longer matters what the probability of the next character is, because subdividing the range [0,1) produces the same result every time. And if we do that identically for every character, the decoder has no way to distinguish one character from another.

Ultimately, this can decay to the point where low == high, which breaks one of our key invariants. It's easy to work out what kind of disaster results at that point.
Later in this article, I'll go over some finer points of the algorithm that show why you run into trouble long before the two values are only 1 apart, and we'll come up with specific requirements for the accuracy needed in the value of range. Even without those detailed requirements, clearly something needs to be done here.
The fix to the algorithm is shown in the code below. This version still performs the normal output of a single bit when low goes above 0.5 or high drops below it. But when the two values haven't converged, it adds a check of the next most-significant bits to see if we are headed toward the problem of near-convergence. This will be the case when the two most-significant bits of high are 10 and the two most-significant bits of low are 01. When this happens, we know that the two values are converging, but we don't yet know what the eventual output bit is going to be.
unsigned int high = 0xFFFFFFFFU;
unsigned int low = 0;
int pending_bits = 0;
char c;
while ( input >> c ) {
    int range = high - low + 1;
    prob p = model.getProbability( c );
    high = low + (range * p.upper)/p.denominator;
    low = low + (range * p.lower)/p.denominator;
    for ( ; ; ) {
        if ( high < 0x80000000U ) {
            output_bit_plus_pending( 0, pending_bits );
            low <<= 1;
            high <<= 1;
            high |= 1;
        } else if ( low >= 0x80000000U ) {
            output_bit_plus_pending( 1, pending_bits );
            low <<= 1;
            high <<= 1;
            high |= 1;
        } else if ( low >= 0x40000000 && high < 0xC0000000U ) {
            pending_bits++;
            low <<= 1;
            low &= 0x7FFFFFFF;
            high <<= 1;
            high |= 0x80000001;
        } else
            break;
    }
}

void output_bit_plus_pending( bool bit, int &pending_bits )
{
    output_bit( bit );
    while ( pending_bits > 0 ) {
        output_bit( !bit );
        pending_bits--;
    }
}
So what do we do when we are in this near-converged state? We know that sooner or later, either high is going to go below 0.5 or low will go above it. In the first case, both values will have leading bits of 01, and in the second, they will both have leading bits of 10. As the convergence increases, the leading bits will extend to either 01111... or 10000..., with some finite number of digits.

Given this, we know that once we figure out what the first binary digit in the string is going to be, the subsequent bits will all be its opposite. So in this new version of the algorithm, when we are in the near-convergence state, we simply discard the second most-significant bit of high and low, shifting the remaining bits left while retaining the MSB. Doing that also means incrementing the pending_bits counter to acknowledge that we need to deal with the discarded bit when convergence finally happens. An example of how this looks when squeezing out the converging bit in low is shown in Figure 3. Removing the bit from high is basically identical, but of course a 1 is shifted into the least-significant bit (LSB) during the process.
Figure 3: Squeezing out the converging bit.
The bit twiddling that makes this happen can be a bit difficult to follow, but the important thing is that the process has to adhere to the following rules:

- The low 30 bits of both low and high are shifted left one position.
- The LSB of low gets a 0 shifted in.
- The LSB of high gets a 1 shifted in.
- The MSB of both words is unchanged: after the operation, it will still be set to 1 for high and 0 for low.
The final change that ties all this together is the introduction of the new function output_bit_plus_pending(). Each time we go through this near-convergence process, we know that another bit has been stored away, but we don't yet know whether it is a 1 or a 0. We keep a count of all these consecutive bits in pending_bits. When we finally reach a situation where an actual MSB can be output, we output it, plus all the pending bits that have been stored up. And of course, the pending bits will be the opposite of the bit being output.

This fixed-up version of the code does everything we need to properly encode. The final working C++ code will have a few differences, but they are mostly just tweaks to help with flexibility. The code shown above is more or less the final product.
The Real Decoder
I've talked about some invariants that exist in this algorithm, but one I have skipped over is this: the values of high and low that are produced as each character is processed in the encoder will be duplicated in the decoder. These values operate in lockstep, right down to the least-significant bit.
A consequence of this is that the code in the decoder ends up looking a lot like the code in the encoder: the manipulation of high and low is effectively duplicated. And in both the encoder and the decoder, the values of these two variables are updated using a calculated range, along with the probability for the given character.

The difference between the two comes from how we get that probability. In the encoder, the character is known, because we are reading it directly from the file being processed. In the decoder, we have to determine the character by looking at where the value of the message we are decoding falls on the [0,1) number line. It is the job of the model to figure this out in the function getChar().
The compressed input is read into a variable named value. This variable is another of our pseudo-infinite variables, like high and low, with the primary difference being what is shifted into its LSB position. Recall that high gets an infinite string of 1s shifted in, and low gets an infinite string of 0s. value gets something completely different: it has the bits of the encoded message shifted into it. So at any given time, value contains 32 of the long string of bits that represent the number the encoder created. On the MSB side, bits that are no longer used in the calculation are shifted out of value; on the LSB side, as those bits are shifted out, new bits of the message are shifted in.
The resulting code looks like this:
unsigned int high = 0xFFFFFFFFU;
unsigned int low = 0;
unsigned int value = 0;
for ( int i = 0 ; i < 32 ; i++ ) {
    value <<= 1;
    value += m_input.get_bit() ? 1 : 0;
}
for ( ; ; ) {
    unsigned int range = high - low + 1;
    unsigned int count = ((value - low + 1) * m_model.getCount() - 1) / range;
    int c;
    prob p = m_model.getChar( count, c );
    if ( c == 256 )
        break;
    m_output.putByte( c );
    high = low + (range * p.high)/p.count - 1;
    low = low + (range * p.low)/p.count;
    for ( ; ; ) {
        if ( low >= 0x80000000U || high < 0x80000000U ) {
            low <<= 1;
            high <<= 1;
            high |= 1;
            value <<= 1;
            value += m_input.get_bit() ? 1 : 0;
        } else if ( low >= 0x40000000 && high < 0xC0000000U ) {
            low <<= 1;
            low &= 0x7FFFFFFF;
            high <<= 1;
            high |= 0x80000001;
            value -= 0x40000000;
            value <<= 1;
            value += m_input.get_bit() ? 1 : 0;
        } else
            break;
    }
}
So this code looks very similar to the final encoder. The updating of low and high is nearly identical; the decoder just adds the update of value, which changes in tandem with high and low. The nature of the algorithm introduces another invariant: value will always be greater than or equal to low and less than high.
A Sample Implementation
All of this is put together in the production code included in ari.zip. The use of templates makes this code very flexible and should make it easy to plug it into your own applications. All the needed code is in header files, so project inclusion is simple.
In this section, I'll discuss the various components that need to be put together in order to actually do some compression. The code package has four programs:

- fp_proto.cpp, the floating-point prototype program. Useful for experimentation, but not for real work.
- compress.cpp, which compresses a file using command-line arguments.
- decompress.cpp, which decompresses a file using command-line arguments.
- tester.cpp, which puts a file through a compress/decompress cycle, tests for validity, and outputs the compression ratio.
The compression code is implemented entirely as a set of template classes in header files. These are:

- compressor.h, which completely implements the arithmetic compressor. The compressor class is parameterized on input stream type, output stream type, and model type.
- decompressor.h, the corresponding arithmetic decompressor with the same type parameters.
- modelA.h, a simple order-0 model that does an acceptable job of demonstrating compression.
- model_metrics.h, a utility class that helps a model class set up some types used by the compressor and decompressor.
- bitio.h and byteio.h, the streaming classes that implement bit-oriented I/O.
This article is going to skip over the details of the bit-oriented I/O classes implemented in bitio.h and byteio.h. The classes provided will allow you to read or write from std::iostream and FILE * sources and destinations, and can be pretty easily modified for other types. Details on the implementation of these classes can be found in my article C++ Generic Programming Meets OOP, which includes a bit-oriented I/O class as an illustrative example.
All the code requires a C++11 compiler, but could be modified to work with earlier versions without much trouble. The makefile will build the code with g++ 4.6.3 or clang 3.4. The solution file included will build the projects with Visual C++ 2012.
model_metrics.h
In the code I've shown you so far, I blithely assume that your architecture uses 32-bit math and efficiently supports unsigned integer math. This is something of an unwanted and unneeded restriction, so my production code defines the math parameters using templates. The way it works is that the compressor and decompressor classes both take a MODEL type parameter, and they rely on that type to provide a few typedefs and constants:

- CODE_VALUE: The integer type used to perform math. In the sample code shown so far, we assumed this was unsigned int, but signed and longer types are perfectly valid.
- MAX_CODE: The maximum value of a code, in other words, the highest value that high can reach. If we are using all 32 bits of an unsigned int, this would be 0xFFFFFFFF, but as we will see shortly, it will normally be quite a bit smaller than the maximum that can fit into an int or unsigned int.
- FOURTH, ONE_HALF, THREE_FOURTHS: The three values used when testing for convergence. In the sample code, these were set to the full 32-bit values 0x40000000, 0x80000000, and 0xC0000000, but they will generally be smaller. The values could be calculated from MAX_CODE, but we let the model define them for convenience.
struct prob is the structure used to return probability values from the model. It is defined in the model because the model knows what types it wants the three values in this structure to be. The three values are low, high, and count, which define the range of a given character being encoded or decoded.
The header file model_metrics.h contains a utility class that helps the model class figure out what these types are. In general, CODE_VALUE is going to be the natural int or unsigned int type for your architecture. Calculating the value of MAX_CODE requires a little more thought, though.
Digression: Overflow Avoidance

It's a given that we are doing the math in the encoder and decoder using type CODE_VALUE:

CODE_VALUE range = high - low + 1;
high = low + (range * p.high / p.count) - 1;
low = low + (range * p.low / p.count);

The largest intermediate result in these calculations is the product range * p.high: range can be as large as MAX_CODE, and p.high can be as large as the model's maximum frequency count. So if PRECISION is the number of bits in the CODE_VALUE type, CODE_VALUE_BITS is the number of bits used for code values, and FREQUENCY_BITS is the number of bits used for frequency counts, avoiding overflow requires:

PRECISION >= CODE_VALUE_BITS + FREQUENCY_BITS

On machines commonly in use today, doing the math with a 32-bit unsigned type means the bits used for code values and the bits used for frequency counts cannot together exceed 32.
Digression: Underflow Avoidance

In the decoder, we have the following code that is executed each time we decode a new character:

CODE_VALUE range = high - low + 1;
CODE_VALUE scaled_value = ((value - low + 1) * m_model.getCount() - 1) / range;
int c;
prob p = m_model.getChar( scaled_value, c );

The value returned by this scaling calculation has to identify the character whose slice of the number line contains scaled_value, so the division must not collapse distinct characters into the same result. Because of the invariants described earlier, we know that after renormalization range is always larger than FOURTH, which is 2^(CODE_VALUE_BITS-2); any smaller range would have triggered more shifting.

Now let's consider what happens when we are using 16 bits for our frequency counts. The largest value returned by m_model.getCount() can then approach 2^16. In the worst case, we are dividing the range occupied by a character whose count is just 1, and for the decoder to see that character at all, its slice of range must be at least one code value wide. That means the smallest possible range must be at least as large as the total count, which gives us the requirement:

CODE_VALUE_BITS >= FREQUENCY_BITS + 2

If we have a 32-bit CODE_VALUE, 16-bit frequency counts would therefore demand at least 18 bits for code values, and 18 + 16 = 34 exceeds the 32 bits allowed by the overflow constraint. Splitting the bits as 17 for code values and 15 for frequency counts satisfies both constraints: 17 + 15 <= 32 and 17 >= 15 + 2.