Extending Java Streams to Support Bit Streams



April 01, 1997
URL:http://www.drdobbs.com/extending-java-streams-to-support-bit-st/184410423

DDJ, Spring 1997: Extending Java Streams

Laurence is a freelance software engineer and author. He can be contacted at [email protected].



The java.io package is Java's solution for shielding you from input/output related device and platform dependencies. Of the 23 core java.io classes, 19 rely on the concept of a data stream (see Figure 1). A Java stream is simply a unidirectional, sequential flow of bytes that emanates from some concrete source (for input streams) or flows toward a concrete sink (for output streams). Input streams can only be read from, and output streams can only be written to.

In this article, I'll show how to subclass stream classes by designing and implementing I/O stream classes that support bit streams rather than byte streams. I'll then demonstrate these classes by deploying them in a Java implementation of the ubiquitous Lempel-Ziv-Welch (LZW) data compression and decompression algorithm.

Java I/O Stream Class Basics

Two superclasses, InputStream and OutputStream, form the roots of the dual stream inheritance branches. Example 1 shows the definition of the InputStream class, a good summary of what all input streams need to be able to do.


Example 1: Abstract class InputStream interface.

public abstract class InputStream extends Object {
   public abstract int read() throws IOException;
   public int read(byte b[]) throws IOException;
   public int read(byte b[], int off, int len) throws IOException;
   public long skip(long n) throws IOException;
   public int available() throws IOException;
   public void close() throws IOException;
   public synchronized void mark(int readlimit);
   public synchronized void reset() throws IOException;
   public boolean markSupported();
}

The main methods are the three overloaded read() variants that let you read a single byte or a block of bytes at a time. The last three methods—mark(), reset(), and markSupported()—allow subclasses of InputStream to implement a multibyte "unread" function. Clients can drop a mark in the input stream and revert back to it at a later time if the class in question supports the unread feature. Most don't because, by default, InputStream returns "false" when its markSupported() method is called, and most subclasses inherit markSupported() without overriding it. Class InputStream is abstract because of one single component that is left unimplemented: the building block read() method, on which the other read() methods and the skip() method rely.
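
To make the mark/reset contract concrete, here is a minimal sketch (my own, not one of the article's examples) using ByteArrayInputStream, one of the few core classes whose markSupported() returns true; the data values are arbitrary:

import java.io.*;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = { 10, 20, 30 };
        InputStream in = new ByteArrayInputStream(data);  // supports mark/reset
        if (in.markSupported()) {
            in.mark(16);             // remember this position (valid for 16 bytes)
            int first = in.read();   // consume a byte...
            in.reset();              // ...then revert to the mark
            System.out.println(first == in.read());   // prints "true"
        }
        in.close();
    }
}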

OutputStream is equally straightforward. It mainly provides equivalent write() methods to send bytes to any output stream (see Example 2). The mark/reset mechanism is not present in OutputStream because it doesn't make any sense on output.


Example 2: Abstract class OutputStream interface.
public abstract class OutputStream extends Object {
   public OutputStream();
   public abstract void write(int b) throws IOException;
   public void write(byte b[]) throws IOException;
   public void write(byte b[], int off, int len) throws IOException;
   public void flush() throws IOException;
   public void close() throws IOException;
}
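
As a quick illustration of the two interfaces cooperating, the following sketch (mine, not one of the article's listings) copies one file to another a byte at a time; the file names are taken from the command line:

import java.io.*;

public class CopyDemo {
    public static void main(String[] args) throws IOException {
        InputStream  in  = new FileInputStream(args[0]);
        OutputStream out = new FileOutputStream(args[1]);
        int ch;
        while ((ch = in.read()) != -1) {   // -1 signals the end of the stream
            out.write(ch);                 // write() uses the low 8 bits of the int
        }
        out.close();
        in.close();
    }
}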

Table 1 shows the data sources and sinks supported by the core input and output stream classes. In addition to these, classes in the java.net networking package produce input and output stream objects that connect to various network sources (like a TCP stream) and destinations. Unlike the input stream classes, the output stream classes do not support writing data to a String since Java Strings are read only. Furthermore, you can expect Sun's future Java API additions to support and rely on stream objects.


Table 1: Data sources and sinks supported by the core input and output stream classes.

Data Source/Sink   Input Class               Output Class

external files     FileInputStream           FileOutputStream
Strings            StringBufferInputStream   (none; Strings are read only)
byte arrays        ByteArrayInputStream      ByteArrayOutputStream
pipes              PipedInputStream          PipedOutputStream

You can further divide both the input and output branches of java.io's inheritance hierarchy into two types of classes: those that connect directly to a concrete data source or sink (a file, a byte array, a pipe, and so on), and the filter classes, which attach to another stream and process the data as it flows through.

In general, the constructor of a stream class is what allows you to specify the entity to which your stream object will connect. For example, objects of class FileInputStream accept data from a file, so FileInputStream constructors look like:

public FileInputStream(String name) throws FileNotFoundException

public FileInputStream(File file) throws FileNotFoundException

public FileInputStream(FileDescriptor fdObj)

All three constructors let you specify a file with content that will be used to feed read requests to the input stream. On the output side of the java.io hierarchy, the same approach is used (ByteArrayOutputStream being an exception). Listing One shows an example Java program that uses class FileInputStream to calculate character frequencies.

The Filter I/O Classes

Things become more interesting when you look at the two subhierarchies rooted in classes FilterInputStream and FilterOutputStream. Example 3 shows FilterInputStream's definition.


Example 3: Class FilterInputStream interface.

public class FilterInputStream extends InputStream {
   protected InputStream in;
   protected FilterInputStream(InputStream in);
   public int read() throws IOException;
   public int read(byte b[]) throws IOException;
   public int read(byte b[], int off, int len) throws IOException;
   public long skip(long n) throws IOException;
   public int available() throws IOException;
   public void close() throws IOException;
   public synchronized void mark(int readlimit);
   public synchronized void reset() throws IOException;
   public boolean markSupported();
}

Class FilterInputStream looks almost exactly like InputStream. The crucial difference is the constructor's signature: The FilterInputStream constructor takes any other InputStream object as an argument. This simple difference (along with the protected instance variable in) allows you to chain input streams together to form input-processing pipelines of arbitrary complexity, all built from concrete, reusable instances of various FilterInputStream classes. The same powerful and flexible features are in the output-stream class hierarchy.

Figure 1: Package java.io inheritance hierarchy.


Figure 2 illustrates a fictitious input-processing pipeline. In Step 1, unencoded, uncensored, and unformatted text originating from the Internet enters the pipeline. Step 2 buffers this data in large chunks, thereby enhancing the performance of the entire pipeline. Step 3 decodes the text into plain ASCII. Step 4 enforces the CDA and substitutes certain unspeakable words with sequences of Xs (I entirely support the Free Speech On-line campaign, so this example is purely tongue-in-cheek.) Finally, Step 5 reformats the text so it fits properly on cheap, low-resolution displays hooked up to the NCs of the future.

Figure 2 shows the general raison d'être for all filter stream subclasses—enhancing stream functionality.

Figure 2: A fictitious input- processing pipeline.


Chaining Filter Stream Objects

The sequence of stream object constructor invocations determines the structure of your pipelines. You can significantly improve the performance of the CharFreq program in Listing One by turning the single-stage FileInputStream "pipeline" into a two-stage, buffering pipeline. Simply replace the line constructing the FileInputStream object with file = new BufferedInputStream(new FileInputStream(args[0]));. Note that this change does not affect any other parts of the program. Although the main loop is now invoking read() on a BufferedInputStream object, the program's function is the same. The chaining mechanism is limitless; you could "wrap" the above BufferedInputStream object with yet another type of input stream, thus creating a three-stage input pipeline, and so on, ad infinitum. This technique is as idiomatic within Java as using the value of assignments is in C or C++ (if ((ch=getch()) == LF), for example).

Chained input or output pipelines frequently consist of three or more stream objects, so the aforementioned syntax can quickly become unmanageable. The preferred way of coding your chains is to use one constructor per line and declare variables to hold every object; see the coding template in Example 4.


Example 4: Preferred method for coding input and output chains.

<StreamType> varNameSource = new <StreamType>( concrete source );
<StreamType> varNameLink1  = new <StreamType>( varNameSource);
<StreamType> varNameLink2  = new <StreamType>( varNameLink1);
         :                         :
<StreamType> varNameLinkN-1  = new <StreamType>( varNameLinkN-2);
<StreamType> varNameLinkN  = new <StreamType>( varNameLinkN-1);
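
For instance, a concrete three-stage input pipeline built from this template might look like the sketch below (my own illustration; the choice of stream types is arbitrary):

import java.io.*;

public class PipelineDemo {
    public static void main(String[] args) throws IOException {
        FileInputStream     source = new FileInputStream(args[0]);
        BufferedInputStream link1  = new BufferedInputStream(source);
        DataInputStream     link2  = new DataInputStream(link1);

        int firstWord = link2.readInt();   // this read flows through all three stages
        System.out.println("First 32-bit word: " + firstWord);
        link2.close();                     // closing the head of the chain closes the rest
    }
}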

Reading and Writing Arbitrary Bit Streams

BitStreamInputStream and BitStreamOutputStream are subclasses of the filter stream classes. Our requirements for these bit-stream classes (see Listings Two and Three) are: they must support bitfields from 1 to 32 bits wide, they must allow bitfields to start on arbitrary bit boundaries, and they must allow the bitfield size to be changed at any point while reading or writing the stream.

Supporting 1- to 32-bit bitfields suggests two new methods: readBitField() for the input class, and writeBitField() for the output class. These two methods are the logical analogs of read() and write().

On the input side, the return mechanism for readBitField() needs to use the same trick as read() as defined in InputStream. read() can either return a single byte or the integer -1 (which represents EOF). Since the byte value 255 would be indistinguishable from -1, read() returns a 32-bit int rather than an 8-bit byte. For the sake of consistency, but not elegance, I use the same approach, returning a 64-bit long to allow 32-bit bitfields and -1 for EOF. It would be much cleaner to use the Java exception mechanism to trap EOF. This technique is, in fact, used by class DataInputStream, but this is an exception (no pun intended).
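
In practice, a caller loops on readBitField() just as it would on read(), treating -1 as the end-of-stream marker. The fragment below is a usage sketch only (the packed input file is hypothetical):

import java.io.*;

public class ReadFieldsDemo {
    public static void main(String[] args) throws IOException {
        BitStreamInputStream bits =
            new BitStreamInputStream(new FileInputStream("packed.dat"), 12);
        long field;
        while ((field = bits.readBitField()) != -1) {   // -1 doubles as EOF
            System.out.println("field = " + field);
        }
        bits.close();
    }
}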

Aligning bitfields on their natural bitfield boundaries is handled transparently: No methods are required to support arbitrary bitfield alignment. Altering the bitfield size suggests that the method setBitFieldSize(int bfSize) is present in both classes. Examples 5 and 6 show the interface definitions for these two classes.


Example 5: Class BitStreamInputStream interface.

public class BitStreamInputStream extends FilterInputStream {
   public BitStreamInputStream(InputStream in)
   public BitStreamInputStream(InputStream in, int bitFieldSize)
   public long readBitField() throws IOException
   public void setBitFieldSize(int bits) throws IllegalArgumentException
   public int getBitFieldSize()
}

Example 6: Class BitStreamOutputStream interface.

public class BitStreamOutputStream extends FilterOutputStream {
   public BitStreamOutputStream(OutputStream out)
   public BitStreamOutputStream(OutputStream out, int bitFieldSize)
   public void writeBitField(int bf) throws IOException
   public void setBitFieldSize(int bits) throws IllegalArgumentException
   public int getBitFieldSize()
}

The getBitFieldSize() method is provided for completeness and symmetry. Both classes provide two constructor options. The simpler constructor creates streams supporting 8-bit bitfields. In other words, it produces an everyday byte-aligned byte stream by default. The second (and more useful) constructor allows you to explicitly specify the initial bitfield size for the stream.
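
A short usage sketch (mine, not part of the listings) shows the output class at work; the output file name and field values are purely illustrative:

import java.io.*;

public class WriteFieldsDemo {
    public static void main(String[] args) throws IOException {
        // start with 9-bit fields, then widen to 10 bits mid-stream
        BitStreamOutputStream bits =
            new BitStreamOutputStream(new FileOutputStream("packed.out"), 9);
        bits.writeBitField(257);      // fits in 9 bits (maximum 511)
        bits.setBitFieldSize(10);     // subsequent fields are 10 bits wide
        bits.writeBitField(600);      // fits in 10 bits (maximum 1023)
        bits.close();                 // flushes the partially filled final byte
    }
}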

Since I built my bit stream classes on top of byte stream classes (and not vice versa), I need to pack and align variable-sized bitfields into bytes for writing and unpack and align these same fields for reading.

To implement method BitStreamInputStream.readBitField(), I chose to use a single-byte buffer sitting between the stream and the bitfield assembly code. While this might be less efficient than using a larger buffer (say, 8 bytes, capable of holding the largest bitfield—32 bits—under all alignment circumstances), it greatly simplifies implementation and EOF handling. The basic structure of the algorithm is shown in Example 7 in pseudocode.


Example 7: Pseudocode for BitStreamInputStream.readBitField().
bitField = 0
bitsToRead = current field size
while bitsToRead > 0
    if read buffer is empty, re-fill byte read buffer (and possibly
      generate EOF here)
    extract largest possible subfield (possibly entire bitfield requested)
    gradually build bitField up by adding subfield to bitField (using
      shifts and an OR)
    decrement bookkeeping variable tracking the number of unread bits
      in buffer (bitsInCache)
    decrement bitsToRead by the number of bits extracted in this iteration
return bitField

Instead of using an explicit bitstream pointer (the equivalent of a filepointer for byte streams) to track which bits have already been consumed, the code keeps the next bitfield left-aligned in the read buffer. Bitfield fragments (called "partials" or "subfields" in the code) are extracted from the read buffer and right-aligned before being shifted left the necessary number of bits and ORed into the bitfield construction variable (bitField).

The choice of alignment directions can be reversed without affecting the basic algorithm, although this would mean more than changing the few << and >> operators. The algorithm and code are rather hard to understand because of their generic and very low-level character: any size of bitfield (from 1 to 32 bits inclusive), aligned to any bit boundary, should be fair game to the algorithm. To convince yourself of the correctness of the algorithm, try a few pen-and-paper simulations: for example, a byte-aligned 8-bit field, a narrow field that straddles a byte boundary, and a field wider than eight bits that needs several buffer refills.
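
To get a concrete feel for the shift-and-OR assembly, here is a worked trace (my own illustration) of a 12-bit field read from a byte-aligned stream whose next two bytes are 0xAB and 0xCD:

public class TwelveBitTrace {
    public static void main(String[] args) {
        int firstByte  = 0xAB;                 // all 8 bits belong to the field
        int secondByte = 0xCD;                 // only its top 4 bits do
        int partial1   = firstByte;            // right-aligned 8-bit subfield
        int partial2   = secondByte >>> 4;     // right-aligned 4-bit subfield (0xC)
        int bitField   = (partial1 << 4) | partial2;        // shift and OR: 0xABC
        System.out.println(Integer.toHexString(bitField));  // prints "abc"
        // The low 4 bits of 0xCD (0xD) stay cached, left-aligned, for the next field.
    }
}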

The bitfield-writing algorithm requires bit-shuffling operations similar to those in the reading algorithm, except it will be the write buffer (again a single byte) that receives bitfield fragments. These need to be aligned to the bitstream's current bit alignment. Whenever the write buffer contains eight bits of data, the buffer is flushed to make room for more bitfield data (either whole bitfields, if the current bitfield size is between 1 and 8, or parts of bitfields if the size is between 9 and 32).

While the bitfield-reading algorithm had to deal with encountering and passing an EOF, the writing algorithm has to ensure that all bitfield data is written when the stream is closed. This is done by overriding the close() method: When the BitStreamOutputStream is closed, it checks if there are any unwritten bits in the write buffer, and flushes them if they exist.

A Question of Semantics

BitStreamInputStream and BitStreamOutputStream both inherit the read() and write() methods from their respective filter stream superclasses. Since these and other inherited methods know nothing about the new bit-stream model, and could corrupt the stream (by skipping or not writing bits), I had to reimplement these methods.

When subclassing a class, you should pay careful attention to the semantics of your new class to ensure that these are as logical and as close to the original's as possible. When designing my bit-stream classes, I had to decide the exact semantics for these same read() and write() methods in my extended classes. For the read() method, my options were: have read() return a bitfield of the current bitfield size; have read() realign the stream to the next byte boundary and return the byte found there; or have read() return the next eight bits regardless of the current bit alignment.

While the first option is elegant, I rejected it because it deviates too much from the byte bias of the java.io classes. The read() and write() methods of all the java.io classes implicitly deal with 8-bit bytes. The second option can discard data depending on the alignment of the last bitfield and/or its size, so that option is rejected too. I chose reading an arbitrarily aligned 8-bit bitfield as the new behavior for read(). Symmetry forced me to use analogous semantics for the write() methods. Similar reasoning led me to decide that skip() shall skip multiples of 8 bits (and not an arbitrary number of bits).
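
To make the chosen semantics concrete, the sketch below (not part of the original listings; the input bytes are made up) reads a 3-bit field and then calls read(). The returned byte straddles the underlying byte boundary, exactly as the new semantics dictate:

import java.io.*;

public class ReadSemanticsDemo {
    public static void main(String[] args) throws IOException {
        byte[] packed = { (byte) 0xB5, (byte) 0x40 };   // 10110101 01000000
        BitStreamInputStream bits =
            new BitStreamInputStream(new ByteArrayInputStream(packed), 3);
        long field = bits.readBitField();               // first 3 bits: 101 = 5
        int  b     = bits.read();                       // next 8 bits: 10101010 = 0xAA
        System.out.println(field + " 0x" + Integer.toHexString(b));   // prints "5 0xaa"
        bits.close();
    }
}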

Using the Bit-Stream Classes: Implementing LZW Compression

One common application of reading and writing bitfields is data compression. As a real-life test suite for the bit-stream classes, I present a straightforward implementation of the LZW data compression and decompression algorithm, as explained by Mark Nelson ("LZW Data Compression," Dr. Dobb's Journal, October 1989), and further refined by Shawn M. Regan ("LZW Revisited," Dr. Dobb's Journal, June 1990).

I ignored Mark's C source code and used his pseudocode listings to implement the compress() and decompress() methods in class LZW. At one point, Mark refers to having the "implementation blues." However, using my bit-stream classes and the Hashtable class in Java's java.util package, I found coding to be refreshingly simple.

As Shawn clearly explained in his article, one optimization of the LZW algorithm is to use dynamic code sizing (that is, bitfield sizes). While both authors found this to be an additional obstacle (which Shawn tackled), I found that, because of the bit-stream classes' dynamic bitfield size capability coupled with class Hashtable's unlimited capacity, the Java implementation did not share these obstacles.

The design of the LZW class (Listing Four) uses a helper class, LZWDictionary (Listing Five), to encapsulate the core LZW data structure: a two-way code-to-string/string-to-code look-up structure. While the previous articles on LZW used fixed-sized arrays (in C) to implement this functionality, I simply use two Hashtable objects, one per look-up direction. This implementation, therefore, does not have any "table full" conditions: class java.util.Hashtable transparently grows whenever necessary.

Java Performance Blues

You might wonder why I didn't create even more filter stream subclasses that encapsulated the LZW algorithm. This would most definitely result in an elegant set of compressing and decompressing stream classes. However, the LZWClient demonstration program (Listing Six) clearly suffers from lack of speed. Even on my vanilla Pentium/100 using the vanilla Sun JDK, it was obvious that Java was being stretched to its limits by this implementation of the LZW algorithm. Casting the algorithm further in the powerful (but inefficient) harness of a filter stream subclass would cripple the code beyond rescue.

Do the new bit-stream classes cause the bottleneck? If you run the LZWClient program with the -prof JVM option, then study the resulting profiling report generated in the file java.prof, you'll see that the bit-stream classes and their expensive-looking bit manipulation operations are not the cause. The top cycle guzzlers are Hashtable's containsKey() and put(), and StringBuffer's append() and toString(). Class StringBuffer's involvement without my program's explicit usage of this class might seem surprising. The explanation lies with the behind-the-scenes reliance of Java compilers on class StringBuffer to translate simple string concatenation expressions like stringA + stringB into ((new StringBuffer(stringA)).append(stringB)).toString(). Method LZW.compress() does, indeed, use the concatenation operator ('+') to append every newly read source character to the current string, hence the numerous append() and toString() invocations. Because the performance hot spots are not located in the bit-stream classes, I leave the optimization challenge (and fun) to you.
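
To see the translation for yourself, compare the two expressions in the sketch below (mine, not one of the listings); both produce the same string, but only the first hides the StringBuffer traffic:

public class ConcatDemo {
    public static void main(String[] args) {
        String stringA = "LZ";
        String stringB = "W";
        // the '+' below is compiled into the StringBuffer expression that follows
        String viaPlus   = stringA + stringB;
        String viaBuffer = (new StringBuffer(stringA)).append(stringB).toString();
        System.out.println(viaPlus.equals(viaBuffer));   // prints "true"
    }
}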

Conclusion

Subclassing Java's standard I/O classes is straightforward. It allows you to create I/O pipelines that incorporate your own filters and enhancers. The elegance with which both operations can be performed is testimony to Sun's enlightened investment in design rather than implementation. The only current annoyance is Java's overall lackluster run-time performance. Luckily, this is only a temporary blemish on the language: Sun's hardware implementations of the Java Virtual Machine architecture will soon be incorporated into PC hardware accelerator cards, finally silencing the performance issue and allowing Java to take the throne.

Listings

DDJ, Spring 1997: Listings

Extending Java Streams to Support Bit Streams

by Laurence Vanhelsuwé


Listing One

import java.io.*;

public class CharFreq {

public static void main (String[] args) {
InputStream file = null;
int frequencies[] = new int[256];
int ch;

    if (args.length == 0) {
        System.out.println("Usage: java CharFreq <file>");
        System.exit(10);
    }
    try {
        file = new FileInputStream(args[0]);
    } catch (FileNotFoundException unknownFile) {
        System.out.println("'" + args[0] + "' could not be found.");
        System.exit(100);
    }
    try {
        while ( (ch = file.read()) != -1 ) {
            frequencies[ ch ]++;
        }
    } catch (IOException error) {
        System.out.println("An error occurred during reading: " + error);
        System.exit(100);
    }
    try {
        file.close();
    } catch (IOException ignored) {}
}
} // End of Class CharFreq


Listing Two

//---------------------------
// Class BitStreamInputStream
// --------------------------
// Implements an enhanced InputStream which allows you to read a
// stream of bit fields ranging in size from 1 bit (a true bit
// stream) to 32 bits (a stream of integers). The size of the current
// bitfield can be changed at any point while reading the stream.
// (c) Laurence Vanhelsuwe 1996. E-Mail: [email protected]
//------------------------------------------------------------------

import java.io.*;
public class BitStreamInputStream extends FilterInputStream {
final static int EIGHT = 8; // 8 bits per byte
protected short buffer; // our BYTE bitstream read-ahead
                        // buffer declared as short to cope with EOF
protected int bitsInCache;  // how many unread bits left in our byte
protected int fieldSize;    // current size of bitstream read fields

//------------------------------------------------------------------
public BitStreamInputStream(InputStream in) {

    this(in, EIGHT);       // default to a normal byte stream
}
//------------------------------------------------------------------
public BitStreamInputStream(InputStream in, int bitFieldSize) {
    super(in);
    setBitFieldSize(bitFieldSize);
    bitsInCache = 0;        // we haven't got any cached bits
}
//------------------------------------------------------------------
// Set the current bitfield size.
//------------------------------------------------------------------
public void setBitFieldSize(int bits) throws IllegalArgumentException {
    if (bits>32 || bits<1) throw new IllegalArgumentException
        (
 "BitField size ("+ bits + ") no good. Has to be between 1 and 32."
        );
    this.fieldSize = bits;
}
//------------------------------------------------------------------
public int getBitFieldSize() {
    return this.fieldSize;
}
//------------------------------------------------------------------
// Read a bitfield from the input stream. The number of bits read is
// the current bitfield length. Bitfield can be on arbitrary bit boundaries.
//------------------------------------------------------------------
public long readBitField() throws IOException {

int bitField;               // what we're going to return to caller
int bitsToRead;             // remaining bits to assemble into BF
int availableNumberOfBits;
int OR_position;
int rightAlignedBFPartial;
    bitField    = 0;        // start with empty jigsaw
    bitsToRead  = fieldSize;
    OR_position = fieldSize;

    while (bitsToRead > 0) {
        if (bitsInCache == 0) {
            if ( (buffer = (short) in.read()) == -1) {
                return -1;          // reached EOF
            }
            bitsInCache = EIGHT;    // we've got a full byte again
        }
        availableNumberOfBits = Math.min( bitsToRead, bitsInCache );
        rightAlignedBFPartial = buffer >> (EIGHT - availableNumberOfBits);
        // always keep next partial left aligned and clean
        buffer <<= availableNumberOfBits;
        buffer &= 255;
        OR_position -= availableNumberOfBits;
        // add bitfield subfield
        bitField |= rightAlignedBFPartial << OR_position;
        // track # of cached bits
        // track how much left to do

        bitsInCache -= availableNumberOfBits;
        bitsToRead  -= availableNumberOfBits;
    }
    return bitField;
}
        //---------------------------------------------
        // The remaining methods are methods we override
        // from our parent class: FilterInputStream
        //---------------------------------------------
//------------------------------------------------------------------
// Overridden read() still reads a byte, but on any bit boundary.
//------------------------------------------------------------------
public int read() throws IOException {
int previousBFSize;
int theByte;
    previousBFSize = getBitFieldSize();
    setBitFieldSize( EIGHT );
    try {
        theByte = (int) readBitField();
    }
    finally {
        setBitFieldSize( previousBFSize );
    }
    return theByte;
}
//------------------------------------------------------------------
// Override block read() methods to use basic read() as building block.
// The implementation we want for this read() is the same as that
// for class InputStream.
// According to the Java language spec, two elegant (and short)
// solutions should work:
//   ((InputStream) this).read(..)      // i.e. a cast
//   InputStream.read(..)               // i.e. fully specifying
// Unfortunately neither work, so I am forced to paste in the
// original code for InputStream.read(byte b[], int off, int len).
//------------------------------------------------------------------

public int read(byte b[], int off, int len) throws IOException {

    if (len <= 0) {
        return 0;
    }
    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte)c;
    
    int i = 1;
    try {
        for (; i < len ; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            if (b != null) {
                b[off + i] = (byte)c;
            }
        }
    } catch (IOException ee) {}
    return i;
}
//------------------------------------------------------------------
// Overridden FilterInputStream.read(byte b[])
//------------------------------------------------------------------
public int read(byte b[]) throws IOException {
    return read(b, 0, b.length);
}
//------------------------------------------------------------------
// Overridden FilterInputStream.skip(long n)
// If any client relies heavily on skipping multi-byte strings in
// the bitstream, then this method has to be re-implemented to be more
// efficient. Current implementation is functional but highly inefficient.
//------------------------------------------------------------------
public long skip(long n) throws IOException {
long i;

    for(i=0; i < n; i++) {
        if (read() == -1) break;
    }
    return i;
}
} // End of Class BitStreamInputStream


Listing Three

//----------------------------
// Class BitStreamOutputStream
// ---------------------------
// Implements an enhanced OutputStream which allows you to write a
// stream of bit fields ranging in size from 1 bit (a true bit
// stream) to 32 bits (a stream of integers). The size of the current
// bitfield can be changed at any point while writing the stream.
// (c) Laurence Vanhelsuwe 1996. E-Mail: [email protected]
//------------------------------------------------------------------

import java.io.*;

public class BitStreamOutputStream extends FilterOutputStream {

final static int EIGHT = 8;     // 8 bits per byte

protected short buffer;         // our BYTE bitstream write buffer
protected int bitsInCache;      // how many cached bits in our byte
protected int fieldSize;        // current size of bitstream fields
protected long maxFieldValue;   // max value that fits bitfield
//------------------------------------------------------------------

public BitStreamOutputStream(OutputStream out) {
    this(out, EIGHT);           // default to a normal byte stream
}
//------------------------------------------------------------------
public BitStreamOutputStream(OutputStream out, int bitFieldSize) {
    super(out);                 // call FilterOutputStream constr.
    setBitFieldSize(bitFieldSize);
    bitsInCache = 0;            // we haven't got any cached bits
    buffer      = 0;            // start w/ clean buffer (for ORs !)
}
//------------------------------------------------------------------
public void setBitFieldSize(int bits) throws IllegalArgumentException {
    if (bits>32 || bits<1) throw new IllegalArgumentException
        (
 "BitField size ("+ bits + ") no good. Has to be between 1 and 32."
        );
    this.fieldSize = bits;
    this.maxFieldValue = (1L << bits) - 1;  // precalc max bf value
}
//------------------------------------------------------------------
public int getBitFieldSize() {
    return this.fieldSize;
}
//------------------------------------------------------------------
// Write a bitfield to the output stream. The number of bits written is
// the current bitfield length. Bitfield can be on arbitrary bit boundaries.
//------------------------------------------------------------------
public void writeBitField(int bf) throws IOException {

int bitsToWrite;            // how many bits left to write
int capacity;               // how many bits fit in write buffer
int partial, partialSize;   // partial bitfield and its size in bits
int bfExtractPos;           // bitfield extract position (bit number)

        // check that bitfield fits in current bitfield size
    if (bf > maxFieldValue ) {
        throw new IllegalArgumentException
            (
     "Can not pack bitfield " + bf + " in " + fieldSize + " bits."
            );
    }
    bitsToWrite  = fieldSize;
    bfExtractPos = fieldSize;
        // a single bitfield might have to be written out in several
        // passes since the lot has to pass through the single byte
        // write buffer. This inefficient situation is a result of
        // the complex aligning required to append any bitfield to
        // the currently written stream.
    while (bitsToWrite > 0) {
        if (bitsInCache != EIGHT) {         // if capacity left
            capacity = EIGHT - bitsInCache; // in write buffer...

            partialSize = Math.min( bitsToWrite, capacity);
            bfExtractPos -= partialSize;

            partial = extract (bf, partialSize, bfExtractPos);
            buffer |= partial << (capacity - partialSize);
            bitsToWrite -= partialSize;
            bitsInCache += partialSize;
        }
        if (bitsInCache == EIGHT) {         // if write buffer full,
            out.write((int) buffer);        // send it on its way
            bitsInCache = 0;                // and continue with
            buffer      = 0;                // clean buffer
        }
    }
}
//------------------------------------------------------------------
// extract a bitfield of length 'bits' from an integer source.
// bitfield starts at bit 'pos' and is returned right-aligned to bitpos 0
//------------------------------------------------------------------
private int extract (int source, int bits, int pos) {

    source = source >> pos;     // align bitfield to bit 0
    int mask = ~( (-1) << bits);// create a mask to get clean bitfld
    return source & mask;       // return bitfield (0 bits padded)
}
        //---------------------------------------------
        // The remaining methods are methods we override from
        // our parent class: FilterOutputStream
        //---------------------------------------------
//------------------------------------------------------------------
// Override write() method to write a byte on any bit boundary.
//------------------------------------------------------------------
public void write(int b) throws IOException {
int previousBFSize;

    previousBFSize = getBitFieldSize();
    setBitFieldSize(EIGHT);
    try {
        writeBitField( b & 0xFF );
    }
    // if writeBitField threw an exception, make sure we reset
    // the current bitfield size before letting the exception thru
    finally {
        setBitFieldSize(previousBFSize);
    }
}

//------------------------------------------------------------------
// Override block write() methods to use basic write() as building block.
//------------------------------------------------------------------
public void write(byte b[], int off, int len) throws IOException {
    for (int i = 0 ; i < len ; i++) {
        write(b[off + i]);
    }
}
public void write(byte b[]) throws IOException {
   write(b, 0, b.length);
}

//------------------------------------------------------------------
// Override flush() method.
//------------------------------------------------------------------
public void flush() throws IOException {

    // REFUSE to flush as this advances file pointer of underlying stream !
}    
//------------------------------------------------------------------
// Override close() method to correctly flush any remaining bitfields
// in write buffer before closing output chain.
//------------------------------------------------------------------
public void close() throws IOException {
    if (bitsInCache != 0) {
        out.write((int) buffer);
    }
    out.flush();
    out.close();
}    
} // End of class BitStreamOutputStream


Listing Four

//-----------------------------
// Class LZW (Lempel-Ziv-Welch)
// ----------------------------
// This Java implementation of the LZW algorithm demonstrates the
// use of the BitStreamInputStream and BitStreamOutputStream classes.
// (c) Laurence Vanhelsuwe 1996  E-Mail: [email protected]
//------------------------------------------------------------------

import java.io.*;

//------------------------------------------------------------------
public class LZW {

protected LZWDictionary LZWdict;            // LZW look-up object
protected       int bitFieldSize;           // current bitfield size
protected final int STARTING_BF_SIZE = 9;

//------------------------------------------------------------------
// LZW compressor. For an explanation of the algorithm, see DDJ
// October 1989.
//------------------------------------------------------------------
public boolean compress(InputStream plain, OutputStream compressed) {

BufferedInputStream     b_in;   // these buffering streams are there
BufferedOutputStream    b_out;  // purely to enhance I/O performance

BitStreamOutputStream   out;    // the output stream for LZW data

String  string, longerString;
int     ch;
    bitFieldSize = STARTING_BF_SIZE;


    b_in    = new BufferedInputStream(plain);
    b_out   = new BufferedOutputStream(compressed);

    out     = new BitStreamOutputStream(b_out, bitFieldSize);

    LZWdict = new LZWDictionary();

    // STRING = get input character
    try {
        char initialChar = (char) b_in.read();
        string = String.valueOf( initialChar );
    
        // WHILE there are still input characters DO
        //   CHARACTER = get input character
        while ( (ch = b_in.read()) != -1) {
            longerString =  string + (char) ch;
            // IF STRING+CHARACTER is in the string table THEN
            if ( LZWdict.encountered(longerString) ) {
                // STRING = STRING+character
                string = longerString;
            } else {
                // output the code for STRING
                // add STRING+CHARACTER to the string table
                // STRING = CHARACTER
                int code = LZWdict.stringCode(string);
                try {
                    out.writeBitField( LZWdict.stringCode(string) );
                } catch (IllegalArgumentException bitFieldWontFit) {
                    out.writeBitField( LZWDictionary.GROW_CODE );
                    out.setBitFieldSize( ++bitFieldSize );
                    out.writeBitField( LZWdict.stringCode(string) );
                }
                LZWdict.add(longerString);
                string = String.valueOf( (char) ch);
            }
        }
    // output the code for STRING
    out.writeBitField( LZWdict.stringCode(string) );
    } catch (Exception e) {
        System.out.println("Exception: " + e);
    }
    try {
        out.close();
    } catch (Exception ignored) {}
    return true;
}
//------------------------------------------------------------------
// LZW decompressor. For an explanation of the algorithm, see DDJ
// October 1989.
//------------------------------------------------------------------
public boolean decompress(InputStream compressedByteStream,
                         OutputStream decompressed) {
BufferedInputStream     b_in;

BufferedOutputStream    b_out;
BitStreamInputStream    in;
DataOutputStream        out;

long                    code;
int                     oldCode, newCode;
String                  string;
char                    firstChar;

    bitFieldSize = STARTING_BF_SIZE;

    b_in    = new BufferedInputStream(compressedByteStream);
    b_out   = new BufferedOutputStream(decompressed);

    in      = new BitStreamInputStream(b_in, bitFieldSize);
    out     = new DataOutputStream(b_out);
    LZWdict = new LZWDictionary();

    try {

        // Read OLD_CODE
        // output OLD_CODE
    
        oldCode = (int) in.readBitField();
        out.write(oldCode);
        firstChar = (char) oldCode;

        // WHILE there are still input characters DO
        while( (code = in.readBitField()) != -1) {
            // Read NEW_CODE
            newCode = (int) code;
            switch (newCode) {
                case LZWDictionary.GROW_CODE : 
                    in.setBitFieldSize ( ++bitFieldSize );
                    continue;       // carry on with while()
            }
            // STRING = get translation of NEW_CODE
            string = LZWdict.codeString(newCode);
            if (string == null) {
                String prevString = LZWdict.codeString(oldCode);
                string = prevString + firstChar;
            }
            // output STRING
            out.writeBytes(string);
    
            // CHARACTER = first character in STRING
            firstChar = string.charAt(0);
    
            // add OLD_CODE + CHARACTER to the translation table
            String prevString = LZWdict.codeString(oldCode);
            String compressedString = prevString + firstChar;
            LZWdict.add(compressedString);
  
            // OLD_CODE = NEW_CODE

            oldCode = newCode;
    
        } // End of while

    } catch (Exception e) {
        System.out.println("Exception: " + e);
    }
    try {
        out.close();
    } catch (Exception ignored) {}
    return true;
}} // End of Class LZW (Lempel-Ziv-Welch)


Listing Five

//--------------------
// Class LZWDictionary
// -------------------
// This class encapsulates the core LZW string/compression code
// associative data structure and its ADT manipulation methods.
// (c) Laurence Vanhelsuwe 1996    email: [email protected]
//------------------------------------------------------------------

import java.util.*;     // mainly for Hashtable

//------------------------------------------------------------------
class LZWDictionary {


private final static int ASCII_SET_SIZE = 256;
private final static int COMMAND_CODES  = 1;
private final static int FIRST_CODE     = ASCII_SET_SIZE + COMMAND_CODES;

// List of command codes:
public final static int GROW_CODE = FIRST_CODE - 1;

protected Hashtable string2codeLUT;
protected Hashtable code2stringLUT;
protected int       nextCode;

//------------------------------------------------------------------
// LZWDictionary constructor
// 1) create the two parallel string/code look-up tables
// 2) initialize them to contain codes for every 8-bit ASCII char
// 3) init compression code counter
//------------------------------------------------------------------
public LZWDictionary () {

String string;      // the look-up table key
Integer code;       // the look-up table value
    string2codeLUT  = new Hashtable();
    code2stringLUT  = new Hashtable();
    for (int charCode=0; charCode < ASCII_SET_SIZE; charCode++) {
        string = String.valueOf( (char) charCode );
        code   = new Integer(charCode);
        string2codeLUT.put(string, code);
        code2stringLUT.put(code, string);
    }
    nextCode = FIRST_CODE;
}
//------------------------------------------------------------------
// Boolean function to determine whether a string sequence has
// already been registered (and can therefore be compressed).
//------------------------------------------------------------------
public boolean encountered(String string) {
    return string2codeLUT.containsKey(string);
}
//------------------------------------------------------------------
// A new string needs to be recorded in the LZW dictionary.
// The new string becomes associated with a unique compression code.
// To be able to look up the code for a string and vice versa, two
// Hashtable objects (containing essentially the same data) are
// maintained and kept in sync.
//------------------------------------------------------------------
public void add(String string) {
    Integer code = new Integer(nextCode);
    string2codeLUT.put(string, code);
    code2stringLUT.put( code, string);
    nextCode++;
}
//------------------------------------------------------------------
// Look up the compression code for a given string
//------------------------------------------------------------------
public int stringCode(String string) {
Integer code;
    code = (Integer) string2codeLUT.get(string);
    return code.intValue();
}
//------------------------------------------------------------------
// Look up a string associated with a given compression code
//------------------------------------------------------------------
public String codeString(int strCode) {
Integer code;
    code = new Integer(strCode);
    return (String) code2stringLUT.get(code);
}} // End of Class LZWDictionary


Listing Six

//----------
// LZWClient
// ---------
// Demonstration LZW (de)compressing utility which relies on the LZW
// classes, which in turn build on the BitStream I/O classes.
// (c) Laurence Vanhelsuwe 1996. E-Mail: [email protected]
//------------------------------------------------------------------

import java.io.*;
import java.util.*;


public class LZWClient {

private final static int ACTION_INVALID    = 0;
private final static int ACTION_COMPRESS   = 1;
private final static int ACTION_DECOMPRESS = 2;
private final static String usage = 
       "Usage: java LZW <COMPRESS|DECOMPRESS> <file>";
public static void main (String[] args) {
int action = ACTION_INVALID;

    // need an action and filename as command line arguments
    if (args.length != 2) {
        System.out.println(usage);
        System.exit(10);
    }
    // validate requested action
    if (args[0].equals("COMPRESS")) {
        action = ACTION_COMPRESS;
    } else
    if (args[0].equals("DECOMPRESS")) {
        action = ACTION_DECOMPRESS;
    }
    switch (action) {
        case ACTION_COMPRESS:
            compressFile(args[1]); break;
        case ACTION_DECOMPRESS:
            decompressFile(args[1]); break;
        case ACTION_INVALID:
            System.out.println("Invalid action requested: " + args[0]);
            System.out.println(usage);
            System.exit(20);
            break;                  // for consistency
    }
    System.out.println("LZWClient Done.");
}
//------------------------------------------------------------------
// Compress a file to "compressed.out"
//------------------------------------------------------------------
protected static void compressFile( String filename ) {

InputStream  inputFile      = null;
OutputStream compressedFile = null;
int originalSize, compressedSize;
LZW lzw;
    try {
        inputFile = new FileInputStream( filename );

        originalSize = fileLength( filename );
    
        lzw = new LZW();
    
        System.out.print("Compressing.."); System.out.flush();
        compressedFile  = new FileOutputStream("compressed.out");
    
        Date timer = new Date();
        lzw.compress(inputFile, compressedFile);
    
        compressedSize  = fileLength("compressed.out");
        printCPS(timer, originalSize);
    
        printEfficiency(originalSize, compressedSize);

    } catch (FileNotFoundException badFile) {
        System.err.println("'" + filename + "' could not be found.");
        System.exit(10);
    } catch (IOException ioErr) {
        System.out.println("IO Error: " + ioErr);
        System.exit(10);
    }
}
//------------------------------------------------------------------
// Decompress a file to "original.out"
//------------------------------------------------------------------
protected static void decompressFile( String filename ) {

InputStream  inputFile    = null;
OutputStream restoredFile = null;
int originalSize, compressedSize;
LZW lzw;
    try {
        inputFile = new FileInputStream( filename );

        lzw = new LZW();
    
        System.out.print("Decompressing.."); System.out.flush();
        restoredFile = new FileOutputStream("original.out");
    
        Date timer = new Date();
        lzw.decompress(inputFile, restoredFile);
    
        originalSize = fileLength("original.out");
        printCPS(timer, originalSize);

    } catch (FileNotFoundException badFile) {
        System.err.println("'" + filename + "' could not be found.");
        System.exit(10);
    } catch (IOException ioErr) {
        System.out.println("IO Error: " + ioErr);
        System.exit(10);
    }
}
//------------------------------------------------------------------
// Utility function to determine a file's size
//------------------------------------------------------------------
public static int fileLength (String fileName) {
File f;
    f = new File(fileName);
    return (int) f.length();
}
//------------------------------------------------------------------
// Determine and print how well we've compressed a file
//------------------------------------------------------------------
public static void printEfficiency(int original, int compressed) {
double percent;
    percent         = 100.0 * ((double)compressed)/(double)original;
    System.out.println("Original size   : " + original);
    System.out.println("Compressed size : " + compressed);
    System.out.println("% of original   : " + percent + "%");
}
//------------------------------------------------------------------
// Determine and print how fast we've (de)compressed a file
//------------------------------------------------------------------
public static void printCPS( Date then, int numBytes) {
double ms, cps;
    ms  = (double) ( (new Date()).getTime() - then.getTime() );
    cps = 1000.0 * ((double) numBytes) / ms;
    System.out.println(" (at " + cps + " bytes per second)");
}
} // End of Class LZWClient

End Listings
