Image Compression and Decompression
Image and video encoders and decoders, in software called codecs, are intended to compress their media for storage or transmission. Raw images are quite large and raw digital video is unreasonably large, absorbing disk or network capacity very quickly. Moreover, processor speed is sufficient that working with these media uncompressed, except for capture and display, is completely unnecessary and inefficient. It is faster to read compressed video from disk and decompress it than it would be to read uncompressed video.
Most compression is based on taking advantage of redundancy and predictability in data to reduce the number of bytes necessary to represent it. Two common techniques are run-length coding, which converts runs of data into run-lengths and values, and variable-length coding, which converts data of fixed bit lengths into variable bit lengths according to popularity. Huffman coding and arithmetic coding are examples of variable-length coding.
Another source of compression is perceptibility. For some kinds of data, such as text and binary executables, compression must be lossless. A compression method that sometimes changed an "a" to an "A" would not be acceptable. Standalone Huffman coding is exactly reversible. However, it is possible to compress media information in a way that is not exactly reversible but is virtually undetectable. Such methods are called lossy, meaning that the output is not guaranteed to be exactly the same as the input. However, in many cases the loss can be imperceptible or have acceptable visual effect. Just as with audio coding, the compression algorithm transforms the data into spaces in which information can be removed while minimizing the perceptible impact to the media.
Most media compression is done using transform-based coding methods. Such methods convert the position-based information into frequency-based or position/frequency-based information. The compression benefit is that important information becomes concentrated in fewer values. Then the coder can represent the more-important information with more bits and the less-important information with fewer bits. The perception model dictates the importance of information, but generally higher-frequency information is considered less important. Figure 1 shows the framework of a transform-based encoding and decoding scheme.
JPEG is a widely used standard for image compression. The term is an acronym that stands for the Joint Photographic Experts Group, the body that designed the specification. JPEG is a powerful and effective compression technique that has been around for over a decade. It has gained popularity because of its effectiveness, because it is an international standard, and because a baseline JPEG codec can be built without using proprietary algorithms.
The Independent JPEG Group (IJG) is a small association that wrote a reference implementation of the JPEG standard. This implementation, often called the IJG library or merely IJG, is widely used as a JPEG codec.
Data Compression Principles
This section summarizes key data compression algorithms. The descriptions are brief and many details are neglected, so these descriptions should be treated as a simple illustration of main ideas of the algorithms. Key descriptions of data compression techniques are as follows:
- Huffman coding. Huffman coding is a common example of a variable-length code. Whereas most (uncompressed) schemes encode characters using a fixed number of bits, often 8 or 16, the idea of Huffman coding is to use variable-length bit codes to represent symbols. The algorithm uses the shortest codes for the most frequently-used characters and often much longer codes for the rare symbols. This conversion from fixed-length codes to variable-length can reduce the length of the whole input string significantly.
- Burrows-Wheeler Transform (BWT). Variable-length coding is limited by the statistics of the data. The more uneven the frequencies of the symbols, the better the compression. It is possible to change the statistics of data by performing some kind of transform to make the data further compressible. The Burrows-Wheeler transform and other block-sorting transforms are designed to change the statistics or statistical homogeneity of the input data with this goal in mind. BWT rearranges the symbols of the input data in order to obtain an output data block containing long series of equal symbols, creating likely statistical inhomogeneity in the block.
- Move-to-Front (MTF) Transform. The goal of the MTF is to turn a series of equal symbols into a series of zeros. To achieve this, the MTF transform changes the statistics of the input string to contain series of equal symbols. This makes the string more compressible. The principal idea of the MTF transform is very simple. On every step the transform writes to the output stream a value that's equal to the index of the current symbol of string in the alphabet, then reorders the alphabet so that that symbol is first. It does this by shifting the symbols between the first symbol and the current symbol's original position.
- Run-Length Encoding (RLE). RLE reduces the length of input data vectors that contain runs of zeros, or more generally runs of any number. RLE replaces a series of values with a pair of values: the value of the members of the series and the number of values in the series minus one. When encoding a run of five zeros, for example, the pair of values would be [0, 4].
- Lempel-Ziv-77 (LZ77). The LZ77 algorithm belongs to the family of dictionary-based algorithms. Such algorithms assume the existence of the dictionary filled with indexed strings. They achieve compression by replacing input substrings with the index of a matching string in the dictionary. The LZ77 algorithm uses a sliding dictionary that is a window of perhaps 32 KB into the input stream itself. The encoding algorithm searches the window for substrings also in the look-ahead buffer, which is much smaller, then advances the window and look-ahead buffer across the input stream. If it finds a match, the algorithm encodes the second occurrence of the string using the offset from the beginning of the first occurrence of the string and the length of the string that matches.
- Interval Transform. The Interval transform changes the statistics of input data by replacing the input string with sequences of intervals between matching symbols. The encoder produces a sequence of such intervals for each symbol in the alphabet. In producing these sequences, the algorithm iterates through the alphabet, marking the distances between each pair of symbols and then removing every occurrence of that symbol.