# JPEG-Like Image Compression, Part 1

## Here's a C++ class library for JPEG-like compression

### Craig A. Lindley

*Craig is a founder of Enhanced Data Technology and author of Practical Image Processing in C, Practical Ray Tracing in C, and Photographic Imaging Techniques for Windows (all published by John Wiley & Sons). Craig can be contacted at edt@rmii.com. EDT also maintains a home page on the WWW at www.mirical.com.*

JPEG, the Joint Photographers Expert Group, is an ISO/CCITT-backed international standards committee that has defined an image-compression specification (also referred to as "JPEG") for still images. The standard stipulates the following:

- Compression algorithms must allow for software-only implementations on a wide variety of computer systems.
- Algorithms must operate at or near the state of the art in image-compression rates.
- Compressed images must be good to excellent in quality.
- Compression ratios need to be user variable.
- Compression must be generally applicable to all sorts of continuous-tone (photographic type) images.
- Interoperability must exist between implementations from different vendors.

It is important to distinguish between JPEG's image-processing algorithms, which comprise JPEG image compression, and the file format in which JPEG images are typically stored. In this two-part article, I'll present an image-compression technique called "CAL" (my initials) that uses the same algorithms used for JPEG images, but encapsulates the images in a simple, proprietary file format. In this installment, I'll focus on JPEG and its constituent algorithms, including discrete cosine transforms, quantization, and entropy encoding. I'll also discuss how and why gray-scale and color images must be treated differently and why compression of color images is much more difficult. Next month, I'll discuss how CAL differs from JPEG, present the C++ classes on which CAL is built, and suggest possible uses for CAL.

### JPEG Backgrounder

The JPEG specification defines four different modes of operation for JPEG-aware software:

- A sequential encoding mode, in which each image component is encoded in a left-to-right, top-to-bottom fashion.
- A progressive encoding method, in which images are compressed such that on decoding they paint an image that is successively refined for each decoded scan. This is similar to interleaved GIF images.
- A lossless encoding method, where a decoded image is guaranteed to be a bit-for-bit copy of the original image. Lossless encoding cannot achieve the compression ratios of normal, lossy JPEG compression, so it is rarely implemented.
- A hierarchical encoding method, where multiple copies of an image are encoded together with each copy being a different resolution. This allows an application to decode an image of the specific resolution it needs without having to handle an image with too-high or too-low resolution.

JPEG achieves image compression by methodically throwing away visually insignificant image information. This information includes the high-frequency components of an image, which are less important to image content than the low-frequency components. When an image is compressed using JPEG, the discarded high-frequency components cannot be retrieved, so baseline, sequential JPEG is considered lossy. When a JPEG compressed image is displayed, much of the high-frequency information is missing: The image is not a bit-for-bit copy of the original. If lossless image compression is necessary, either the lossless JPEG mode of operation or a different compression technique must be used.

Allowing the user and/or application program to make the image quality versus compression trade-off on an image-by-image basis is an important aspect of JPEG operation.

### The JPEG Pipeline

Figure 1 illustrates the steps necessary for JPEG encoding/decoding of a gray-scale image.

During encoding, an image is broken up into 8x8-pixel blocks that are processed individually, from left-to-right and top-to-bottom. Each block of image data is subjected to a forward discrete cosine transform (FDCT), which converts pixel values into their corresponding frequency components. A block of pixels is submitted to the FDCT, which returns a block of frequency components. The frequency-component values, or coefficients, are resequenced ("zigzagged") in order of increasing frequency.

The resultant frequency coefficients are then quantized, which causes components with near-zero amplitudes to become zero. The level of frequency-component quantization depends upon a user-specified image-quality factor. The more an image is compressed, the more frequency components in a block become zero.

The entropy encoding process performs two types of image compression on the block of frequency coefficients. First, it run-length encodes the number of zero-coefficient values preceding a nonzero coefficient in a block. Secondly, it bit encodes the coefficient values using statistically generated tables. The encoded bit stream is written to the output stream or file.

During decoding, the encoded bit stream is read from the input stream or file, and the quantized frequency coefficients for a block of pixel data are decoded. These frequency components are then dequantized by multiplying them by the quantization table with which they were produced.

The frequency coefficients are resequenced from frequency order back into pixel order. The block of frequency components is then subjected to an inverse discrete cosine transform (IDCT), which converts the frequency component values back into pixel values. The reconstructed pixel values are stored in the decoded image.

### The Discrete Cosine Transform

Discrete cosine transforms (DCTs) convert 2-D pixel-amplitude and spatial information into frequency content for subsequent manipulation. In other words, a DCT is used to calculate the frequency components of a given signal (in this case, an 8x8-block of pixels) sampled at a given sampling rate. The mathematics of DCTs are well beyond the scope of this article--simply note that DCTs are calculated by applying a series of weighting coefficients to the pixel data.

The equations in Figure 2 describe the 2-D DCT process for 8x8-pixel blocks. Note, however, that the transcendental functions cannot be computed with perfect accuracy. In theory, if the FDCT and IDCT could be computed with perfect accuracy, the exact pixel data fed into the FDCT could be recovered when the transformed data was returned from the IDCT. In real life, this doesn't happen. What you get out is not exactly what you put in. Although the JPEG specification does not specify the exact algorithms to use for the DCTs, it does mandate an accuracy figure for the results. This gives implementors a lot of latitude but still allows for compliance testing. Numerical errors, as you shall see, are not the source of the loss in JPEG compressed images.

The FDCT is applied to the 8x8 block of pixel data, resulting in a series of 64 frequency coefficients. The first, the "DC coefficient," is the average of all of the pixel values within the block. Coefficients 1-63, the "AC coefficients," contain the spectrum of frequency components that make up the block of image data. As a result of the slowly changing nature of continuous-tone images, many high-frequency coefficients produced by the FDCT are zero or close to zero in value. This is exploited during the quantization process. Note that there is usually a strong correlation between the DC coefficients of adjacent blocks of image data. The entropy-encoding process takes advantage of this as we shall discuss later.

In developing the CAL code, I made multiple attempts at coding the DCTs. The first attempt was a literal interpretation of the equations in Figure 2, utilizing floating-point numbers for the calculations. Although this method worked, life was too short to compress images that way. Subsequent attempts utilized fixed-point 32-bit integers instead of floating-point numbers, and I precomputed and scaled the cosine weighting terms before their use. This approach improved performance by a factor of 12; see dct.hpp (Listing One) and dct.cpp (Listing Two) for details. Further experimentation pointed to the speed of the DCTs as the determining factor in overall image-compression performance. To improve CAL's performance, I replaced my DCT code with highly optimized DCT code extracted from the Independent JPEG Group's (IJG) JPEG software. The new DCT code is contained in the files dct1 .hpp and dct1.cpp, which are available electronically; see "Availability," page 3. The optimized DCT code was a full three times faster than the fastest code I had written.

### The Zigzag Sequence

Once the block of image pixels is processed with the FDCT, the result is a frequency spectrum for the pixels. The coefficients that make up the frequency spectrum are not conveniently arranged in ascending order at the conclusion of FDCT processing. Instead, they are clustered with the DC coefficient in the upper left, surrounded by the lower-frequency components. The higher-frequency components are grouped towards the lower right. This is shown in Figure 3. The application of the zigzag sequence after the FDCT converts the 64 frequency coefficients into ascending order, as required for further processing. With the frequency coefficients in ascending order, the DC and low-frequency coefficients (which are less likely to be zero) are grouped together, followed by the high-frequency coefficients.

### Quantization

The quantization step is where most image compression takes place and is therefore the principal source of loss in JPEG image compression. Quantization is performed by dividing each frequency coefficient by a corresponding quantization coefficient mandated by the JPEG specification. One quantization table is specified for the luminance (brightness) portion of the image data, and another, for the chrominance (color). Quantization coefficients with values approaching one allow the corresponding frequency coefficient to pass through the quantization process unmodified. Large quantization coefficients force the corresponding frequency coefficients to approach zero in value. Thus, visually insignificant, high-frequency information is discarded.

In the CAL code presented here, you can specify a quality factor value in the range 10-100, where a value of 10 results in severe image compression with noticeable image degradation, and a value of 100 results in much lower compression but with generally unnoticeable image distortion. The quality factor that you specify is used to manipulate the quantization tables in the JPEG specification. A quality factor of 100 causes all of the quantization coefficients to become 1, resulting in no quantization (any number divided by 1 is still that number). A quality factor of 50 causes the quantization tables in the specification to be used unaltered. Quality factors approaching 10 result in large quantization coefficients, which causes many of the frequency coefficients to be quantized to 0. As more and more of the frequency coefficients become 0, the more an image can be compressed.

During quantization, the image data is divided by the values in the quantization table, but during JPEG decoding, image data must be dequantized. Dequantization is performed by multiplying the decoded image data by the value in the quantization table, thereby restoring it to a value close to the prequantization value.

### Entropy Encoding

The final step in the JPEG encoding process is entropy encoding. The JPEG specification allows for either arithmetic or Huffman encoding. It is generally acknowledged that arithmetic encoding performs marginally better for some images, but is much more complex to implement. Also, there are currently patent problems with using arithmetic encoding, so most implementors steer clear of it. Huffman encoding is in the public domain and can be used without worry of patent infringement. Consequently, Huffman encoding is utilized in most JPEG implementations and in the CAL code.

Using Huffman encoding as the entropy-coding mechanism provides additional lossless compression for the already highly processed image data. Huffman compression is based upon the statistical characteristics of the data to be compressed: Symbols that occur frequently in the data are assigned shorter Huffman codes; those that occur infrequently are assigned longer codes. Compression will occur as long as there is a large difference between the occurrence counts of the most common and the least common symbols. Note also that Huffman coding is bit oriented, not byte oriented. The Huffman codes assigned to the various symbols are bit packed together into the tightest possible configuration of bytes. This makes the code for Huffman encoding/decoding difficult to write and debug because the data stream has to be examined at the bit level. Convenient byte boundaries do not exist at the lowest level.

During the encoding process, the JPEG frequency coefficients for a block of image data are bit encoded as a variable-length Huffman code, followed by a variable-length integer. During decoding, the variable-length codes and accompanying variable-length integers are converted back into integer values for subsequent processing.

Two additional forms of data compression occur in the entropy-coding step: delta coding of the DC coefficients of adjacent blocks of image data; and run-length encoding of zero-valued frequency coefficients. These ancillary compression mechanisms contribute to the overall compression achieved for JPEG-compressed images.

As previously mentioned, there is usually a high level of correlation between the DC coefficients of adjacent blocks of image data, and the values of the DC coefficients are generally large (requiring many bits to Huffman encode). Therefore, significant compression can be achieved by encoding the differences in the DC coefficient values between adjacent blocks instead of their actual values. In most cases, the difference in DC coefficient values can be encoded in fewer bits than can the actual value. Of course, during decoding of the Huffman bit stream, the encoded difference values must be converted back to actual values for the DC coefficients.

Run-length encoding of the zero coefficient values also provides significant compression. As mentioned, quantization causes many high-frequency coefficients in a block of image data to become 0. Picture the frequency coefficients as an array of 64 integers with nonzero values for the first five or ten entries, followed by mostly zero values. Significant compression can be achieved by counting up the number of zero coefficients in the block preceding a nonzero coefficient and encoding that number along with the value of the nonzero coefficient. This is so important that two special Huffman codes have been assigned to assist this process: code 0xF0 (referred to as ZRL), used to signal the special case where 16 zero coefficients in a row were detected; and code 0x00 (called EOB or end of block), which signifies that all of the remaining coefficients in a block are 0, so no further processing of the block is necessary.

For maximum compression, the Huffman codes used to compress an image should be derived from the image itself. The JPEG specification, however, provides a general set of Huffman code tables that can be used for encoding but that are not optimal for any one image. The tables provided were generated by statistical sampling of a large number of continuous-tone images considered suitable for JPEG compression. With CAL, I use the provided Huffman tables instead of generating unique tables from the image data on the fly. This approach was chosen for three reasons: first, performance. It takes time to analyze and build a Huffman table for an image. Building the tables on the fly would slow things down quite a bit. Second, for decoding reasons, the unique Huffman codes would have to be stored along with the encoded image data in the CAL file, making the compressed image files larger. Finally, this approach would increase the complexity of the already complex Huffman code.

The file tables.cpp (available electronically) shows JPEG-specified Huffman code tables. The Huffman tables have been organized in many different ways to optimize the encoding/decoding processes. The code that performs Huffman encoding/decoding is contained in huffman.cpp.

### Multiple-Component Images

To this point, I have discussed the compression of single-component images only; that is, gray-scale images in which each pixel represents the luminance of the image sample at a specified point. The 8x8 blocks into which a gray-scale image is chopped for processing are filled with 8-bit samples taken from the image. As expected, the image would be processed from left-to-right and top-to-bottom, although the application must define the image top. CAL works with Windows DIB images, which are stored upside down in memory. CAL therefore defines the top of the image to be the image data at the lowest memory address, opposite to how the data is actually stored.

Compressing color images is more complex. Images with multiple color components must be handled in addition to single-component images. The JPEG specification allows processing of images with up to four color components. Much of the complexity involved in decoding standard JPEG images is a result of the extreme flexibility allowed for processing of multiple-component images. To understand the inherent complexities, the following concepts must be addressed:

**Color-space conversions.** Color images come in many different formats depending upon how they were acquired and how they are meant to be used. The most common color-space coding for color images is probably RGB, but many others exist, including CMYK and YCbCr.

RGB-format image data is not optimum for lossy image compression because an RGB pixel's brightness and hue information is distributed among all three color components--red, green, and blue--forcing all three components to be treated equally. Any unequal treatment results in distortion in image color and/or brightness. The human visual system is less sensitive to changes in color than to changes in brightness, so it makes sense to convert RGB image data into a color space that treats luminance and chrominance separately. This conversion takes advantage of the sensitivities of the human visual system. In fact, all major video-broadcast standards (including NTSC, PAL, and SECAM) have exploited this fact for years. Each standard transmits the luminance information of an image at full bandwidth and the chrominance information at a reduced bandwidth.

For these reasons, RGB images are generally converted into the YCbCr color space as a prerequisite to compression. Thus, the luminance component, Y, can be treated differently than the two chrominance components, Cb and Cr, without undo image degradation. Figure 4 provides formulas for color-space conversions. I'll discuss how these formulas are applied in Part 2 of this article.

**Component subsampling.** Once the RGB image data is converted into YCbCr format, it could be compressed directly. In that case, one 8x8 block each of Y, Cb, and Cr data extracted from the image would be processed sequentially through the JPEG pipeline. While easy, this technique misses an opportunity for further image compression through subsampling the chrominance portion of the image data. Various chroma subsampling techniques could be applied to the YCbCr image data, including 4:2:2 and 4:1:1. 4:2:2 specifies that for every four samples of Y information, there are two samples each of Cb and Cr. 4:1:1 implies that for every four samples of Y, there is one sample each of Cb and Cr. To give you an idea of how much compression can be achieved by subsampling, consider that each pixel of RGB data (or converted YCbCr data) would require 24 bits for storage. The same pixel value using 4:2:2 subsampling would require 16 bits. 4:1:1 subsampling of the same pixel would require 12 bits for storage.

To further complicate matters, chrominance subsampling can be either 1- or 2-D; that is, it can be applied across an image or both across and down an image. Subsampling in 2-D reduces the amount of image data even further. The choice of subsampling technique depends upon the intended uses of the compressed images. CAL uses 2-D 4:2:2 chroma subsampling, which seems to be an acceptable trade-off between image quality and compressed image size for the photographic-quality images I tend to deal with.

### Minimum Coded Units

Each 8x8 block of color-component information is referred to as a "data block" or "data unit." To define a region of an image, 2-D 4:2:2 subsampling requires four blocks of Y image data for each block of Cb and Cr data. These six blocks are referred to as a minimum coded unit (MCU). Without subsampling, an MCU would consist of one block of Y, one block of Cb, and one block of Cr data. For gray-scale images, each block of Y data would be considered an MCU.

### Next Month

Up to this point, I've discussed the various concepts and algorithms utilized in JPEG compression. Next month, I'll focus on the CAL implementation of JPEG technology. In doing so, I'll describe the design and operation of the C++ classes and discuss the practical considerations in implementing DCTs and color-space conversions using only integer arithmetic. Additionally, I will provide a series of images that show the effects of various levels of CAL compression and give some figures on CAL performance.

### References

ISO JPEG Standards (DIS 10918-1 and draft DIS 10918-2), ANSI Sales (212-642-4900).

JFIF File Format Specification, Literature Department, C-Cube Microsystems Inc., 399A West Trimble Road, San Jose, CA 95131 (408-944-6300).

Loeffler, C., A. Ligtenberg, and G. Moschytz. "Practical Fast 1-D DCT Algorithms with 11 Multiplications." *Proceeding of the International Conference on Acoustics, Speech, and Signal Processing 1989* (ICASSP '89).

Mattison, Phillip E. *Practical Digital Video with Programming Examples in C*. New York, NY: John Wiley & Sons, 1994.

Nelson, Mark. *The Data Compression Book*. New York, NY: M&T Books, 1991.

Pennebaker, William B. and Joan L. Mitchell. *JPEG Still Image Data Compression Standard*. New York, NY: Van Nostrand Reinhold, 1993.

TIFF 6.0 File Format Specification, Aldus Corp. (206-628-6593) and via ftp at sgi.com (192.48.153.1). See the file graphics/tiff/TIFF6.ps.Z.

Wallace, Gregory. "The JPEG Still Picture Compression Standard." *Communications of the ACM* (April 1991).

**Figure 1: **The JPEG encoding/decoding pipelines.

**Figure 2: **(a) Forward discrete cosine transform (FDCT); (b) inverse discrete cosine transform (IDCT).

**Figure 3: **The zigzag sequence.

**Figure 4:** (a) RGB-to-YCbCr conversion; (b) YCbCr-to-RGB conversion.

(a) Y= 0.29900*R+0.58700*G+0.11400*B Cb=-0.16874*R-0.33126*G+0.50000*B Cr= 0.50000*R-0.41869*G-0.08131*B (b) R=Y+1.40200*Cr G=Y-0.34414*Cb-0.71414*Cr B=Y+1.77200*Cb

#### Listing One

// Discrete Cosine Transform Class Interface File #ifndef DCT_HPP #define DCT_HPP #include "misc.hpp" #define FORWARDREORDER 1 // Direction of zigzag #define INVERSEREORDER 0 // 1 is pixel order to freq order // 0 is freq order to pixel order // The DCT Class Definition class DCT { private: long Coefficients[MAXRESULTS][MAXSAMPLES]; public: DCT(void); void FDCT(BYTEBLOCKPTR InBlock, INTBLOCKPTR OutBlock); void IDCT(INTBLOCKPTR InBlock, BYTEBLOCKPTR OutBlock); void ZigZagReorder(int *InBlock, int *OutBlock, BOOL ForwardReorder); }; #endif

#### Listing Two

// Discrete Cosine Class Member Functions #include <math.h> #include <mem.h> #include "dct.hpp" #include "tables.hpp" #define SCALE4BITS 4 #define SCALE6BITS 6 #define DIVIDEBYFOUR 2 #define SCALEBYBITS 10 // Scale by this number of bits #define SCALEFACTOR (1 << SCALEBYBITS) #define SCALEBITSANDDIVIDE (SCALEBYBITS + DIVIDEBYFOUR + SCALE6BITS) // Class Constructor for DCT class. Creates an array of weighting // coefficients used for the discrete cosine transforms. Entries in // Coefficients array are scaled up by SCALEFACTOR to remove the need // for floating point arithematic. Coefficients array has dimensions of // [MAXRESULTS][MAXSAMPLES]. DCT::DCT(void) { // Create the coefficient array double const PI = 4 * atan(1.0); double Constant = PI/16.0; for (register int k = 0; k < MAXRESULTS; k++) { for (register int m = 0; m < MAXSAMPLES; m++) { if (k == 0) Coefficients[k][m]=((1.0 / sqrt(2.0))*(double) SCALEFACTOR)+0.5; else Coefficients[k][m] = (cos(Constant * k * (2.0 * m + 1)) * (double) SCALEFACTOR) + 0.5; } } } // Forward 2D discrete cosine transform. Transforms block of pixels in // InBlock into a block of frequency components in OutBlock. void DCT::FDCT(BYTEBLOCKPTR InBlock, INTBLOCKPTR OutBlock) { long TempBlock[8][8]; memset(TempBlock, 0, sizeof(TempBlock)); long *plDest; // Do a one dimensional row FDCT for (register int Row = 0; Row < 8; Row++) { for (register int K = 0; K < 8; K++) { plDest = &TempBlock[Row][K]; for (register int Col = 0; Col < 8; Col++) *plDest += (*InBlock)[Row][Col] * Coefficients[K][Col]; *plDest >>= SCALE4BITS; // Limited scaling to keep as much } // precision as possible } long TempLong; // Do a one dimensional column FDCT for (register int Col = 0; Col < 8; Col++) { for (register int K = 0; K < 8; K++) { TempLong = 0; for (register int Row = 0; Row < 8; Row++) { TempLong += TempBlock[Row][Col] * Coefficients[K][Row]; } (*OutBlock)[K][Col] = (int)(TempLong >> SCALEBITSANDDIVIDE); } } } // Inverse 2D discrete cosine transform void DCT::IDCT(INTBLOCKPTR InBlock, BYTEBLOCKPTR OutBlock) { long TempBlock[8][8]; memset(TempBlock, 0, sizeof(TempBlock)); long *plDest; // Do a one dimensional column FDCT for (register int Col = 0; Col < 8; Col++) { for (register int K = 0; K < 8; K++) { plDest = (long *) &TempBlock[K][Col]; for (register int Row = 0; Row < 8; Row++) *plDest += (*InBlock)[Row][Col] * Coefficients[Row][K]; *plDest >>= SCALE4BITS; // Limited scaling to keep as } // much precision as possible } long TempLong; // Do a one dimensional row FDCT for (register int Row = 0; Row < 8; Row++) { for (register int K = 0; K < 8; K++) { TempLong = 0; for (register int Col = 0; Col < 8; Col++) TempLong += TempBlock[Row][Col] * Coefficients[Col][K]; TempLong >>= SCALEBITSANDDIVIDE; // Clamp pixel value into valid range 0..255 TempLong = (TempLong > 255) ? 255:TempLong; TempLong = (TempLong < 0) ? 0:TempLong; // Store pixel value into output block (*OutBlock)[Row][K] = (BYTE) TempLong; } } } // Reorder the coefficients between pixel format and frequency content // format. ZigZag table in file tables.cpp. void DCT::ZigZagReorder(int *InBlock, int *OutBlock, BOOL ForwardReorder) { if (ForwardReorder) // Convert from pixels to frequencies for (register int Index = 0; Index < 64; Index++) OutBlock[Index] = InBlock[ZigZagTable[Index]]; else // Convert from frequencies to pixels for (register int Index = 0; Index < 64; Index++) OutBlock[ZigZagTable[Index]] = InBlock[Index]; }

Copyright © 1995, *Dr. Dobb's Journal*