MPEG-2 is intended for high-quality, high-bandwidth video. It is most prominent because it is used for DVD and HDTV video compression. Computationally, good encoding is expensive but can be done in real time by current processors. Decoding an MPEG-2 stream is relatively easy and can be done by almost any current processor or, obviously, by commercial DVD players.
MPEG-2 players must also be able to play MPEG-1 streams. MPEG-1 is very similar, though its bit stream differs and its motion compensation has lower resolution. MPEG-1 is the video compression format used on VCDs.
MPEG-2 is a complicated format with many options. It includes seven profiles dictating aspect ratios and feature sets, four levels specifying resolution, bit rate, and frame rate, and three frame types. The bit stream code is complex and requires several tables. However, at its core are computationally complex but conceptually clear compression and decompression elements. These elements are the focus of this section.
MPEG-2 components are very similar to those in JPEG. MPEG-2 is DCT based and uses Huffman coding on the quantized DCT coefficients. However, the bit stream format is completely different, as are all the tables. Unlike JPEG, MPEG-2 also restricts frame rates and sizes to a fixed, though very large, set. The biggest difference, though, is the exploitation of redundancy between frames.
There are three types of frames in MPEG:
- I (intra) frames.
- P (predicted) frames.
- B (bidirectional) frames.
There are several consequences of frame type, but the defining characteristic is how prediction is done. Intra frames do not refer to other frames, making them suitable as key frames. They are, essentially, self-contained compressed images. By contrast, P frames are predicted using the previous P or I frame, and B frames are predicted using both the previous and the next P or I frame. Individual macroblocks within P and B frames may still be coded as intra or non-intra, however.
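One practical consequence of B frames is that the order in which frames are transmitted differs from the order in which they are displayed: a decoder needs both anchors (the previous and the next I or P frame) before it can decode a B frame, so each anchor is sent ahead of the B frames that reference it. The sketch below illustrates this reordering for a typical GOP pattern; the frame labels are illustrative, not part of the MPEG-2 syntax.

```python
def coding_order(display):
    """Reorder frames from display order to transmission (coding) order.

    Each frame is a string whose first character is its type (I, P, or B).
    Anchors (I and P) are emitted first, followed by the B frames that
    reference them as their future anchor.
    """
    out, pending_b = [], []
    for frame in display:
        if frame[0] in "IP":        # anchor frame
            out.append(frame)       # transmit the anchor first...
            out.extend(pending_b)   # ...then the B frames it anchors
            pending_b = []
        else:                       # B frame: must wait for its future anchor
            pending_b.append(frame)
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coding_order(display))  # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Note that P3 jumps ahead of B1 and B2 in the transmitted stream, even though it is displayed after them.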
MPEG is organized around a hierarchy of blocks, macroblocks, slices, and frames. Blocks are 8 pixels high by 8 pixels wide (8x8) in a single channel. Macroblocks are a collection of blocks 16 pixels high by 16 pixels wide (16x16) and contain all three channels. Depending on subsampling, a macroblock contains 6, 8, or 12 blocks. For example, a YCbCr 4:2:0 macroblock has four Y blocks, one Cb, and one Cr.
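The block counts above follow directly from the subsampling arithmetic: a 16x16 luma region is always four 8x8 blocks, while each chroma channel contributes one, two, or four blocks depending on how it is subsampled. A minimal sketch of that arithmetic:

```python
def blocks_per_macroblock(subsampling):
    """Count the 8x8 blocks in one 16x16 macroblock for a given
    YCbCr chroma subsampling mode."""
    luma = 4                              # 16x16 luma = four 8x8 blocks
    chroma_per_channel = {
        "4:2:0": 1,                       # chroma halved both ways: one 8x8
        "4:2:2": 2,                       # halved horizontally only: two 8x8
        "4:4:4": 4,                       # full resolution: four 8x8
    }[subsampling]
    return luma + 2 * chroma_per_channel  # Cb and Cr each contribute

for mode in ("4:2:0", "4:2:2", "4:4:4"):
    print(mode, blocks_per_macroblock(mode))  # 6, 8, and 12 blocks
```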
Following are the main blocks of an MPEG-2 codec, in encoding order. Figure 1 shows how these blocks relate to one another.
Motion Estimation and Compensation.
The key to the effectiveness of video coding is using earlier and sometimes later frames to predict a value for each pixel. Image compression can only use a block elsewhere in the image as a base value for each pixel, but video compression can aspire to use an image of the same object. Instead of compressing pixels, which have high entropy, the video compression can compress the differences between similar pixels, which have much lower entropy.
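The entropy argument can be made concrete with a toy example. Below, a "current" row of pixels differs only slightly from a reference row, as when the same object appears in consecutive frames; the residual takes only a few distinct values and so has much lower Shannon entropy than the raw pixels. The pixel values are invented for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Toy reference row and a current row that differs only slightly.
ref = [10, 30, 80, 120, 90, 40, 20, 10] * 8
cur = [v + (i % 3) for i, v in enumerate(ref)]    # small per-pixel change
residual = [c - r for c, r in zip(cur, ref)]      # values are only 0, 1, or 2

print(entropy(cur))       # several bits per pixel
print(entropy(residual))  # under 1.6 bits per pixel
```

An entropy coder such as Huffman coding therefore spends far fewer bits on the residual than it would on the raw pixels.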
Objects and even backgrounds in video are not reliably stationary, however. To make these references to other video frames truly effective, the codec needs to account for motion between the frames. This is accomplished with motion estimation and compensation. Along with the video data, each macroblock carries motion vectors that indicate how far its content has moved relative to a reference frame. Before taking the difference between the current and reference frames, the codec shifts the corresponding reference region by that amount. Calculating the motion vectors is called "motion estimation," and applying them is called "motion compensation."
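A minimal sketch of motion estimation is full-search block matching: for one block of the current frame, try every offset within a search window of the reference frame and keep the offset with the lowest sum of absolute differences (SAD). The frame layout, block size, and search range here are illustrative; real encoders use larger windows and faster search strategies.

```python
def sad(block, ref, rx, ry):
    """Sum of absolute differences between block and ref at (rx, ry)."""
    return sum(abs(block[y][x] - ref[ry + y][rx + x])
               for y in range(len(block))
               for x in range(len(block[0])))

def motion_estimate(block, ref, bx, by, search=2):
    """Return the motion vector (dx, dy) within +/-search pixels that best
    matches block, located at (bx, by) in the current frame, against ref."""
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if 0 <= rx and 0 <= ry and rx + w <= len(ref[0]) and ry + h <= len(ref):
                cost = sad(block, ref, rx, ry)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A reference frame with distinct pixel values, and a current block whose
# content is the reference content shifted by (1, 1):
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
block = [[ref[3 + y][3 + x] for x in range(4)] for y in range(4)]
print(motion_estimate(block, ref, 2, 2))  # (1, 1)
```

The exhaustive double loop is why motion estimation dominates encoding time: every candidate offset costs a full block comparison.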
This motion compensation is an essential and computationally expensive component of video compression. In fact, the biggest difference between MPEG-1 and MPEG-2 is the change from full-pel to half-pel accuracy in the motion vectors. This modification makes a significant difference in quality at a given data rate, but it also makes MPEG-2 encoding very time-consuming.
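Half-pel accuracy means reference samples must be formed at positions between pixels, which is done by averaging neighboring full pixels. The sketch below shows this interpolation for the four sub-pixel cases; the integer rounding convention shown is an assumption for illustration, not quoted from the standard.

```python
def half_pel_sample(ref, x2, y2):
    """Sample ref at half-pel coordinates (x2/2, y2/2).

    Coordinates are given in half-pel units, so even values land on
    full pixels and odd values land between them.
    """
    x, y = x2 // 2, y2 // 2
    hx, hy = x2 % 2, y2 % 2
    if not hx and not hy:                        # full-pel position
        return ref[y][x]
    if hx and not hy:                            # horizontal half-pel
        return (ref[y][x] + ref[y][x + 1] + 1) // 2
    if hy and not hx:                            # vertical half-pel
        return (ref[y][x] + ref[y + 1][x] + 1) // 2
    return (ref[y][x] + ref[y][x + 1]            # diagonal: average of four
            + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4

ref = [[0, 10],
       [20, 30]]
print(half_pel_sample(ref, 1, 0))  # 5: midway between 0 and 10
print(half_pel_sample(ref, 1, 1))  # 15: average of all four pixels
```

The encoding cost follows directly: half-pel accuracy quadruples the number of candidate positions the motion search must evaluate, and each half-pel candidate requires this interpolation before its difference can be computed.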