H.264 and Video Compression

By Stewart Taylor, August 31, 2007

Producing video compression of acceptable quality and very low bit-rate

Stewart Taylor is a software architect at Intel Corporation and was the lead designer of the Intel IPP functions. He is also author of Optimizing Applications for Multi-Core Processors, from which this article is adapted. Copyright (c) 2007 Intel Corporation. All rights reserved.

The two naming schemes for video codecs, H.26x and MPEG-x, overlap. MPEG-2 Video is named H.262 in the H.26x scheme. Likewise, another popular codec, H.264, is one part of MPEG-4 (Part 10), also known as MPEG-4 Advanced Video Coding (AVC). Its intent, like that of all of MPEG-4, was to produce video compression of acceptable quality at a very low bit-rate -- around half that of its predecessors, MPEG-2 and H.263.

Like its predecessors in the H.26x video codec family, H.264 has two encoding modes for individual video frames -- intra and inter. In the former, a frame of video is encoded as a stand-alone image without reference to other images in the sequence. In the latter, the previous and possibly future frames are used to predict the values. Figure 1 shows the high-level blocks involved in intra-frame encoding and decoding of H.264. Figure 2 shows the encoding and decoding process for inter frames.

Figure 1: Intra-Mode Encoding and Decoding in H.264

Whether in inter or intra frames, blocks in H.264 can be expressed relative to previous and subsequent blocks or frames. In inter frames, this is called "motion estimation" and is relative to blocks in other frames. Motion estimation is the source of considerable compression. As with other video compression techniques, it exploits the fact that there is considerably less entropy in the difference between similar blocks than in the absolute values of the blocks. The saving is greatest when the difference is taken between a block and a reference block constructed at an offset from that block's position in another frame.
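The entropy argument can be illustrated with a small Python sketch. The pixel values are made up for illustration, and `entropy_bits` is a hypothetical helper that computes the first-order Shannon entropy of a list of samples; a real encoder transforms and entropy-codes the residual rather than measuring it this way.

```python
import math
from collections import Counter


def entropy_bits(values):
    """First-order Shannon entropy of the samples, in bits per sample."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical data: a textured row of pixels and the same row one frame
# later, changed only by a small amount of noise.
previous = [12, 40, 87, 130, 200, 160, 95, 33]
noise = [1, 0, 1, 1, 0, 1, 0, 1]
current = [p + n for p, n in zip(previous, noise)]

# The residual is what would be encoded instead of the raw pixel values.
residual = [c - p for c, p in zip(current, previous)]

print(entropy_bits(current))   # entropy of the raw samples
print(entropy_bits(residual))  # entropy of the difference samples
```

With these values the raw row has eight distinct samples, while the residual contains only zeros and ones, so its entropy is far lower.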

Figure 2: Inter-Mode Encoding and Decoding in H.264

H.264 has very flexible support for motion estimation. The estimation can choose from as many as 32 reference pictures (up to 16 frames, or 32 fields in interlaced coding), and is allowed to refer to blocks that have to be constructed by interpolation.

The encoder is responsible for determining a reference image, block and motion vector. This block is generally chosen using some search among the possibilities, starting with the most likely options. The encoder then calculates and encodes the difference between previously encoded blocks and the new data.
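As a rough illustration of such a search, the following Python sketch performs an exhaustive integer-pixel search over a small window, scoring each candidate by the sum of absolute differences (SAD). The function names, the SAD cost, and the exhaustive strategy are assumptions chosen for clarity; practical encoders use faster search patterns and more elaborate cost functions.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def full_search(current, reference, x, y, size, search_range):
    """Return ((mvx, mvy), cost) for the block at (x, y) in `current`.

    Frames are lists of rows of integer samples. Candidates that would
    fall outside the reference frame are skipped.
    """
    height, width = len(reference), len(reference[0])
    # Start with the zero vector, the most likely candidate for static content.
    best_mv, best_cost = (0, 0), sad(current, x, y, reference, x, y, size)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = x + mvx, y + mvy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, x, y, reference, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

The returned vector points from the block's position in the current frame to the best-matching block in the reference frame; the encoder would then encode the difference between the two blocks.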

On the decoding end, after decoding the reference blocks, the decoder adds the reference data and the decoded difference data. The blocks and frames are likely to be decoded in an order other than display order, since frames can be encoded relative to blocks and frames that come later in the sequence.

H.264 encoding supports sub-pixel resolution for motion vectors, meaning that the reference block is actually calculated by interpolating inside a block of real pixels. The motion vectors for luma blocks are expressed at quarter-pixel resolution, and for chroma blocks the accuracy can be as fine as an eighth of a pixel.
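One way to picture a sub-pixel motion vector is as a fixed-point number: the high bits select a whole-pixel offset and the low bits select the fractional position to interpolate. The Python sketch below assumes vector components stored as integers in units of a quarter pixel (two fractional bits) or an eighth of a pixel (three fractional bits); `split_motion_vector` is a hypothetical helper, not a name from any codec library.

```python
def split_motion_vector(component, frac_bits):
    """Split a fixed-point motion vector component into (integer, fraction).

    frac_bits is 2 for quarter-pixel units and 3 for eighth-pixel units.
    Python's >> floors toward negative infinity, so the fraction is always
    non-negative and integer * 2**frac_bits + fraction == component.
    """
    mask = (1 << frac_bits) - 1
    return component >> frac_bits, component & mask


# A luma component of 9 quarter-pixels is 2 whole pixels plus 1/4 pixel.
print(split_motion_vector(9, 2))
# The same value read in eighth-pixel chroma units is 1 whole pixel plus 1/8.
print(split_motion_vector(9, 3))
# Negative vectors: -5 quarter-pixels is -2 whole pixels plus 3/4 pixel.
print(split_motion_vector(-5, 2))
```

A fraction of zero means the reference block lies on real pixels and no interpolation is needed.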

This sub-pixel resolution increases the algorithmic and computational complexity significantly. The decoding portion, which requires performing sub-pixel motion compensation only once per block, takes about 10 to 20 percent of the time spent in the decoding pipeline. The bulk of this time is spent interpolating values between pixels to generate the sub-pixel-offset reference blocks. The cost of performing sub-pixel estimation varies with the encoding algorithm, but may require performing motion compensation more than once per block.

The interpolation algorithm to generate offset reference blocks is defined differently for luma and chroma blocks. For luma, interpolation is performed in two steps, half-pixel and then quarter-pixel interpolation. The half-pixel values are created by filtering with this kernel horizontally and vertically:

[1 -5 20 20 -5 1]/32

Quarter-pixel interpolation is then performed by averaging the two nearest integer-pixel or half-pixel values.
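A minimal Python sketch of the one-dimensional case follows. It applies the six-tap kernel along a single row of 8-bit samples, then averages two neighboring values to get a quarter-pixel sample. The helper names are hypothetical, and the rounding offsets and the clipping to the 0..255 range are assumptions about details the text does not spell out. Positions that need both horizontal and vertical filtering are not covered here.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # kernel from the text; the taps sum to 32


def clip8(value):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1].

    Requires two samples to the left of row[i] and three to the right.
    """
    window = row[i - 2:i + 4]
    acc = sum(t * p for t, p in zip(TAPS, window))
    return clip8((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(a, b):
    """Quarter-pixel sample: rounded average of two neighboring samples."""
    return (a + b + 1) >> 1


row = [10, 20, 30, 40, 50, 60]
h = half_pel(row, 2)         # halfway between 30 and 40
q = quarter_pel(row[2], h)   # a quarter of the way from 30 to 40
print(h, q)
```

On a linear ramp like this one, the half-pixel value lands at the midpoint of its two neighbors; the negative taps matter near edges, where they sharpen the result compared with plain averaging.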

Motion compensation for chroma blocks uses bilinear interpolation with quarter-pixel or eighth-pixel accuracy, depending on the chroma format. Each sub-pixel position is a linear combination of the neighboring pixels.
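The bilinear combination can be sketched in Python as follows, assuming eighth-pixel fractional offsets in the range 0 to 7 and four neighboring integer-position samples. The function and parameter names are hypothetical, and the rounding offset of 32 before the division by 64 is an assumption about a detail not stated in the text.

```python
def chroma_sample(a, b, c, d, x_frac, y_frac):
    """Bilinear interpolation at an eighth-pixel offset.

    a, b are the top-left and top-right samples; c, d are the bottom-left
    and bottom-right samples. x_frac and y_frac are in eighths (0..7).
    The four weights always sum to 64.
    """
    weighted = ((8 - x_frac) * (8 - y_frac) * a
                + x_frac * (8 - y_frac) * b
                + (8 - x_frac) * y_frac * c
                + x_frac * y_frac * d)
    return (weighted + 32) >> 6  # divide by 64 with rounding


# Zero offset returns the top-left sample unchanged.
print(chroma_sample(10, 20, 30, 40, 0, 0))
# Offset (4/8, 4/8) weights all four neighbors equally.
print(chroma_sample(10, 20, 30, 40, 4, 4))
```

Because every weight is non-negative and the weights sum to 64, the result never leaves the range of the four input samples, so no clipping step is needed in this sketch.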

Figure 3 illustrates which pixels are thus used for both interpolation approaches.

Figure 3: Sub-pixel Interpolation for Motion Compensation in H.264

After interpolating to generate the reference block, the algorithm adds that reference block to the decoded difference information to get the reconstructed block. The encoder executes this step to get reconstructed reference frames, and the decoder executes this step to get the output frames.
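The reconstruction step itself is a per-sample addition. The Python sketch below assumes 8-bit samples held in lists of rows and clamps each sum to the 0..255 range; `reconstruct` is a hypothetical name, and the clamping is an assumption needed to keep the sums valid as 8-bit pixels.

```python
def reconstruct(prediction, residual):
    """Add the decoded residual to the reference block, clamping to 8 bits."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[250, 5], [100, 100]]   # interpolated reference block
residual = [[10, -10], [3, -3]]       # decoded difference data
print(reconstruct(prediction, residual))
```

Running the same function in both the encoder and the decoder keeps their reference frames identical, which is what allows later blocks to be predicted from reconstructed data.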
