Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



H.264 and Video Compression


Intra Prediction

Intra frames, by their nature, do not depend on earlier or later frames for reconstruction. However, in H.264 the encoder can use previously coded blocks within the same frame as references for new blocks. This process, intra prediction, gives additional compression for intra macroblocks, and it is particularly effective when a neighboring block closely resembles the block being coded.

The reference blocks are not used the way inter prediction blocks are, where the encoder takes the pixel-by-pixel difference between actual blocks in adjacent frames. Instead, a prediction of the current block is calculated from some of the pixels bordering it, for example by averaging them or by extending them across the block in a given direction. Which pixels are chosen and how they are used to calculate the block depends on the intra prediction mode. Figure 4 shows the directions in which pixels may be used, along with the mode numbers as defined in the H.264 specification.

Figure 4: Mode Numbers for Intra Prediction in H.264
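
As a rough illustration of how a mode turns border pixels into a predicted block, the sketch below builds 4x4 predictions for three of the nine 4x4 luma modes: 0 (vertical), 1 (horizontal), and 2 (DC). The function names and sample pixel values are invented for this example, the code assumes that both the row above and the column to the left are available, and the diagonal modes are left out.

```python
def predict_vertical(above):
    """Mode 0: copy the four pixels above the block down every column."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: copy the four pixels left of the block across every row."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(above, left):
    """Mode 2: fill the block with the rounded mean of the eight border pixels."""
    dc = (sum(above) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


# Hypothetical border pixels for one 4x4 block.
above = [100, 102, 104, 106]
left = [98, 97, 96, 95]

for name, pred in [("vertical", predict_vertical(above)),
                   ("horizontal", predict_horizontal(left)),
                   ("dc", predict_dc(above, left))]:
    print(name, pred)
```

The encoder would then code only the difference between the actual block and whichever prediction it selects.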

Intra prediction can also be one of the most computationally intensive parts of the encoding process. To search exhaustively through all options, the encoder would have to compare each 16x16 luma or 8x8 chroma block against four candidate predictions, and each 4x4 or 8x8 luma block against nine.
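
An exhaustive search of this kind can be sketched as a loop that scores every candidate prediction and keeps the cheapest. The text does not say which comparison metric is used, so the sum of absolute differences (SAD) here is an assumption, as are the helper names and the sample data.

```python
def sad(block, prediction):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))


def best_mode(block, candidates):
    """Return (mode, cost) for the candidate prediction with the lowest SAD.

    `candidates` maps a mode number to its predicted block.
    """
    scored = ((mode, sad(block, pred)) for mode, pred in candidates.items())
    return min(scored, key=lambda item: item[1])


# Hypothetical block with vertical stripes and two candidate predictions.
block = [[10, 20, 30, 40] for _ in range(4)]
candidates = {
    0: [[10, 20, 30, 40] for _ in range(4)],  # vertical prediction
    1: [[25] * 4 for _ in range(4)],          # horizontal prediction
}
print(best_mode(block, candidates))  # (0, 0)
```

With nine modes per 4x4 block and sixteen such blocks per macroblock, this inner loop runs often, which is why practical encoders tend to prune the search.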

Because the encoder can consider a variety of block sizes, a scheme that optimizes the trade-off between the number of bits necessary to represent the video and the fidelity of the result is desirable.
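
One common way to frame such a trade-off is a Lagrangian cost, J = D + lambda * R, where D is the distortion, R is the number of bits, and lambda sets how much a bit "costs" relative to distortion. The text does not name a specific scheme, so the sketch below is only one possibility, with invented numbers and hypothetical function names.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_partition(options, lam):
    """Pick the option with the lowest cost.

    `options` maps a label to a (distortion, bits) pair.
    """
    return min(options, key=lambda name: rd_cost(*options[name], lam))


# Made-up measurements: small blocks fit better but cost more bits.
options = {"16x16": (900, 40), "4x4": (400, 200)}

print(choose_partition(options, lam=1))  # 4x4   (600 vs. 940)
print(choose_partition(options, lam=5))  # 16x16 (1100 vs. 1400)
```

A larger lambda pushes the choice toward the cheaper, coarser option.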

Transformation

Instead of the DCT, the H.264 algorithm uses an integer transform as its primary transform to translate the difference data between the spatial and frequency domains. The transform is an approximation of the DCT that is both lossless and computationally simpler. The core transform, illustrated in Figure 5, can be implemented using only shifting and adding.

Figure 5: Matrices for Transformation in H.264
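
To make the "shifting and adding" point concrete, here is a minimal sketch of the forward 4x4 core transform written as a butterfly. It assumes the forward matrix commonly given for H.264, with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), and (1, -2, 2, -1); the function names are made up for this example, and the scaling that the standard folds into quantization is not shown.

```python
def transform_1d(x0, x1, x2, x3):
    """One 4-point pass using only additions, subtractions, and shifts."""
    s03, s12 = x0 + x3, x1 + x2
    d03, d12 = x0 - x3, x1 - x2
    return [s03 + s12,           # row (1,  1,  1,  1)
            (d03 << 1) + d12,    # row (2,  1, -1, -2)
            s03 - s12,           # row (1, -1, -1,  1)
            d03 - (d12 << 1)]    # row (1, -2,  2, -1)


def forward_core_4x4(block):
    """Apply the 1-D pass to every row, then to every column."""
    rows = [transform_1d(*row) for row in block]
    cols = [transform_1d(*col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


# A flat block puts all of its energy into the DC coefficient.
flat = [[1, 1, 1, 1] for _ in range(4)]
print(forward_core_4x4(flat))  # [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because every intermediate value is an integer, the encoder and decoder compute identical results with no rounding drift.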

This 4x4 transform is only one flavor of the H.264 transform. H.264 defines transformations on 2x2 and 4x4 blocks in the baseline profile, and additional profiles support transforms on larger block sizes, rectangular or square, with dimensions that are also powers of two.

The algorithm applies separate transforms to the first, or DC, coefficients of the chroma and luma components. In the baseline profile, H.264 uses a 2x2 transform for chroma DC coefficients, a 4x4 transform for luma DC coefficients, and the main 4x4 transform for all other coefficients.
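
The 2x2 chroma DC transform is small enough to write out by hand. The sketch below assumes it is the 2x2 Hadamard transform, that is, multiplication on both sides by the matrix with rows (1, 1) and (1, -1); the function name and input values are illustrative only.

```python
def hadamard_2x2(dc):
    """2x2 Hadamard transform of the four chroma DC coefficients."""
    (a, b), (c, d) = dc
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]


# Hypothetical DC values gathered from the four 4x4 chroma blocks.
dc = [[10, 12], [8, 6]]
print(hadamard_2x2(dc))                # [[36, 0], [8, -4]]
print(hadamard_2x2(hadamard_2x2(dc)))  # [[40, 48], [32, 24]], 4x the input
```

Applying the transform twice returns the input scaled by four, so the same butterfly serves as its own inverse up to a constant.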

Quantization

The quantization stage reduces the amount of information by dividing each coefficient by a particular number, which reduces the number of distinct values the coefficient can take. Because the quantized values fall into a narrower range, entropy coding can express them more compactly.

Quantization in H.264 is arithmetically expressed as a two-stage operation. The first stage is multiplying each coefficient in the 4x4 block by a fixed coefficient-specific value. This stage allows the coefficients to be scaled unequally according to importance or information. The second stage is dividing by an adjustable quantization parameter (QP) value. This stage provides a single "knob" for adjusting the quality and resultant bitrate of the encoding. The two operations can be combined into a single multiplication and single shift operation.
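
The idea of folding both stages into one multiplication and one shift can be sketched as follows. This is not the standard's multiplier table: the multiplier is derived here from an illustrative scale and step size, the 15-bit precision and round-to-nearest offset are assumptions, and the function names are hypothetical.

```python
def make_multiplier(scale, qstep, qbits=15):
    """Fold the per-coefficient scale and the division by qstep into one integer."""
    return round(scale * (1 << qbits) / qstep)


def quantize(coef, mf, qbits=15):
    """Quantize with one multiply and one shift, rounding to nearest."""
    offset = 1 << (qbits - 1)
    level = (abs(coef) * mf + offset) >> qbits
    return level if coef >= 0 else -level


mf = make_multiplier(scale=1.0, qstep=2.5)  # 13107
print(quantize(100, mf))   # 40, same as round(100 / 2.5)
print(quantize(-100, mf))  # -40
print(quantize(3, mf))     # 1, same as round(3 / 2.5)
```

In practice the multipliers would be precomputed once per QP and coefficient position, so the per-coefficient work contains no division at all.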

The QP is expressed as an integer from 0 to 51. This integer is converted to a quantization step size (QStep) nonlinearly. Every six QP steps double the step size, and between each pair of power-of-two step sizes N and 2N there are five intermediate steps: 1.125N, 1.25N, 1.375N, 1.625N, and 1.75N.
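
The mapping can be written as a six-entry table plus a doubling for every six QP steps. The base values below are the ones usually tabulated for QP 0 through 5; treat them as an assumption, since the text gives only the pattern, and note that the function name is made up for this sketch.

```python
# Step sizes for QP 0..5; every further six QP steps double them.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an integer QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


print(qstep(4))                          # 1.0
print([qstep(q) for q in range(5, 11)])  # [1.125, 1.25, 1.375, 1.625, 1.75, 2.0]
print(qstep(51))                         # 224.0
```

With N = 1 at QP 4, the next five values reproduce the sequence above before reaching 2N at QP 10.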

Reordering

When encoding the coefficients of each macroblock using entropy coding, the codec processes the blocks in a particular order. The order helps increase the number of consecutive zeros.

It's natural to handle this ordering when writing the output of the transform and quantization stage.
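
A minimal sketch of the reordering for one 4x4 block follows. It assumes the zigzag scan used for frame-coded blocks (field-coded blocks use a different order), and the sample coefficients are invented.

```python
# Raster indices of a 4x4 block, listed in zigzag scan order.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Flatten a 4x4 block of quantized coefficients in zigzag order."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]


# After quantization, nonzero values tend to cluster at the top left.
block = [[9, 4, 1, 0],
         [3, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(zigzag_scan(block))  # [9, 4, 3, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The scan gathers the low-frequency coefficients first and leaves the zeros in one long trailing run. The same index table could instead be applied while the quantizer writes its output, so that no separate pass is needed.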

