Dr. Dobb's is part of the Informa Tech Division of Informa PLC

H.264 and Video Compression


Transformation and Quantization

In Intel IPP functions, the transform and quantization steps are merged for efficiency. Four functions handle H.264 decoding:

  • ippiTransformDequantLumaDC_H264_16s_C1I
  • ippiTransformDequantChromaDC_H264_16s_C1I
  • ippiDequantTransformResidual_H264_16s_C1I
  • ippiDequantTransformResidualAndAdd_H264_16s_C1I

There are analogous functions for encoding:

  • ippiTransformQuantLumaDC_H264_16s_C1I
  • ippiTransformQuantChromaDC_H264_16s_C1I
  • ippiTransformQuantResidual_H264_16s_C1I

Additional functions handle 8x8 blocks.

Listing 3 shows a block of code from the H.264 decoder that uses these functions.

The cbp4x4 variable is a bitmask indicating whether any DC coefficients within the macroblock carry data and, individually, whether each residual (AC) block within the macroblock carries data. The QP variable contains the quantization parameter, which specifies the degree of quantization.
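As a minimal sketch of how such a bitmask can be queried, the helpers below assume a hypothetical layout in which bit 0 flags luma DC data and the next 16 bits flag the 16 luma AC blocks. The constants kCbpLumaDC and kFirstLumaAcBit and the helper names are illustrative only; the real IPPVC_CBP_* values are defined in the Intel IPP headers and may differ.

```cpp
#include <cstdint>

// Hypothetical bit layout, for illustration only. The actual IPPVC_CBP_*
// constants come from the Intel IPP headers.
const uint32_t kCbpLumaDC = 1u << 0;  // assumed: bit 0 flags luma DC data
const int kFirstLumaAcBit = 1;        // assumed: bits 1..16 flag AC blocks

// True if the macroblock carries luma DC coefficients.
bool HasLumaDC(uint32_t cbp)
{
    return (cbp & kCbpLumaDC) != 0;
}

// True if 4x4 luma block `block` (0..15) carries residual (AC) data.
bool HasLumaAC(uint32_t cbp, int block)
{
    return ((cbp >> (kFirstLumaAcBit + block)) & 1u) != 0;
}

// Number of 4x4 luma blocks with residual data, found by walking a
// one-bit mask across the 16 block positions.
int CountCodedLumaBlocks(uint32_t cbp)
{
    int count = 0;
    uint32_t mask = 1u << kFirstLumaAcBit;
    for (int block = 0; block < 16; ++block, mask <<= 1)
    {
        if ((cbp & mask) != 0)
            ++count;
    }
    return count;
}
```

The loop in CountCodedLumaBlocks shifts a single-bit mask once per block, which is the same pattern a decoder can use to isolate the flag for the block currently being decoded.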

If the bitmask indicates that there is any DC luma data, the code transforms it with the ippiTransformDequantLumaDC function. Then the code iterates over the 16 blocks within the macroblock. For each block, if there is either DC data or residual data, the code transforms and dequantizes the block. It passes in a pointer to the decoded DC coefficient (null when the coefficient is 0), the buffer of residual data along with a flag indicating whether the residual data is valid, and the quantization parameter.

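/* luma_dc, ac_coeffs, and dc_present are declared in the enclosing
   function, which is not shown here */
if ((cbp4x4 & (IPPVC_CBP_LUMA_AC | IPPVC_CBP_LUMA_DC)) != 0)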
{
  Ipp16s *pDC;
  Ipp16s DCCoeff;

  Ipp16s *tmpbuf;

  /* bit var to isolate cbp for block being decoded */
  Ipp32u uCBPMask = (1 << IPPVC_CBP_1ST_LUMA_AC_BITPOS);

  if ((cbp4x4 & IPPVC_CBP_LUMA_DC) != 0)
  {
    luma_dc = (*ppSrcCoeff);
    *ppSrcCoeff += 16;
    ippiTransformDequantLumaDC_H264_16s_C1I(luma_dc, QP);
  }

  tmpbuf = 0;  /* init as no ac coeffs */
  pDC = 0;  /* init as no dc */

  ac_coeffs = pDstCoeff;

  for (Ipp32s uBlock = 0; uBlock < 16;
       uBlock++, uCBPMask <<= 1)
  {
    DCCoeff = (Ipp16s)luma_dc[block_subblock_mapping[uBlock]];
    if (DCCoeff != 0)
      pDC = &DCCoeff; /* DC coefficient is present */

    if ((cbp4x4 & uCBPMask) != 0)
    {
      memcpy(pDstCoeff, *ppSrcCoeff, 16*sizeof(Ipp16s));
      tmpbuf = pDstCoeff;
      pDstCoeff += 16;
      *ppSrcCoeff += 16;
    }

    Ipp32s hasAC = tmpbuf != 0;
    if (tmpbuf || pDC)
    {
      if (!pDC)
      {
        if (tmpbuf)
        {
          if (dc_present)
            tmpbuf[0] = 0;
        }
      }
      else
      {
        if (!tmpbuf)
        {
          tmpbuf = pDstCoeff;
          pDstCoeff += 16;
          cbp4x4 |= uCBPMask;
        }
      }
      ippiDequantTransformResidual_H264_16s_C1I(tmpbuf, 8, pDC,
        hasAC, QP);
      tmpbuf = 0;
      pDC = 0;
    }
  }
}
Listing 3: Transformation and Quantization in H.264
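
To make the merged operation more concrete, here is a minimal plain C++ sketch of dequantizing a 4x4 block and applying the 4x4 inverse integer transform defined in the H.264 standard. It is not the Intel IPP implementation and makes several simplifying assumptions: flat (default) scaling matrices, coefficients already in raster order rather than zigzag order, and no separate DC path. The function name DequantInverseTransform4x4 and the table name kDequantScale are hypothetical.

```cpp
#include <cstdint>

// Dequantization multipliers from the H.264 standard for flat scaling
// matrices, selected by QP % 6. Column 0 applies where row and column are
// both even, column 1 where both are odd, column 2 everywhere else.
static const int kDequantScale[6][3] = {
    {10, 16, 13}, {11, 18, 14}, {13, 20, 16},
    {14, 23, 18}, {16, 25, 20}, {18, 29, 23}
};

// One 1-D pass of the inverse transform over four values spaced `stride`
// apart. Right shifts of negative values are assumed to be arithmetic.
static void InverseButterfly(int *v, int stride)
{
    const int a = v[0], b = v[stride], c = v[2 * stride], d = v[3 * stride];
    const int e0 = a + c;
    const int e1 = a - c;
    const int e2 = (b >> 1) - d;
    const int e3 = b + (d >> 1);
    v[0]          = e0 + e3;
    v[stride]     = e1 + e2;
    v[2 * stride] = e1 - e2;
    v[3 * stride] = e0 - e3;
}

// Hypothetical helper: dequantize 16 raster-order coefficients with
// quantization parameter qp (0..51) and write the 4x4 residual block.
void DequantInverseTransform4x4(const int16_t coeff[16], int qp,
                                int16_t residual[16])
{
    int d[16];
    const int scale = 1 << (qp / 6);  // quantizer step doubles every 6 QP

    for (int i = 0; i < 4; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            const int cls = (i % 2 == 0 && j % 2 == 0) ? 0
                          : (i % 2 == 1 && j % 2 == 1) ? 1 : 2;
            d[4 * i + j] = coeff[4 * i + j] * kDequantScale[qp % 6][cls] * scale;
        }
    }

    for (int i = 0; i < 4; ++i)   // transform each row
        InverseButterfly(d + 4 * i, 1);
    for (int j = 0; j < 4; ++j)   // transform each column
        InverseButterfly(d + j, 4);

    for (int k = 0; k < 16; ++k)  // final rounding and scaling
        residual[k] = static_cast<int16_t>((d[k] + 32) >> 6);
}
```

Because the scaling and the transform share a single pass over the block, a fused function avoids writing an intermediate buffer of dequantized coefficients, which is presumably part of the efficiency gain from merging the two steps.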

Deblocking Filter

The Intel IPP functions that perform filtering on the edges of macroblocks are divided according to horizontal and vertical edges, luma and chroma blocks, block size, bit depth, and chroma sampling format (4:2:2 or 4:4:4). They are the following:

  • ippiFilterDeblockingLuma_VerEdge_H264_[8u|16u]_C1IR
  • ippiFilterDeblockingLuma_HorEdge_H264_[8u|16u]_C1IR
  • ippiFilterDeblockingChroma_HorEdge[422|444]_H264_[8u|16u]_C1IR
  • ippiFilterDeblockingChroma_VerEdge[422|444]_H264_[8u|16u]_C1IR
  • ippiFilterDeblockingLuma_VerEdge_MBAFF_H264_[8u|16u]_C1IR
  • ippiFilterDeblockingChroma_VerEdge_MBAFF_H264_[8u|16u]_C1IR

The MBAFF versions of these functions filter 16x8 blocks instead of 16x16 blocks and are intended for use with interlaced video.

Slightly different variations of some of these functions take a structure of parameters instead of pushing all of the parameters on the stack. These provide a slight performance improvement due to decreased stack usage.

Listing 4 shows a code snippet that executes a deblocking filter. The behavior of the filters is determined by the alpha, beta, and clipping thresholds, and the filter strength arrays. The alpha parameter is the threshold for the gradient across the edge, while the beta parameter is the threshold for the gradient on one side of an edge. The clipping thresholds, held in the array Clipping and called tc0 in the standard, limit the effect of the filter. The threshold parameters are based on fixed tables, indexed by the quantization parameter (QP) plus a tuning offset. The strength parameter pStrength, which is referred to as bS in the standard, affects the deblocking filter in a number of ways, including the choice of basic algorithm. Both the tables and the formulas used to calculate the indices are taken from the H.264 standard.

For simplicity, this code uses simple wrapper functions around each of the Intel IPP functions. The wrappers adapt the arguments and provide a uniform prototype for all the deblocking filters, but do not do any computation. Since they have a uniform prototype, the function calls them indirectly, according to a table set elsewhere.

Ipp8u BETA_TABLE[52] =
{
  0,  0,  0,  0,  0,  0,  0,  0,
  0,  0,  0,  0,  0,  0,  0,  0,
  2,  2,  2,  3,  3,  3,  3,  4,
  4,  4,  6,  6,  7,  7,  8,  8,
  9,  9,  10, 10, 11, 11, 12, 12,
  13, 13, 14, 14, 15, 15, 16, 16,
  17, 17, 18, 18
};

{
  ...
  IppStatus ( *(IppDeblocking[])) (Ipp8u *, Ipp32s, Ipp8u *,
    Ipp8u *, Ipp8u *, Ipp8u *, Ipp32s ) =
  {
    &(FilterDeblockingLuma_VerEdge),
    &(FilterDeblockingLuma_HorEdge),
    &(FilterDeblockingChroma_VerEdge),
    &(FilterDeblockingChroma_HorEdge),
    &(FilterDeblockingChroma422_VerEdge),
    &(FilterDeblockingChroma422_HorEdge),
    &(FilterDeblockingChroma444_VerEdge),
    &(FilterDeblockingChroma444_HorEdge),
    &(FilterDeblockingLuma_VerEdge_MBAFF),
    &(FilterDeblockingChroma_VerEdge_MBAFF)
  };

  IppStatus ( *(IppDeblocking16u[])) (Ipp16u *, Ipp32s, Ipp8u *,
    Ipp8u *, Ipp8u *, Ipp8u *, Ipp32s ) =
  {
    &(FilterDeblockingLuma_VerEdge),
    &(FilterDeblockingLuma_HorEdge),
    &(FilterDeblockingChroma_VerEdge),
    &(FilterDeblockingChroma_HorEdge),
    &(FilterDeblockingChroma422_VerEdge),
    &(FilterDeblockingChroma422_HorEdge),
    &(FilterDeblockingChroma444_VerEdge),
    &(FilterDeblockingChroma444_HorEdge),
    &(FilterDeblockingLuma_VerEdge_MBAFF),
    &(FilterDeblockingChroma_VerEdge_MBAFF)
  };

  // internal edge variables
  QP = pmq_QP;

  index = IClip(0, 51, QP + BetaOffset);
  Beta[1] = (Ipp8u) (BETA_TABLE[index]);

  index = IClip(0, 51, QP + AlphaC0Offset);
  Alpha[1] = (Ipp8u) (ALPHA_TABLE[index]);
  pClipTab = CLIP_TAB[index];

  // create clipping values
  {
    Ipp32s edge;

    for (edge = 1;edge < 4;edge += 1)
    {
      if (*((Ipp32u *) (pStrength + edge * 4)))
      {
        // create clipping values
        Clipping[edge * 4 + 0] =
          (Ipp8u) (pClipTab[pStrength[edge * 4 + 0]]);
        Clipping[edge * 4 + 1] =
          (Ipp8u) (pClipTab[pStrength[edge * 4 + 1]]);
        Clipping[edge * 4 + 2] =
          (Ipp8u) (pClipTab[pStrength[edge * 4 + 2]]);
        Clipping[edge * 4 + 3] =
          (Ipp8u) (pClipTab[pStrength[edge * 4 + 3]]);
      }
    }
  }

  if (pParams->bitDepthLuma > 8)
  {
    IppDeblocking16u[dir]((Ipp16u*)pY,
      pic_pitch,
      Alpha,
      Beta,
      Clipping,
      pStrength,
      pParams->bitDepthLuma);
  }
  else
  {
    IppDeblocking[dir](pY,
      pic_pitch,
      Alpha,
      Beta,
      Clipping,
      pStrength,
      pParams->bitDepthLuma);
  }
}
Listing 4: Deblocking Filters in H.264
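
The role of the alpha, beta, and clipping thresholds can be illustrated with a small sketch that is independent of Intel IPP. It looks up beta from the same table values as BETA_TABLE in Listing 4, then applies the H.264 edge decision and the normal-strength (bS less than 4) update to the two samples nearest the edge. This is a simplification: the standard also adjusts the clipping value from the neighboring samples and may modify p1 and q1, and the strong filter for bS equal to 4 is omitted. The names LookupBeta and FilterEdgeSample, and passing the final clipping value tc directly, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Same values as BETA_TABLE in Listing 4.
static const uint8_t kBetaTable[52] = {
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    2,  2,  2,  3,  3,  3,  3,  4,  4,  4,  6,  6,  7,  7,  8,  8,
    9,  9,  10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16,
    17, 17, 18, 18
};

// Clamp v into [lo, hi]; same argument order as IClip in Listing 4.
static int Clip3(int lo, int hi, int v)
{
    return std::min(std::max(v, lo), hi);
}

// Beta threshold for a given QP and tuning offset.
int LookupBeta(int qp, int betaOffset)
{
    return kBetaTable[Clip3(0, 51, qp + betaOffset)];
}

// Hypothetical helper: filter one line of 8-bit samples p1 p0 | q0 q1
// lying across an edge. Returns true if p0 and q0 were modified.
bool FilterEdgeSample(uint8_t p1, uint8_t &p0, uint8_t &q0, uint8_t q1,
                      int bS, int alpha, int beta, int tc)
{
    // A large step across the edge, or a large gradient on either side,
    // is treated as real image content and left untouched.
    if (bS == 0 ||
        std::abs(p0 - q0) >= alpha ||
        std::abs(p1 - p0) >= beta ||
        std::abs(q1 - q0) >= beta)
    {
        return false;
    }

    // Correction term, limited to [-tc, tc] by the clipping threshold.
    // The right shift of a negative value is assumed to be arithmetic.
    const int delta =
        Clip3(-tc, tc, (((q0 - p0) * 4) + (p1 - q1) + 4) >> 3);

    p0 = static_cast<uint8_t>(Clip3(0, 255, p0 + delta));
    q0 = static_cast<uint8_t>(Clip3(0, 255, q0 - delta));
    return true;
}
```

For example, with samples 100 100 | 104 104, alpha 10, beta 3, and tc 1, the small step is smoothed to 100 101 | 103 104, while a step from 50 to 200 exceeds alpha and is preserved.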

Threading and Video Coding

H.264 and MPEG-4 in general are amenable to threading. Listing 5 shows the key piece of code from the Intel IPP codec sample for H.264 that uses one OpenMP pragma to parallelize this encoder.

The key aspect of this code is the slice. The slice is defined as an independent segment of the image, a segment that neither uses other video slices for reference in prediction nor is used for reference by other video slices. That makes it the perfect level for parallelization, as the codec can process multiple slices simultaneously and not be forced into serial mode by motion compensation.

template <class PixType, class CoeffsType> Status
  H264CoreEncoder<PixType,CoeffsType>::CompressFrame(
   EnumPicCodType &    ePictureType,
   EnumPicClass   &    ePic_Class,
   MediaData*        dst)
{
  Status      status = UMC_OK;
  Ipp32s  slice;

  for (m_field_index=0;
    m_field_index <= (Ipp8u)
    (m_pCurrentFrame->m_PictureStructureForDec < FRM_STRUCTURE);
    m_field_index++)
  {
    ...

#if defined _OPENMP
      vm_thread_priority mainTreadPriority = vm_get_current_thread_priority();
#pragma omp parallel for private(slice)
#endif // _OPENMP
      for (slice = (Ipp32s)m_info.num_slices*m_field_index;
           slice < m_info.num_slices*(m_field_index+1);
           slice++)
      {
#if defined _OPENMP
        vm_set_current_thread_priority(mainTreadPriority);
#endif // _OPENMP

        UpdateRefPicList(m_Slices + slice,
          m_pCurrentFrame->GetRefPicLists(slice),
          m_SliceHeader, &m_ReorderInfoL0,
          &m_ReorderInfoL1);

        // Compress one slice
        if (m_is_cur_pic_afrm)
          m_Slices[slice].status =
            Compress_Slice_MBAFF(m_Slices + slice);
        else{
          m_Slices[slice].status =
            Compress_Slice(m_Slices + slice,
            slice == m_info.num_slices*m_field_index);
        }
      ...
      }
Listing 5: Threading the H.264 Encoder
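
The same slice-level pattern can be shown in a self-contained sketch that does not depend on the codec sample. The Slice structure and ProcessSlices function below are hypothetical stand-ins: each slice owns a disjoint range of rows and writes only to its own result field, so the loop iterations share no mutable state and can run in any order. The per-slice work here is a simple sum of sample values standing in for compression.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slice descriptor: a band of rows plus a per-slice output.
struct Slice
{
    int firstRow;
    int numRows;
    long long result;  // stand-in for the compressed output of the slice
};

// Process every slice of a frame. Each iteration reads shared, read-only
// frame data and writes only to its own Slice, so no locking is needed.
void ProcessSlices(const std::vector<uint8_t> &frame, int width,
                   std::vector<Slice> &slices)
{
    const int numSlices = static_cast<int>(slices.size());

#pragma omp parallel for
    for (int s = 0; s < numSlices; ++s)
    {
        Slice &slice = slices[s];
        long long sum = 0;
        for (int row = slice.firstRow;
             row < slice.firstRow + slice.numRows; ++row)
        {
            for (int x = 0; x < width; ++x)
                sum += frame[static_cast<std::size_t>(row) * width + x];
        }
        slice.result = sum;
    }
}
```

When the compiler is not invoked with OpenMP enabled, the pragma is ignored and the loop runs serially with the same results, which makes this approach easy to adopt incrementally.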

