Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


MPEG-2 and Video Compression


MPEG-2 in Intel IPP. The Intel Integrated Performance Primitives (Intel IPP) provide a very efficient sample encoder and decoder for MPEG-2. Due to the number of variants, it is only a sample and not a compliant codec.

Each side of the codec includes hundreds of Intel IPP function calls. The bulk of the code in the sample is for bit stream parsing and data manipulation, but the bulk of the time is spent decoding the pixels. For this reason, almost all of the Intel IPP calls are concentrated in the pixel decoding blocks. In particular, the key high-level functions are the member functions of the class MPEG2VideoDecoderBase:

DecodeSlice_FrameI_420
DecodeSlice_FramePB_420
DecodeSlice_FieldPB_420
DecodeSlice_FrameI_422
DecodeSlice_FramePB_422
DecodeSlice_FieldPB_422

These functions decode the structure of the image, then pass the responsibility for decoding individual blocks to a function such as ippiDecodeIntra8x8IDCT_MPEG2_1u8u. Listing 1 shows the key portions of two of these functions.

Status MPEG2VideoDecoderBase::DecodeSlice_FrameI_420(
   IppVideoContext *video)
{
 ...
    DECODE_VLC(macroblock_type, video->bs, vlcMBType[0]);

    if (load_dct_type) {
      GET_1BIT(video->bs, dct_type);
    }

    if (macroblock_type & IPPVC_MB_QUANT)
    {
      DECODE_QUANTIZER_SCALE(video->bs,
        video->cur_q_scale);
    }

    if (PictureHeader.concealment_motion_vectors)
    {
      if (PictureHeader.picture_structure !=
        IPPVC_FRAME_PICTURE) {
        SKIP_BITS(video->bs, 1);
      }
      mv_decode(0, 0, video);
      SKIP_BITS(video->bs, 1);
    }

    RECONSTRUCT_INTRA_MB_420(video->bs, dct_type);
  }
}//DecodeSlice_FrameI_420
#define RECONSTRUCT_INTRA_MB_420(BITSTREAM, DCT_TYPE) \
  RECONSTRUCT_INTRA_MB(BITSTREAM, 6, DCT_TYPE)

#define RECONSTRUCT_INTRA_MB(BITSTREAM, NUM_BLK, DCT_TYPE) \
{                                                          \
  ...                                                      \
  for (blk = 0; blk < NUM_BLK; blk++) {                    \
    sts = ippiDecodeIntra8x8IDCT_MPEG2_1u8u( ... );        \
  }                                                        \
}

Status MPEG2VideoDecoderBase::DecodeSlice_FramePB_420(
   IppVideoContext *video)
{
  ...
      if (video->prediction_type == IPPVC_MC_DP) {
        mc_dualprime_frame_420(video);
      } else {
        mc_frame_forward_420(video);
        if (video->macroblock_motion_backward) {
          mc_frame_backward_add_420(video);
        }
      }
    } else {
      if (video->macroblock_motion_backward) {
        mc_frame_backward_420(video);
      } else {
        RESET_PMV(video->PMV)
        mc_frame_forward0_420(video);
      }
    }

    if (macroblock_type & IPPVC_MB_PATTERN) {
      RECONSTRUCT_INTER_MB_420(video->bs, dct_type);
    }
  }

  return UMC_OK;
}//DecodeSlice_FramePB_420

void MPEG2VideoDecoderBase::mc_frame_forward0_422(
   IppVideoContext *video)
{
    MC_FORWARD0(16, frame_buffer.Y_comp_pitch,
        frame_buffer.U_comp_pitch);
}

 #define MC_FORWARD0(H, PITCH_L, PITCH_C)                \
  ...
  ippiCopy16x16_8u_C1R(ref_Y_data + offset_l, PITCH_L,   \
    cur_Y_data + offset_l, PITCH_L);                     \
  ippiCopy8x##H##_8u_C1R(ref_U_data + offset_c, PITCH_C, \
    cur_U_data + offset_c, PITCH_C);                     \
  ippiCopy8x##H##_8u_C1R(ref_V_data + offset_c, PITCH_C, \
    cur_V_data + offset_c, PITCH_C);
#define RECONSTRUCT_INTER_MB_420(BITSTREAM, DCT_TYPE) \
  RECONSTRUCT_INTER_MB(BITSTREAM, 6, DCT_TYPE)

#define RECONSTRUCT_INTER_MB(BITSTREAM, NUM_BLK, DCT_TYPE)   \
  ...
  for (blk = 0; blk < NUM_BLK; blk++) { \
    ...
    sts = ippiDecodeInter8x8IDCTAdd_MPEG2_1u8u(...);          
Listing 1: Structure of MPEG-2 Intra Macroblock Decoding

For decoding, two Intel IPP function groups execute most of the decoding pipeline. Between them they implement a large portion of an MPEG-2 decoder, at least for intra blocks.

The first group is ippiReconstructDCTBlock_MPEG2 for non-intra blocks and ippiReconstructDCTBlockIntra_MPEG2 for intra blocks. These functions decode Huffman data, rearrange it, and dequantize it. The source is the Huffman-encoded bit stream pointing to the top of a block and the destination is an 8x8 block of consecutive DCT coefficients.

The Huffman decoding uses separate tables for AC and DC codes, formatted in the appropriate Intel IPP Spec structure. The scan matrix argument specifies the zigzag pattern to be used. The functions also take two arguments for the quantization, a matrix and a scale factor. Each element is multiplied by the corresponding element in the quantization matrix, then by the global scale factor.
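The scan and dequantization steps described above can be sketched in plain C++. This is a minimal sketch, not the Intel IPP implementation: the helper name DequantizeBlock is hypothetical, the scaling rule is the simplified intra form (level times matrix entry times scale, divided by 16), and the DC special case, saturation, and mismatch control are all omitted.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Classic zigzag scan: kZigzag[i] is the raster position (row * 8 + col)
// of the i-th coefficient in bit stream order.
static const int kZigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

// Hypothetical helper (not an Intel IPP function). Takes decoded levels in
// scan order, moves each one to its raster position, and scales it by the
// matching quantization matrix entry and the global scale factor.
// Assumption: simplified intra-style rule, out = level * matrix * scale / 16.
std::array<int16_t, 64> DequantizeBlock(
    const std::array<int16_t, 64>& levels_in_scan_order,
    const std::array<uint8_t, 64>& quant_matrix,  // raster order
    int quant_scale)
{
    assert(quant_scale > 0);
    std::array<int16_t, 64> block{};
    for (int i = 0; i < 64; ++i) {
        const int pos = kZigzag[i];
        const int v = levels_in_scan_order[i] * quant_matrix[pos] * quant_scale;
        block[pos] = static_cast<int16_t>(v / 16);  // truncates toward zero
    }
    return block;
}
```

Passing the scan table as an argument instead of using a fixed one would let the same loop serve the alternate scan pattern as well.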

The function ippiReconstructDCTBlockIntra_MPEG2 also takes two arguments for processing the DC coefficient: the reference value and the shift. The function adds the reference value, which is often taken from the previous block, to the DC coefficient. The DC coefficient is then shifted by the shift argument, which should be zero to three bits as indicated above.
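The DC handling can be illustrated with a few lines of C++. This is a sketch under stated assumptions: the helper name ReconstructIntraDC is hypothetical, the reference value is kept in a caller-owned predictor that is updated after each block, and the shift is assumed to be a left shift (a multiplication by a power of two).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Intel IPP function).
// dc_pred holds the reference value, typically the DC value of the
// previous block of the same component. It is updated in place so the
// next block can use it.
int16_t ReconstructIntraDC(int dc_differential, int16_t& dc_pred, int shift)
{
    assert(shift >= 0 && shift <= 3);
    // Add the decoded differential to the reference value.
    dc_pred = static_cast<int16_t>(dc_pred + dc_differential);
    // Assumption: the shift scales the value up. Multiplying avoids
    // left-shifting a negative number.
    return static_cast<int16_t>(dc_pred * (1 << shift));
}
```

For example, with a predictor of 128, a differential of 5, and a shift of 3, the predictor becomes 133 and the returned coefficient is 1064.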

The second group is the inverse DCT. The two most useful DCT functions are ippiDCT8x8InvLSClip_16s8u_C1R for intra blocks and ippiDCT8x8Inv_16s_C1R for non-intra blocks. The versions without level-shift and clipping can also be used. The former function inverts the DCT on an 8x8 block, then converts the data to Ipp8u with a level shift. The output values are pixels. The latter function inverts the DCT and leaves the result in Ipp16s; the output values are difference values. The decoder must then add these difference values to the motion-compensated reference block.
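The last step, adding the Ipp16s difference values to the motion-compensated reference block, amounts to a saturating add. The following is a minimal sketch with a hypothetical helper name; unlike the Intel IPP functions, it measures the destination step in elements and assumes the 64 difference values are stored contiguously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Intel IPP function).
// residual: 64 contiguous 16-bit difference values from the inverse DCT.
// dst:      8-bit prediction block, updated in place.
// dst_step: distance between rows of dst, in elements (an assumption).
void AddResidual8x8(const int16_t* residual, uint8_t* dst, int dst_step)
{
    assert(dst_step >= 8);
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            const int v = dst[y * dst_step + x] + residual[y * 8 + x];
            // Clip to the valid pixel range before storing.
            dst[y * dst_step + x] =
                static_cast<uint8_t>(std::min(255, std::max(0, v)));
        }
    }
}
```

The clipping matters because a prediction near 0 or 255 combined with a residual of the wrong sign would otherwise wrap around when stored as an 8-bit value.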

Listing 2 shows these function groups decoding a 4:2:0 intra macroblock. The input is a bit stream and several pre-calculated tables. The DCT outputs the pixel data directly into an image plane. The four blocks of Y data are arranged in a 2x2 square in that image, and the U and V blocks are placed in analogous locations in the U and V planes. This output can be displayed directly on a display that accepts this format, or the U and V planes can be upsampled to make a YCbCr 4:4:4 image, or the three planes can be converted by other Intel IPP functions to RGB for display.

ippiReconstructDCTBlockIntra_MPEG2_32s(
		&video->bitstream_current_data,
		&video->bitstream_bit_ptr,
		pContext->vlcTables.ippTableB5a,
		pContext->Table_RL,
		scan_1[pContext->PictureHeader.alternate_scan],
		q_scale[pContext->PictureHeader.q_scale_type]
			[pContext->quantizer_scale],
		video->curr_intra_quantizer_matrix,
		&pContext->slice.dct_dc_y_past,
		pContext->curr_intra_dc_multi,
		pContext->block.idct, &dummy);

	ippiReconstructDCTBlockIntra_MPEG2_32s(
		...
		pContext->block.idct+64, &dummy);
	...
	// Repeat two more times for other Y blocks
	ippiReconstructDCTBlockIntra_MPEG2_32s(...)
		...

	VIDEO_FRAME_BUFFER* frame =
		&video->frame_buffer.frame_p_c_n
			[video->frame_buffer.curr_index];

	// Inverse DCT and place in 16x16 block of image.
	// Each call reads the coefficients of its own Y block (this assumes
	// the elided reconstruct calls above write to idct+128 and idct+192).
	ippiDCT8x8InvLSClip_16s8u_C1R(
		pContext->block.idct,
		frame->Y_comp_data + pContext->offset_l,
		pitch_Y, 0, 0, 255);
	ippiDCT8x8InvLSClip_16s8u_C1R(
		pContext->block.idct + 64,
		frame->Y_comp_data + pContext->offset_l + 8,
		pitch_Y, 0, 0, 255);
	ippiDCT8x8InvLSClip_16s8u_C1R(
		pContext->block.idct + 128,
		frame->Y_comp_data + pContext->offset_l + 8*pitch_Y,
		pitch_Y, 0, 0, 255);
	ippiDCT8x8InvLSClip_16s8u_C1R(
		pContext->block.idct + 192,
		frame->Y_comp_data +
			pContext->offset_l + 8*pitch_Y + 8,
		pitch_Y, 0, 0, 255);
 ...
	ippiReconstructDCTBlockIntra_MPEG2_32s(
		&video->bitstream_current_data,
		&video->bitstream_bit_ptr,
		pContext->vlcTables.ippTableB5b,
		pContext->Table_RL,
		scan_1[pContext->PictureHeader.alternate_scan],
		q_scale[pContext->PictureHeader.q_scale_type]
			[pContext->quantizer_scale],
		video->curr_chroma_intra_quantizer_matrix,
		&pContext->slice.dct_dc_cb_past,
		pContext->curr_intra_dc_multi,
		pContext->block.idct, &i1);

	ippiReconstructDCTBlockIntra_MPEG2_32s(
		...
		&pContext->slice.dct_dc_cr_past,
		pContext->curr_intra_dc_multi,
		pContext->block.idct + 64,&i2);

	ippiDCT8x8InvLSClip_16s8u_C1R (
		pContext->block.idct,
		frame->U_comp_data + pContext->offset_c,
		pitch_UV, 0,0,255);

	ippiDCT8x8InvLSClip_16s8u_C1R (
		pContext->block.idct + 64,
		frame->V_comp_data + pContext->offset_c,
		pitch_UV, 0,0,255);
Listing 2: Decoding an MPEG-2 Intra Macroblock
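The destination offsets used in Listing 2 (offset_l, offset_l + 8, offset_l + 8*pitch_Y, and so on) follow from the 2x2 arrangement of the Y blocks. The sketch below shows one way such offsets could be computed for a frame picture; the struct and function names are hypothetical, and the macroblock coordinates and pitches are assumed to be given in macroblock units and bytes, respectively.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical types and names, for illustration only.
struct MacroblockOffsets {
    // Y blocks: top-left, top-right, bottom-left, bottom-right.
    std::array<std::size_t, 4> y;
    // Single 8x8 chroma block; the same offset applies to the U and V planes.
    std::size_t c;
};

// Offsets of the six 8x8 blocks of a 4:2:0 macroblock within planar
// Y, U, and V buffers. mb_x and mb_y count 16x16 macroblocks.
MacroblockOffsets Offsets420(int mb_x, int mb_y,
                             std::size_t pitch_y, std::size_t pitch_c)
{
    assert(mb_x >= 0 && mb_y >= 0);
    const std::size_t col = static_cast<std::size_t>(mb_x);
    const std::size_t row = static_cast<std::size_t>(mb_y);
    const std::size_t base_y = row * 16 * pitch_y + col * 16;

    MacroblockOffsets out;
    out.y[0] = base_y;
    out.y[1] = base_y + 8;
    out.y[2] = base_y + 8 * pitch_y;
    out.y[3] = base_y + 8 * pitch_y + 8;
    // Chroma is subsampled by two in both directions, so a macroblock
    // covers only an 8x8 area in each chroma plane.
    out.c = row * 8 * pitch_c + col * 8;
    return out;
}
```

For macroblock (1, 2) with a luma pitch of 64 and a chroma pitch of 32, the Y offsets are 2064, 2072, 2576, and 2584, and the chroma offset is 520.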

The dummy parameter to the first ippiReconstructDCTBlock call is not used here but can be used for optimization. If the value returned is 1, then only the DC coefficient is nonzero and the inverse DCT can be skipped. If it is less than 10, then all the nonzero coefficients are in the first 4x4 block, and a 4x4 inverse DCT can be used.
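A decoder could act on that returned value with a small dispatch routine. The sketch below uses the thresholds stated above; the enum and function names are hypothetical. The second helper checks the claim about the 4x4 corner against the first entries of the classic zigzag scan, which is assumed here to be the scan in use.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
enum class IdctKind { DcOnly, Inv4x4, Inv8x8 };

// coeff_count is the value returned through the last argument of the
// reconstruct call, as described in the text.
IdctKind ChooseIdct(int coeff_count)
{
    assert(coeff_count >= 0 && coeff_count <= 64);
    if (coeff_count <= 1) return IdctKind::DcOnly;  // only DC is nonzero
    if (coeff_count < 10) return IdctKind::Inv4x4;  // all inside 4x4 corner
    return IdctKind::Inv8x8;
}

// First 16 raster positions of the classic zigzag scan.
static const int kZigzagFirst16[16] = {
    0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5
};

// Returns true if the first `count` coefficients in zigzag order all lie
// in the top-left 4x4 corner of the 8x8 block.
bool FirstCoeffsFitIn4x4(int count)
{
    assert(count >= 0 && count <= 16);
    for (int i = 0; i < count; ++i) {
        const int row = kZigzagFirst16[i] / 8;
        const int col = kZigzagFirst16[i] % 8;
        if (row >= 4 || col >= 4) return false;
    }
    return true;
}
```

With the classic zigzag scan, the first ten positions fall inside the 4x4 corner and the eleventh (raster position 32, row 4) does not, which is consistent with the threshold of 10.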

The function ippiDCT8x8Inv_16s8u_C1R could be called instead of ippiDCT8x8InvLSClip_16s8u_C1R, because the data is clipped to the 0-255 range by default.

In the non-intra case, the pointer to the quantization matrix can be 0. In that case, the default matrices will be used.

Listing 3 shows another approach to decoding, from the MPEG-2 sample for Intel IPP 5.2. Instead of using the ippiReconstructDCTBlock functions for decoding, it implements pseudo-IPP functions such as ippiDecodeIntra8x8IDCT_MPEG2_1u8u and its inter-block counterpart ippiDecodeInter8x8IDCTAdd_MPEG2_1u8u, which is the one shown in the listing. Such a function performs almost the entire decoding pipeline, from variable-length decoding through motion compensation.

MP2_FUNC(IppStatus, ippiDecodeInter8x8IDCTAdd_MPEG2_1u8u, (
    Ipp8u**                            BitStream_curr_ptr,
    Ipp32s*                            BitStream_bit_offset,
    IppiDecodeInterSpec_MPEG2*         pQuantSpec,
    Ipp32s                             quant,
    Ipp8u*                             pSrcDst,
    Ipp32s                             srcDstStep))
{

    // VLC decode & dequantize for one block
    for (;;) {
      if ((code & 0xc0000000) == 0x80000000) {
        break;
      } else if (code >= 0x08000000) {
        tbl = MPEG2_VLC_TAB1[UHBITS(code - 0x08000000, 8)];
common:
        i++;
        UNPACK_VLC1(tbl, run, val, len)

        i += run;
        i &= 63; // just in case
        j = scanMatrix[i];

        q = pQuantMatrix[j];
        val = val * quant;
        val = (val * q) >> 5;
        sign = SHBITS(code << len, 1);
        APPLY_SIGN(val, sign);
        SKIP_BITS(BS, (len+1));
        pDstBlock[j] = val;
        mask ^= val;
        SHOW_HI9BITS(BS, code);
        continue;
      } else if (code >= 0x04000000) {
      ...
      }
    }

  ...

  pDstBlock[63] ^= mask & 1;
  SKIP_BITS(BS, 2);
  COPY_BITSTREAM(*BitStream, BS)

  IDCT_INTER(pDstBlock, i, idct, pSrcDst, srcDstStep);

  return ippStsOk;
}

#define FUNC_DCT8x8      ippiDCT8x8Inv_16s_C1
#define FUNC_DCT4x4      ippiDCT8x8Inv_4x4_16s_C1
#define FUNC_DCT2x2      ippiDCT8x8Inv_2x2_16s_C1
#define FUNC_DCT8x8Intra ippiDCT8x8Inv_16s8u_C1R
#define FUNC_ADD8x8      ippiAdd8x8_16s8u_C1IRS 

#define IDCT_INTER(SRC, NUM, BUFF, DST, STEP)      \
  if (NUM < 10) {                                  \
    if (!NUM) {                                    \
      IDCTAdd_1x1to8x8(SRC[0], DST, STEP);         \
    } else                                         \
    IDCT_INTER_1x4(SRC, NUM, DST, STEP)            \
    /*if (NUM < 2) {                                 \
      FUNC_DCT2x2(SRC, BUFF);                      \
      FUNC_ADD8x8(BUFF, 16, DST, STEP);            \
    } else*/ {                                       \
      FUNC_DCT4x4(SRC, BUFF);                      \
      FUNC_ADD8x8(BUFF, 16, DST, STEP);            \
    }                                              \
  } else {                                         \
    FUNC_DCT8x8(SRC, BUFF);                        \
    FUNC_ADD8x8(BUFF, 16, DST, STEP);              \
  }
Listing 3: Alternate MPEG-2 Decoding on an Inter Macroblock.

Within this function, much of the decoding is done in C++, largely using macros and state logic. The Huffman decoding is done with macros, and the dequantization is applied to each coefficient as it is decoded. The motion compensation is done along with the inverse DCT in one of the DCT macros.

This function uses several DCT functions. Most of the DCTs are done by two useful functions, ippiDCT8x8Inv_16s8u_C1R and ippiDCT8x8Inv_16s_C1R, for intra blocks and inter blocks, respectively. The former function converts the output to Ipp8u, because for intra blocks those values are pixels. The latter function leaves the result in Ipp16s, because the output values are difference values to be added to the motion-compensated reference block. The sample also uses other DCT functions, such as the specialized function ippiDCT8x8Inv_AANTransposed, which assumes that the samples are transposed and in zigzag order, and accommodates implicit zero coefficients at the end. For blocks that are mostly zeros, the decoder also uses the function ippiDCT8x8Inv_4x4_16s_C1.

The Intel IPP DCT functions also support an alternative layout for YUV data, a hybrid layout in which there are two planes, Y and UV. The UV plane consists of U and V data interleaved. In this case, there is one 16x8 block of UV data per macroblock. The Intel IPP functions ippiDCT8x8Inv_AANTransposed_16s_P2C2R, for inter frames, and ippiDCT8x8Inv_AANTransposed_16s8u_P2C2R, for intra frames, support this alternative layout. The ippiMC16x8UV_8u_C1 and ippiMC16x8BUV_8u_C1 functions support motion compensation on this layout.
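To make the hybrid layout concrete, the following sketch interleaves one 8x8 U block and one 8x8 V block into a single 16x8 UV block. The helper name is hypothetical, the steps are measured in elements, and the byte order within each pair (U first, then V) is an assumption rather than something the text specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Intel IPP function).
// u, v:     8x8 source blocks with a common row step of src_step elements.
// uv:       destination block, 16 samples wide and 8 rows high.
// Assumption: each pair is stored as U followed by V.
void InterleaveUV8x8(const uint8_t* u, const uint8_t* v, int src_step,
                     uint8_t* uv, int uv_step)
{
    assert(src_step >= 8 && uv_step >= 16);
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            uv[y * uv_step + 2 * x]     = u[y * src_step + x];
            uv[y * uv_step + 2 * x + 1] = v[y * src_step + x];
        }
    }
}
```

Because both chroma samples for a pixel pair sit next to each other, a single 16x8 copy or motion-compensation call can move the chroma data for a whole macroblock.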

On the encoding side, functions are mostly analogous to each of the decode functions listed above. For intra blocks, the forward DCT function ippiDCT8x8Fwd_8u16s_C1R converts a block of Ipp8u pixels into Ipp16s DCT coefficients. Then the function ippiQuantIntra_MPEG2 performs quantization, and the function ippiPutIntraBlock calculates the run-level pairs and Huffman encodes them. The parameters for these last two functions are very similar to those for their decoding counterparts.

For inter blocks, the function ippiDCT8x8Fwd_16s_C1R performs the forward DCT, the function ippiQuant_MPEG2 quantizes, and the function ippiPutNonIntraBlock calculates and encodes the run-level pairs.
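The run-level calculation that precedes Huffman encoding can be sketched as follows. This is an illustration of the general technique, not the Intel IPP implementation: the type and function names are hypothetical, the input is assumed to be quantized coefficients already arranged in scan order, and the variable-length coding of the pairs is left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types and names, for illustration only.
struct RunLevel {
    int run;    // number of zero coefficients preceding this one
    int level;  // value of the nonzero coefficient
};

// Converts quantized coefficients in scan order into run-level pairs.
// first is 1 to skip the DC coefficient (coded separately in intra
// blocks) or 0 to include it.
std::vector<RunLevel> ToRunLevel(
    const std::array<int16_t, 64>& coeffs_in_scan_order, int first)
{
    assert(first == 0 || first == 1);
    std::vector<RunLevel> pairs;
    int run = 0;
    for (int i = first; i < 64; ++i) {
        if (coeffs_in_scan_order[i] == 0) {
            ++run;
        } else {
            pairs.push_back(RunLevel{run, coeffs_in_scan_order[i]});
            run = 0;
        }
    }
    // Trailing zeros produce no pair; an end-of-block code covers them.
    return pairs;
}
```

A block whose only nonzero AC coefficients are 5 at scan index 1 and -2 at scan index 4 yields the pairs (0, 5) and (2, -2) when the DC coefficient is skipped.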

