
Optimizing Video Encoding using Threads and Parallelism


Implementation Using Two Slice Queues

The H.264 encoder is divided into three parts: input pre-processing, encoding, and output post-processing. Input pre-processing reads uncompressed images, performs some preliminary processing, and then issues the images to the encoding threads. The pre-processed images are placed in a buffer called the "image buffer". Output post-processing checks the encoding status of each frame and commits the encoded results to the output bit-stream in order. The freed entries in the image buffer are then reused for the next images to be encoded. Although input and output processing must be sequential, because frames are read in order and the bit-stream must be written in order, their computational cost is negligible compared with that of encoding. Therefore, a single thread can handle both input and output. This thread becomes the master thread, in charge of checking all data dependencies.

A second buffer, called the "slice buffer", exploits the parallelism among slices. After each image is pre-processed, its slices go into the slice buffer. Because the readiness of reference frames is checked during input processing, the slices in the slice buffer are independent and ready for encoding, so they can be encoded out of order. Other frames are predicted from I and P frames, so their slices should be encoded before B-frame slices; two separate slice queues express this priority difference. The pseudocode in Example 1 implements this two-queue model.


// Pseudocode of a threaded H.264 encoder using OpenMP
omp_set_nested(1);  // enable the nested parallel region below
#pragma omp parallel sections
{
#pragma omp section
   {
      // master thread: in-order input and output
      while ( there are frames to read or frames left to commit )
      {
         if ( a frame remains to be read and the image buffer has a free entry )
            issue a new frame to the image buffer;
         else if ( the oldest frame in the image buffer is encoded )
            commit the encoded frame, release its entry;
         else // dependencies are handled here
            wait;
      }
   }
#pragma omp section
   {
   #pragma omp parallel num_threads(# of encoding threads)
      {
         while ( 1 ) {
            if ( there is a slice in slice queue 0 )
               // higher priority for I/P-frames
               encode one slice;
            else if ( there is a slice in slice queue 1 )
               // lower priority for B-frames
               encode one slice;
            else if ( all frames are encoded )
               break;   // leave the loop; exit() would end the whole program
            else
               // wait for the master thread to put more slices
               wait;
         }
      }
   }
}

Example 1: Slice-Queue Model for Parallelism in the H.264 Encoder

Figure 4 shows how the final multithreaded implementation of the parallelized H.264 encoder processes the video stream. In this design, one thread handles both input and output, in order, while the other threads encode slices out of order.

Figure 4: Implementation with Image and Slice Buffers.

Implementation Using Task Queuing Model

The implementation in Example 1 relies on OpenMP sections and a nested parallel region, which makes the structure of the parallel code very different from that of the sequential code. A second proposed implementation uses the taskqueuing model supported by the Intel C++ Compiler.

Essentially, for any program with taskqueuing constructs, the run-time library creates a team of threads when the main thread encounters a parallel region. Figure 5 shows the taskqueuing execution model. From all the threads that encounter a taskq pragma, the run-time thread scheduler chooses one thread (TK) to execute it initially. All the other threads wait for work to be put on the work queue. Conceptually, the taskq pragma triggers this sequence of actions:

  1. The chosen thread TK creates an empty queue.
  2. TK enqueues each task that it encounters.
  3. TK executes the code inside the taskq block as a single thread.

The task pragma specifies a unit of work, potentially to be executed by a different thread. When a task pragma is encountered lexically within a taskq block, the code inside the task block is placed on the queue associated with the taskq pragma. The conceptual queue is disbanded when all work enqueued on it finishes and the end of the taskq block is reached.

Figure 5: Taskqueuing Execution Model.

The first proposed multithreaded H.264 scheme uses two FIFO buffers: an image buffer and a slice buffer. The main thread is in charge of three activities:

  • Moving raw images into the image buffer when the image buffer has space
  • Moving slices from the image buffer into the slice buffer when the slice buffer has space and the image is not yet dispatched
  • Moving the encoded images out of the image buffer when the image is encoded

The worker threads encode slices whenever a slice is waiting in the slice buffer. All of these operations are synchronized through the image buffer. Because the main thread essentially produces independent units of work (slices) that a pool of worker threads consumes, the structure maps naturally onto the taskqueuing model supported by the Intel compiler.

The code segment in Example 2 shows pseudocode for multithreading the H.264 encoder using the taskqueuing model. This multithreaded code is much closer to the way you would write single-threaded code; the only difference is the pragmas, a key characteristic of OpenMP. Furthermore, this scheme has no dedicated control thread, only a team of worker threads.



// Pseudocode of a threaded H.264 encoder using taskqueuing
#pragma intel omp parallel taskq
{
   while ( there is a frame to encode ) {
      if ( there is no free entry in the image buffer ) {
         (1) wait for the oldest frame to finish encoding, then commit it;
         (2) release its entry;
      }
      (3) load the original picture into memory;
      (4) prepare it for encoding;
      for ( each slice in this frame ) {
         #pragma intel omp task
         {
            encode one slice;
         }
      }
   }
   commit the remaining encoded frames in order;
}

Example 2: Task-Queue Model for Threading the H.264 Encoder

