Dr. Dobb's is part of the Informa Tech Division of Informa PLC




Multi-core MPEG-4 Video Encode Partitioning


Partitioning video processing algorithms onto multi-core architectures has been researched for decades, and over that time several techniques of varying efficiency have been developed to divide the work among the processors. Let's take a closer look at some of these techniques and see how video processing poses unique challenges for multi-core processors.

Data Partitioning
One commonly used partitioning technique is called data partitioning, which relies on the ability to process data blocks in parallel. This technique is most commonly and easily applied at a coarse granularity, with each data block being a channel, a frame, or a slice, where a slice refers to a large area of a frame that is processed independently from the rest of the frame. Each data block is processed in parallel on a different processor. A master processor is generally responsible for ensuring synchronization among the processors and combining the results as needed.
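To make the idea concrete, here is a minimal Python sketch of slice-level data partitioning. It is an illustration under stated assumptions, not code from any real encoder: `split_into_slices`, `encode_slice`, and `encode_frame` are hypothetical names, the "encoding" is a placeholder computation, and worker threads stand in for the processors, with the calling thread playing the role of the master.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(frame, num_slices):
    """Split a frame (a list of rows) into num_slices contiguous bands of rows."""
    base, extra = divmod(len(frame), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        # The first `extra` slices take one additional row each.
        end = start + base + (1 if i < extra else 0)
        slices.append(frame[start:end])
        start = end
    return slices


def encode_slice(slice_rows):
    """Hypothetical stand-in for a slice encoder.

    It only sums absolute horizontal pixel differences; a real encoder
    would run motion estimation, transforms, and entropy coding here.
    """
    return sum(
        abs(row[x] - row[x - 1])
        for row in slice_rows
        for x in range(1, len(row))
    )


def encode_frame(frame, num_workers):
    """Master role: hand one slice to each worker, then gather the results."""
    slices = split_into_slices(frame, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns the results in slice order, whichever worker finishes first.
        return list(pool.map(encode_slice, slices))


if __name__ == "__main__":
    frame = [[0, 1, 2, 3] for _ in range(8)]  # tiny 8-row test frame
    print(encode_frame(frame, num_workers=4))
```

Each slice is handled without reference to its neighbors, so the only coordination is the final gather performed by the master. Note that CPython threads do not run CPU-bound code in parallel; they are used here only to show the structure.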

Applying this partitioning approach at a coarse granularity has the advantage of requiring only a minimal amount of inter-processor communication, since each block can be processed independently from the others. This approach is also easy to implement, as only a few modifications to the existing single-core reference code are required to run it in parallel on all processors and produce functional output. However, this technique also has inherent problems. First, it is difficult to ensure proper load balancing among processors, because video codec algorithms have data-dependent processing requirements. For example, one slice of a frame (or video sequence) may contain scenes with little movement and detail (e.g., a uniform background such as a wall or the sky), resulting in much lower processing requirements than another slice containing high-motion scenes with finer details (e.g., the face of a person talking).

Because of these discrepancies, some processors remain idle for long periods of time while others are busy processing the most computationally intensive scenes: this imbalance translates into wasted processing resources and suboptimal performance. Second, simple data partitioning results in non-scalable implementations, since there is little flexibility in the number of blocks into which the data can be divided. For example, a 16-channel encoder may fit nicely on a 16-processor architecture by assigning one processor to each channel, but the code will need to be reworked significantly if another application needs to run in parallel on that architecture and occupies one or more processors for extended periods of time. Dividing frames into slices offers slightly more flexibility, as the size and number of slices can generally be adjusted without requiring extensive code changes. Unfortunately, dividing a frame into too many slices degrades the efficiency of the compression algorithms.
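The cost of this imbalance is easy to quantify. The sketch below assumes one slice per processor and that the frame is finished only when the slowest slice is done; the function name `static_partition_stats` and the per-slice costs are made up for illustration and are not measurements.

```python
def static_partition_stats(costs):
    """Return (frame_time, utilization, idle_times) for one slice per processor.

    costs: processing time of each slice, in arbitrary units.
    """
    frame_time = max(costs)  # everyone waits for the slowest processor
    utilization = sum(costs) / (len(costs) * frame_time)
    idle_times = [frame_time - c for c in costs]
    return frame_time, utilization, idle_times


if __name__ == "__main__":
    # Three low-motion slices (sky, wall) and one high-motion slice (a face).
    print(static_partition_stats([10, 10, 10, 40]))
```

With these illustrative costs, three of the four processors sit idle for 30 of the 40 time units, so the four-processor system does useful work less than half the time.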

Data Pipelining
Another partitioning technique is called data pipelining or functional partitioning. This technique consists of assigning each processor a different processing block, such as motion estimation or texture encoding in the case of video encoders. With this approach, the data is processed in a pipelined fashion: one processor applies the first processing block to the data and passes on its output to another processor, which applies the second processing block to the modified data, and so on. Using the same approach, the most challenging processing blocks can be subdivided and assigned to multiple processors while simpler ones can be handled by one processor alone to achieve better load balancing.
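A pipelined arrangement can be sketched in a few lines of Python, with one thread per stage standing in for one processor per processing block and queues standing in for the inter-processor links. Everything here is an assumption made for illustration: `estimate_motion` and `encode_texture` are placeholders for the real processing blocks, and `run_pipeline` is a hypothetical helper.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage_worker(func, inbox, outbox):
    """Run one processing block: read from inbox, apply func, write to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, items):
    """Feed items through the stages, one thread (processor) per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage_worker, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def estimate_motion(frame_id):
    """Placeholder for the motion estimation block."""
    return (frame_id, "mv")


def encode_texture(pair):
    """Placeholder for the texture encoding block."""
    frame_id, mv = pair
    return f"frame{frame_id}:{mv}:tex"


if __name__ == "__main__":
    print(run_pipeline([estimate_motion, encode_texture], [0, 1, 2]))
```

While the second stage is working on frame 0, the first stage can already start on frame 1, which is where the parallelism comes from. The stage-to-thread mapping is fixed when the pipeline is built, which mirrors the compile-time role assignment common on embedded multi-core parts.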

Unfortunately, this second partitioning technique also introduces multiple challenges. First, the assignment of each processing block to a processor is generally done at compile time: this is the simplest approach, and sometimes the only one possible because of the limitations of the architecture or the lack of a multi-core operating system running on it. Assigning a role to each processor at compile time does not allow for proper load balancing. For example, motion estimation processing requirements vary greatly depending on the video streams being processed: still and low-motion sequences result in lower processing requirements for the motion estimation block than high-motion ones.

As a result, processors assigned to the motion estimation algorithm are likely to be underused with low-motion scenes, while they become the bottleneck with high-motion scenes. Second, splitting an algorithm into multiple processing blocks that run on different processors introduces inter-processor communication, which may affect performance by straining memory resources or causing processors to stall as a result of data starvation. Finally, a simple data-pipelining approach suffers from the same limitations described earlier for data partitioning. Algorithms cannot be divided into an arbitrary number of blocks to exactly match the number of processors available on a multi-core architecture, and assigning processing blocks to individual processors at compile time results in non-scalable implementations.
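The bottleneck effect can be illustrated with a simple steady-state model in which a pipeline with one processor per stage delivers one result per period of its slowest stage. This is a sketch under that simplifying assumption; it ignores communication overhead, and the function name `pipeline_stats` and the stage times are invented for the example.

```python
def pipeline_stats(stage_times):
    """Return (period, throughput, utilization) for a one-processor-per-stage pipeline.

    stage_times: time each stage spends on one unit of data (e.g. ms per frame).
    """
    period = max(stage_times)  # the slowest stage sets the pace
    throughput = 1.0 / period
    utilization = [t / period for t in stage_times]
    return period, throughput, utilization


if __name__ == "__main__":
    # Stages: [motion estimation, texture encoding], times in ms per frame.
    print(pipeline_stats([1.0, 4.0]))  # low-motion scene
    print(pipeline_stats([8.0, 4.0]))  # high-motion scene
```

In the low-motion case the motion estimation processor is busy only a quarter of the time, while in the high-motion case it sets the pace and the texture encoding processor is idle half the time. A fixed assignment cannot be optimal for both.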

Next: More Efficient Partitioning Strategies and the CT3600 MDSP family

