Optimizing Video Encoding using Threads and Parallelism


Multithreading Overhead

In summary, setting the number of threads equal to the number of logical processors strikes the best balance between speedup and parallelism. But what happens to performance when the number of threads is greater or less than the number of logical processors? Figure 11 shows how the speedup changes with the number of threads for an implementation using two slice queues. The speedup increases as threads are added and peaks when the number of threads equals the number of logical processors.

Figure 11: Speedups Versus Number of Threads on a 2-way Dell System.

An interesting observation is that the speedup stays essentially flat, or drops only slightly, when the number of threads exceeds the number of logical processors. Thus, the overhead due to threading is minor. In other words, the multithreaded code generated by the compiler exploits the available parallelism efficiently, and the overhead of the multithreaded run-time library is small. Furthermore, because performance is not sensitive to the number of threads, the multithreaded H.264 encoder should scale well on medium-scale multiprocessor systems, such as the one shown in Figure 12.

Figure 12: Speedups Versus Number of Threads on 4-way IBM x360 System.
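
A thread sweep like the one behind these figures can be approximated with a small harness. The following is a minimal sketch, not the encoder used in the study: encode_slice_stub is a hypothetical stand-in for per-slice encoding work, and run_with_threads, time_run, and speedup are illustrative names. It assumes a compiler with OpenMP enabled (for example, -fopenmp with GCC); without it, the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the work of encoding one slice. */
static long encode_slice_stub(int slice)
{
    long acc = 0;
    for (int i = 1; i <= 20000; ++i)
        acc += (slice + i) % 7;
    return acc;
}

/* Process nslices slices using the requested number of threads. */
long run_with_threads(int nthreads, int nslices)
{
    assert(nthreads > 0 && nslices >= 0);
    long total = 0;
    /* Slices are handed out dynamically; partial sums are combined at the end. */
    #pragma omp parallel for num_threads(nthreads) schedule(dynamic) reduction(+:total)
    for (int s = 0; s < nslices; ++s)
        total += encode_slice_stub(s);
    return total;
}

/* Wall-clock time under OpenMP; falls back to CPU time from clock()
   otherwise, which is adequate only for the serial case. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Seconds taken to process nslices slices with nthreads threads. */
double time_run(int nthreads, int nslices)
{
    double start = now_seconds();
    volatile long sink = run_with_threads(nthreads, nslices);
    (void)sink; /* keep the compiler from discarding the work */
    return now_seconds() - start;
}

/* Speedup of a parallel run relative to the serial run. */
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}
```

Timing time_run(1, n) once, then time_run(t, n) for each thread count t, and passing each pair of timings to speedup yields one data point per thread count, which is the kind of curve plotted in Figures 11 and 12.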

Further Performance Tuning

This walkthrough of the first parallel implementation of the H.264 encoder on the multithreading architecture covered several tradeoffs between video quality and parallelization. In other studies, researchers took the most straightforward approach, parallelizing the encoding of video sequences either by pictures or by slices. Our approach is slightly more complicated because it exploits both slice-level and frame-level parallelism.
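
One way to picture combining the two levels is a loop nest over frames and slices that is flattened into a single pool of work. The sketch below is only an illustration under strong assumptions: it treats the frames in a batch as mutually independent (a real encoder must respect reference-frame dependencies), it uses OpenMP's collapse clause (OpenMP 3.0 or later) rather than the slice queues described in this article, and encode_slice_stub, encode_batch, FRAMES, and SLICES are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical batch shape: 4 independent frames, 9 slices per frame. */
enum { FRAMES = 4, SLICES = 9 };

/* Hypothetical stand-in: marks one (frame, slice) pair as encoded. */
static void encode_slice_stub(int done[FRAMES][SLICES], int frame, int slice)
{
    done[frame][slice] += 1;
}

/* Encode every slice of every frame in the batch.
   Returns the number of (frame, slice) pairs processed. */
int encode_batch(int done[FRAMES][SLICES])
{
    int count = 0;
    memset(done, 0, sizeof(int) * FRAMES * SLICES);
    /* collapse(2) merges both loops into one iteration space, so threads
       draw work from frame-level and slice-level parallelism together. */
    #pragma omp parallel for collapse(2) schedule(dynamic) reduction(+:count)
    for (int f = 0; f < FRAMES; ++f) {
        for (int s = 0; s < SLICES; ++s) {
            encode_slice_stub(done, f, s); /* each pair touches its own cell */
            count += 1;
        }
    }
    return count;
}
```

Because each iteration writes only its own cell of done, no locking is needed in this sketch; the synchronization cost that the article discusses arises once slices from dependent frames have to wait on one another.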

Even when the expected performance gain is achieved, there is always further work to do. In this case, you could analyze the performance impact of different image resolutions. While the resolution of the source image can scale from QCIF and CIF through SD to HDTV, most of our current analysis focused on the CIF resolution. Figure 5 shows that the speedup for the SD (720x480) format is slightly less than that for the CIF (352x288) format. Speedup is determined by factors such as synchronization and degree of parallelism, yet Figure 13 shows that encoding SD video incurs fewer synchronizations per second than encoding CIF video, and SD also has a higher degree of parallelism. Both factors should favor SD, so further study is needed to understand why the speedup for higher-resolution video is lower than that for lower-resolution video.

Figure 13: Speedups Versus Number of Threads on 4-way IBM x360 System.
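
One rough way to see why SD offers more parallelism than CIF is to count 16x16 macroblocks, the basic coding unit in H.264. The sketch below does that arithmetic. Treating the macroblock count as a proxy for available parallelism is a simplifying assumption made here, not a measure used in the study, and macroblock_count is an illustrative name.

```c
#include <assert.h>

/* H.264 macroblocks cover 16x16 luma samples. */
enum { MB_SIZE = 16 };

/* Number of macroblocks needed to cover a width x height frame. */
int macroblock_count(int width, int height)
{
    assert(width > 0 && height > 0);
    int mb_cols = (width + MB_SIZE - 1) / MB_SIZE;  /* round up */
    int mb_rows = (height + MB_SIZE - 1) / MB_SIZE; /* round up */
    return mb_cols * mb_rows;
}
```

For example, macroblock_count(352, 288) gives 396 for CIF and macroblock_count(720, 480) gives 1350 for SD, roughly 3.4 times as many units of work per frame.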

Summary

As emerging codec standards become more complex, the encoding and decoding processes demand more computation. The H.264 standard includes a number of new features and requires much more computation than most existing standards, such as MPEG-2 and MPEG-4. Even after media instruction optimization, the H.264 encoder at CIF resolution is still not fast enough to meet the expectations of real-time video processing. Thus, exploiting thread-level parallelism to improve the performance of H.264 encoders is becoming more attractive.

The case study presented here shows that multithreading based on the OpenMP programming model is a simple yet effective way to exploit parallelism, requiring only a few additional pragmas in the serial code. Once the OpenMP pragmas are added, developers can rely on the compiler to convert the serial code to multithreaded code automatically. The performance results show that the code generated by the Intel compiler delivers a substantial speedup over the well-optimized sequential code on the architecture with Hyper-Threading Technology. In this case, the parallel speedup is approximately 4x without HT, and HT often adds a further 20 percent at very little additional cost.
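
To illustrate how small such a change can be, the sketch below parallelizes a serial loop that sums absolute differences between two frames, a common building block of motion estimation. It is not code from the encoder in this study, and frame_sad and its parameters are hypothetical. A single OpenMP pragma with a reduction clause is the only difference from the serial version, and a compiler without OpenMP enabled simply ignores it.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences between two frames of
   width x height 8-bit samples stored row by row. */
long frame_sad(const unsigned char *cur, const unsigned char *ref,
               int width, int height)
{
    assert(cur && ref && width > 0 && height > 0);
    long sad = 0;
    /* The only addition to the serial code: rows are divided among
       threads and the per-thread partial sums are added together. */
    #pragma omp parallel for reduction(+:sad)
    for (int y = 0; y < height; ++y) {
        const unsigned char *c = cur + (size_t)y * width;
        const unsigned char *r = ref + (size_t)y * width;
        for (int x = 0; x < width; ++x)
            sad += abs((int)c[x] - (int)r[x]);
    }
    return sad;
}
```

The result is the same with or without the pragma because integer addition can be combined in any order; only the elapsed time changes.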

In summary, when parallelizing an application, remember the following key points:

  • Understand the application so that you can choose the task and data decomposition schemes that give the best scalability and load balancing.
  • Choose the granularity of parallelism carefully, such as frame-level or slice-level parallelism, to expose the right amount of parallelism with minimum synchronization overhead.
  • Use tools such as the Intel VTune Performance Analyzer and Intel Thread Profiler to measure performance at various levels, from micro-architecture metrics to the breakdown of thread busy and waiting time, so that you can understand your performance gain or loss and identify headroom for further tuning.


This article is based on material found in the book The Software Optimization Cookbook, Second Edition, by Richard Gerber, Aart J.C. Bik, Kevin B. Smith, and Xinmin Tian. (http://www.intel.com/intelpress/sum_swcb2.htm)

