Channels ▼
RSS

Design

DSP Meets Wireless Communications


The authors are Intel engineers. Courtesy Intel Corp. All rights reserved.


Digital signal and image processing (DSP) is ubiquitous: From digital cameras to cell phones, HDTV to DVDs, satellite radio to medical imaging. The modern world is increasingly dependent on DSP algorithms. Although, traditionally, special-purpose silicon devices such as digital signal processors, ASICs, or FPGAs are used for data manipulation, general-purpose processors (GPPs) can now also be used for DSP workloads. Code is generally easier and more cost-effective to develop and support on GPPs than on large DSPs or FPGAs. GPPs are also able to combine general purpose processing with digital signal processing in the same chip, a major advantage for many complex algorithms. The Intel processor microarchitecture, instruction set, and performance libraries have features and benefits that can be exploited to deliver the performance and capability required by DSP applications. This article explores the main differences between traditional DSPs and modern Intel general-purpose processor architectures.

Vectorization is one example of how GPPs can meet DSP requirements. Although DSP algorithms tend to be mathematically intensive, they are often fairly simple in concept. Filters and Fast Fourier Transforms (FFTs), for example, can be implemented using simple multiply and accumulate instructions. Modern GPPs use Single Instruction Multiple Data (SIMD) techniques to increase their performance on these types of low-level DSP functions. Current Intel Core processor family and Intel Xeon processor have 16 128-bit vector registers that can be configured as groups of 16, 8, 4, or 2 samples depending on the data format and precision required. For single-precision (32-bit) floating point SIMD processing, for example, four floating point (FP) numbers which need to be multiplied by a second value are loaded into vector register 1 with the multiplicand(s) in register 2. Then the multiply operation is executed on all four numbers in a single processor clock cycle. Current Intel Core2 processor family and Intel Xeon processor have a 4-wide instruction pipeline with two FP Arithmetic Logical Units, so potentially 8 single-precision FP operations can be done per clock cycle per core. This number will increase to 16 operations per clock when the Intel Advanced Vector Extensions (Intel AVX) Instruction Set Architecture debuts in 2010 "Sandy Bridge" generation processors since AVX SIMD registers will be 256 bits wide.

Parallelization is one example of how GPPs can meet DSP requirements. The Intel architecture, as a multicore architecture, is suited for executing multiple threads in parallel. In terms of DSP programming, there are several approaches for achieving parallelism:

  • Pipelined execution: The algorithm is divided in stages and each of these stages is assigned to a different thread.
  • Concurrent execution: The input data is partitioned, and each thread processes its own portion of the input data through the whole algorithm. This is only possible if the functionality of the algorithm is not compromised.

Both approaches can also be combined in order to maximize performance and efficient resource utilization.

When evaluating parallelism, the programmer should also consider cache hierarchy. For maximum throughput, each thread should ideally have its input/output data fit within local caches (L1, L2), minimizing cache trashing due to inter-core coherency overheads. On every stage of the algorithm, threads should guarantee that their output data is contiguously stored in blocks with size that is a multiple of the internal cache line width. Inter-thread data dependencies should be minimized and pipelined to reduce algorithm lock-ups.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video