DSP Meets Wireless Communications


Overall Performance Results for the Complete Turbo Encoder

From Tables 8 and 9, the internal interleaver takes 4.99 cycles per byte using 4 independent threads for an input block size of 6144 bits, while the encoder, which uses 2 threads, takes 4.76 cycles per byte. Because there is no inter-block dependence, two encoders can run in parallel on the reference platform: Dual Intel Core i7 Processor 2112 MHz (8 MB Cache/CPU, 1.5 GB DDR3 800 MHz/CPU), 64-bit CentOS 5.0, Intel C++ Compiler 10.0.0.64, 80 GB HD Samsung 5400 rpm.

As a result, a 10-ms frame (57 Mbps) is encoded in 159.1 microseconds, corresponding to a total CPU usage of 1.59 percent.
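The CPU usage figure follows directly from the ratio of processing time to frame duration. The short sketch below reproduces that arithmetic; the function and constant names are illustrative and do not come from the original study.

```python
# Illustrative arithmetic only: names are hypothetical, figures are from the text.
FRAME_DURATION_MS = 10        # one frame lasts 10 ms
DATA_RATE_BPS = 57_000_000    # 57 Mbps data stream


def cpu_usage_percent(processing_time_us, frame_duration_ms=FRAME_DURATION_MS):
    """Share of one frame period spent processing, as a percentage."""
    frame_duration_us = frame_duration_ms * 1000.0
    return 100.0 * processing_time_us / frame_duration_us


# Bits carried by one 10-ms frame at 57 Mbps (integer arithmetic).
bits_per_frame = DATA_RATE_BPS * FRAME_DURATION_MS // 1000

# Turbo encoder: 159.1 microseconds of processing per 10-ms frame.
turbo_usage = cpu_usage_percent(159.1)

print(f"{bits_per_frame} bits per frame, {turbo_usage:.2f} percent CPU usage")
```

The same helper applies to any other processing stage whose time per frame is known.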

Channel Estimation

In the next generation of mobile wireless standards, estimating the channel characteristics is necessary to provide high data throughput. LTE includes a number of reference signals in its data frame that are used to compute the estimate, as shown in Figure 2.

Figure 2: Spacing of Reference signals on each antenna (Source: 3GPP LTE Standard)

These reference signals are sent every six subcarriers with alternate frequency offsets. They are sent on the first and fourth OFDM symbols of each slot, so two channel estimations are computed per slot.
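As a rough illustration of this layout, the sketch below lists the subcarrier indices that carry reference signals on each of the two reference symbols of a slot. It rests on several assumptions that the text does not state: symbols are numbered from zero (so the first and fourth symbols are 0 and 3), the alternating frequency offsets are 0 and 3 subcarriers, and the function name and parameters are hypothetical.

```python
def reference_positions(n_subcarriers, symbols=(0, 3), offsets=(0, 3), spacing=6):
    """Map each reference OFDM symbol to the subcarriers carrying reference signals.

    Assumed, not taken from the article: zero-based symbol indices 0 and 3,
    and alternating frequency offsets of 0 and 3 subcarriers.
    """
    return {
        symbol: list(range(offset, n_subcarriers, spacing))
        for symbol, offset in zip(symbols, offsets)
    }


# Small example with 24 subcarriers so the pattern is easy to read.
positions = reference_positions(24)
for symbol, subcarriers in positions.items():
    print(f"symbol {symbol}: subcarriers {subcarriers}")
```

Each entry in the returned dictionary corresponds to one of the two channel estimations computed per slot.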

The estimation consists of a time average of the current reference frame and the 5 previous ones, in order to minimize noise distortion.
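A minimal sketch of such a time average is shown below, assuming each estimate is a list of complex values (one per reference subcarrier) and that the window holds six entries: the current estimate plus the five previous ones. The class name is hypothetical, and averaging over fewer entries until the window fills is a choice made here for illustration, not something the text specifies.

```python
from collections import deque


class SlidingAverage:
    """Average the current estimate with up to `depth - 1` previous ones."""

    def __init__(self, depth=6):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.history = deque(maxlen=depth)

    def update(self, estimate):
        """Add a new estimate and return the per-subcarrier average."""
        self.history.append(list(estimate))
        count = len(self.history)
        # zip(*history) groups the values of each subcarrier across time.
        return [sum(values) / count for values in zip(*self.history)]


averager = SlidingAverage(depth=6)
for step in range(1, 8):
    smoothed = averager.update([complex(step, 0), complex(0, step)])
print(smoothed)
```

After the seventh update the window holds estimates 2 through 7, so the oldest one no longer contributes to the average.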

Figure 3 shows the high-level view of the channel estimator, which comprises a complex reciprocal operation (rcp(z)), one complex multiplication for each set of reference values, an averaging operator (∑), and a polyphase interpolator (H(z)).
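The first and last stages of this chain can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the study: it assumes the reciprocal is taken of the known reference symbol so that the raw estimate is the received value times rcp(reference), and it uses plain linear interpolation as a simple stand-in for the polyphase interpolator H(z). All function names are hypothetical.

```python
def complex_reciprocal(z):
    """Return 1/z computed as conj(z) / |z|^2."""
    magnitude_squared = z.real * z.real + z.imag * z.imag
    inverse = 1.0 / magnitude_squared
    return complex(z.real * inverse, -z.imag * inverse)


def raw_estimates(received, reference):
    """One complex multiplication per reference value: y * rcp(x)."""
    return [y * complex_reciprocal(x) for y, x in zip(received, reference)]


def interpolate(estimates, spacing=6):
    """Fill in the subcarriers between reference positions.

    Linear interpolation is used here as a stand-in for H(z).
    """
    output = []
    for left, right in zip(estimates, estimates[1:]):
        for k in range(spacing):
            output.append(left + (right - left) * k / spacing)
    output.append(estimates[-1])
    return output


# Two reference subcarriers with known symbols and a made-up channel.
reference = [1 + 1j, 1 - 1j]
channel = [2 + 0j, 0 + 2j]
received = [h * x for h, x in zip(channel, reference)]

estimates = raw_estimates(received, reference)
print(interpolate(estimates, spacing=2))
```

With noiseless inputs the raw estimates recover the channel values exactly; the averaging stage only matters once noise is present.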

Figure 3: High-level view of the channel estimator (Source: Intel Corporation, 2009)

In terms of computational complexity per sample:

  • Reciprocal calculation: 6 multiplications, 1 division and 1 addition.
  • Complex multiplication: 4 multiplications and 2 additions.
  • Averaging operation: 6 additions and 1 multiplication.
  • Polyphase interpolator: 6 multiplications and 3 additions.
  • Total number of operations: 30.

For a 10-ms full 4x4 MIMO, 20-MHz frame, the algorithm computes 120 channel estimations, each using only 340 samples. Multiplying these by the 30 operations per sample gives 120 x 340 x 30 = 1,224,000 operations, or 1.224 MFLOP per frame.
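The operation counts listed above can be checked with a few lines of arithmetic. The sketch treats the 340 samples as applying to each of the 120 estimations, which is the reading under which the product comes to 1.224 million; the dictionary keys and variable names are illustrative.

```python
# Operations per sample for each stage, as listed in the text.
OPS_PER_SAMPLE = {
    "reciprocal": 6 + 1 + 1,            # 6 multiplications, 1 division, 1 addition
    "complex_multiplication": 4 + 2,    # 4 multiplications, 2 additions
    "averaging": 6 + 1,                 # 6 additions, 1 multiplication
    "polyphase_interpolator": 6 + 3,    # 6 multiplications, 3 additions
}

ESTIMATIONS_PER_FRAME = 120
SAMPLES_PER_ESTIMATION = 340  # assumed to apply per estimation, see lead-in

total_ops_per_sample = sum(OPS_PER_SAMPLE.values())
ops_per_frame = ESTIMATIONS_PER_FRAME * SAMPLES_PER_ESTIMATION * total_ops_per_sample

print(f"{total_ops_per_sample} operations per sample")
print(f"{ops_per_frame / 1e6:.3f} million operations per frame")
```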

Implementation

The input data parameters are assumed as described in Table 4.

Table 4: Input data format for channel estimation

Only the complex multiplications and reciprocals are computed in floating point. Reciprocals in particular are implemented with SSE intrinsics for a higher throughput. The performance results in CPU cycles per reference input sample are presented in Table 5.

Table 5: CPU cycles per complex input sample for each stage of the channel estimation algorithm.

For a 10-ms frame, and assigning two cores per MIMO channel on our reference system (Dual Intel Core i7 Processor 2112 MHz, 8 MB Cache/CPU, 1.5 GB DDR3 800 MHz/CPU, 64-bit CentOS 5.0, Intel C++ Compiler 10.0.0.64, 80 GB HD Samsung 5400 rpm), each thread computes a total of 20 estimations per frame, resulting in a processing time of 47.2 microseconds per frame and a total CPU usage of 0.48 percent.

Overall Turbo Encoder and Channel Estimation Performance

Table 6 summarizes the performance results of the Intel architecture implementation for both algorithms. The first column states the computational complexity of the algorithm in terms of millions of (floating-point) operations per frame. The second shows the actual time taken by our reference system to process the data (using the 8 cores available). The final column is the total CPU usage for processing the 57 Mbps data stream.

Table 6: Summary of performance results for selected baseband processing.

While the actual partitioning of the system will depend on the amount of baseband processing offloaded and/or the throughput required, the results show that it is possible to move several portions of the baseband processing onto an Intel architecture-based platform.

Conclusions

Modern Intel general-purpose processors incorporate a number of features of real value to DSP algorithm developers, including high clock speeds, large on-chip memory caches, and multi-issue SIMD vector processing units. Their multiple cores are often an advantage in highly parallelizable DSP workloads. Software engineers can write applications at whatever level of abstraction makes sense: they can use higher-level languages and take advantage of the compiler's automatic vectorization features, and they can further optimize performance by linking in Intel IPP and MKL functions. In addition, where certain areas require it, SSE intrinsics are available, or the rich and growing set of specialized SSE and other assembly instructions can be used directly. The wireless infrastructure study we have summarized indicates that current Intel Architecture Processors may now be suitable for a surprising amount of intensive DSP work.


For more information, see the Intel Technology Journal (March 2009), "Advances in Embedded Systems Technology" (intel.com/technology/itj).

