Channels ▼
RSS

Design

DSP Meets Wireless Communications


Case Study: Wireless Baseband Signal Processing

In wireless communication systems, the physical layer (PHY) (baseband signal processing) is usually implemented in dedicated hardware (ASICs), or in a combination of DSPs and FPGAs, because of its extremely high computational load. GPPs (such as Intel architecture) have traditionally been reserved for higher, less demanding layers of the associated protocols.

This section, however, will show that two of the most demanding next-generation baseband processing algorithms can be effectively implemented on modern Intel processors -- LTE turbo encoder and channel estimation. The following discussion assumes Intel architecture as the target platform for implementation, and the parameters in Table 1.

Table 1: Parameters for LTE algorithm discussion.

LTE Turbo Encoder

The Turbo encoder is an algorithm that operates intensively at bit level. This is one of the reasons why it is usually offloaded to dedicated circuitry. As will be shown further, there are software architecture alternatives that can lead to an efficient realization on an Intel architecture platform.

The -- LTE standard specifies the Turbo encoding scheme as depicted in Figure 1.

Figure 1: Block diagram of the LTE Turbo encoder (Source: 3GPP)

The scheme implements a Parallel Concatenated Convolutional Code (PCCC) using two 8-state constituent encoders in parallel, and comprises an internal interleaver. For each input bit, 3 bits are generated at the output.

Internal Interleaver

The relationship between the input i and output π(i) bit indexes (positions in the stream) is defined by the following expression:

where K is the input block size in number of bits (188 possible values ranging from 40 to 6144). The constants f1 and f2 are predetermined by the standard, and depend solely on K.

At a cost of a slightly larger memory footprint (710 kilobytes), it is possible to pre-generate the π (i) LUTs for each allowed value of K. For processing a single data frame, only the portion of the table referring to the current K value will be used (maximum 12 KB).

Computing the permutation indexes at runtime would require 4 multiplications, 1 division and, 1 addition, giving a total of 6 integer operations per bit.

Convolutional Encoders

Each convolutional encoder implements a finite state machine (FSM) that cannot be completely vectorized or parallelized due to its recursive nature.

In terms of complexity, the implementation of this state machine requires 4 XOR operations per bit per encoder. If all the possible state transitions are expanded and stored in a LUT, the number of operations is 8 per byte per encoder.

Total Computational Requirements

For an input rate of 57 Mbit/s, the Turbo encoder requires 57.34 MOP (Million Integer Operations) for processing a single 10-ms frame.

Internal Interleaver Implementation

To allow parallelization and vectorization, the algorithm was changed by replacing the mod operation with a comparison test and a subtraction. Also, the inter-sample dependence was reassessed for allowing 8-way vectorized implementation as follows:

For parallel implementation, each thread receives a portion of the input data stream within the 6144-bit maximum range. The results (in CPU cycles) per input byte are given in Table 8 for the reference system described below. These results are included in the overall Turbo encoder performance measurements presented ahead. As can be seen from the results in Table 2, performance scales in an almost linear manner with the number of threads.

Table 2: CPU cycle counts per byte on multithreaded implementation of internal interleaver.

Convolutional Encoder Implementation

The implementation for this block comprises two steps:

  1. Generate all possible FSM state transitions and output values for each input value, regardless of the current state of the FSM. Generation of the output values is done in parallel.
  2. From the results generated in the previous step, select the one that corresponds to the actual state. All other generated results are discarded. The two convolutional encoders operate in parallel during this step.

A LUT is used that stores the pre-computed transition matrix (for each possible value of the input and FSM state). The size of this LUT depends on the number of bits sent to the encoders in each iteration.

Table 3 shows the number of cycles it takes to encode the input stream, as well as the memory footprint, per iteration. It can be seen that on Intel architecture, the large cache size allows a more flexible tradeoff between performance and memory usage. In this case, a 128-KB table is used.

Table 3: CPU cycle counts for the LUT-based encoder


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video