Channels ▼
RSS

C/C++

Software to Hardware Parallelization


In seismic processing software, wave migration algorithms often incorporate FFT as part of the solution. This algorithmic technique (the FFT approach) usually takes the following sequence: Apply the FFT, phase shift the data, and then apply the inverse FFT to obtain the next location of the traveling wave front. When this is computed on a processor via software, data is taken out of memory to compute the FFT, then the result is put back into memory. The resulting data is taken out of memory again to do the phase shift and the new result is put back into the memory. Lastly, the new result data is taken out so the inverse FFT can be computed and the final result is put back into memory.

In such a linear execution, there are too many memory accesses that can slow down the execution. Using hardware to accelerate the algorithm, begin by taking the data out of memory and feeding it to a pipeline that executes the FFT and feeds the result into the phase shifter. The phase shifter then directly feeds its result into the inverse FFT. The output of the inverse FFT is then put back into memory.

A typical application-level profile for a wave migration algorithm might look like:

  • FFT 35%
  • Inverse FFT 35%
  • Phase Shift 25%
  • I/O 3%
  • Misc 2%

Traditionally, these algorithms are executed using floating-point math operations because the input data is gathered from 24-bit sensors and the output is often displayed as 24-bit color pixels. By implementing multiple floating-point units in an FPGA, the FPGA can outperform a CPU 5:1 on floating-point code, but can deliver a 10-100 time improvement on integer-based code.

Next, see if the code fits into the hardware. Take a look around the Web to find a variety of floating-point and integer FFT algorithms. Floating-point versions are larger and slower, but deliver more precise results. One example, a hybrid approach, can compute a 2Kpoint FFT at a rate of 400 Msamples/s (www.andraka.com/V4_FP_fft.htm). This is 4.5 times faster than a 2.2-GHz dual-core Opteron (www.fftw.org/speed/opteron-2.2GHz-64bit). Three such FFT engines can fit in a Xilinx SX55. A CORDIC algorithm can be used to do the phase shift, because implementing such an algorithm does not consume many of the FPGA's logic elements—less than 1000 look-up tables. Moreover, the computations can be pipelined to improve throughput.

While you may be impressed with speeds 4.5 times faster, you might still think that you can buy a 4-way Opteron-based system for the same price as an FPGA-based coprocessor with a dual-core Opteron to achieve the same performance gains. But because the functional units will be pipelined in an FPGA-based coprocessor, the phase shift and the inverse FFT come with the cost of some latency until the pipeline is filled—a few hundred nanoseconds. As a result, you achieve a speedup for the phase shift and the inverse FFT for free. The overall throughput improvement can then be estimated as follows:

4.5 + 4.5 + [(25/35)*4.5] = 12.2x (the overall speedup for 
the pieces we put in hardware)

The calculation [(25/35)*4.5] is an estimate of the performance of the phase shift and is based on the phase shift using similar math to the FFT, and the 25/35 is the proportion of the time taken on the CPU for the FFT.

Using Amdahl's Law, P equals the percent that can be parallelized and S is the speedup of that portion. Using this formula you find that:

1/(1-P)+(P/S) = 1/[(1-0.95) + (0.95/12.2)] = 7.8x application speedup


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video