Channels ▼
RSS

C/C++

A Simple and Efficient FFT Implementation in C++:
Part II


Comparative Benchmarks and Conclusions

Definition of the GFFT object factory means instantiation of all needed GFFT template classes during compilation. Each of them includes O(P) recursive template classes. A reasonable question: how long will the compilation take? GNU C++ 4.x was used to compile performance tests shown in Figures 1 and 2, and took about 10 seconds with optimization keys for a full range of values P in the object factory, because the overall number of different templates to be instantiated stays small. Microsoft Visual C++ 2003 and Intel C++ 8.x and later versions showed similar results, although not all C++ compilers can treat the recursive templates of the GFFT so efficiently. I suppose some older compilers try to instantiate template classes for each member function call even if the template classes are the same. There exist O(2^P) such member function calls in the GFFT. As a result, the compilation hangs and finally fails, after entire memory has been exhausted. Those unlucky examples are IBM C++ 6 (Visual Age) and GNU C++ 3.x, but not GNU C++ 2.96! They could compile the GFFT object factory without optimization options quite fast, but hung with optimization turned on.

A nice property of the GFFT that distinguishes it from other well known FFT libraries is that a runtime computation of roots of unity using sine and cosine functions is not needed any more. Usual FFT implementations of known libraries like FFTW or Intel MKL proceed in two steps:

    "Planning" including calculation of roots of unity Computing of FFT.

The second step can be repeated for different data of the same length. The GFFT omits the first step without any additional penalty. Figure 2 shows benchmarks of the GFFT compared to FFTW with options FFTW_ESTIMATE and FFTW_MEASURE as well as routine zfft1dc from Intel MKL 7.0, where total CPU time of both steps was measured. Being very good for small P, the performance of FFTW become poor for large P. The only hardware optimized Intel MKL has a similar high performance comparing to the GFFT. Qualitatively the same benchmark results were obtained on Itanium2 processor with Intel C++ compiler.

Finally, I would like to make a general conclusion as to the differences between the GFFT and traditional implementation. What made the GFFT so efficient? The essential step was the assumption of P to be static constant. The reason was not simply to play with template metaprogramming, but to give additional information to the compiler, so that it could optimize the code better. Really, many numerical algorithms have some integer parameters, which vary in a small range (like P in FFT). It can be assumed static and should be used as far as possible in template metaprogramming, but not to overload the compiler taking care about the total number of template classes to be instantiated. The final implementation can be more efficient, since the compiler has got more information and so an opportunity to optimize the code. You can use such an implementation originally with static parameter or compile it for all possible or needed parameter values within object factory and use as a library. The latter means you can choose highly optimized pieces of code without dynamic bindings at runtime.

This article described a C++ implementation technology on example of simple classical Cooley-Tukey FFT algorithm. Of course, there are many advanced and more complicated FFT algorithms, such as Winograd FFT, parallelized FFT, and the like. Those modern FFT approaches can also take advantages of template recursion and metaprogramming to improve performance significantly.

[Click image to view at full size]

Figure 2a: Comparative benchmark, part 1.

[Click image to view at full size]

Figure 2b: Comparative benchmark, part 2.

References

1. A. Alexandrescu. Modern C++ Design. Addison-Wesley, 2001.
2. J.W. Cooley, J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp. Vol. 19, 1965, pp. 297-301.
3. R. Crandall, C. Pomerance. Prime Numbers: A computational perspective. Springer, 2005.
4. H.J. Nussbaumer. Fast Fourier transform and convolution algorithms. Springer, Berlin, 1981.
5. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P.Flannery. Numerical Recipes in C++. Cambridge university press, 2002.
6. T. Veldhuizen. Fast Fourier Transform (FFT) implemented using Template Metaprograms. http://www.oonumerics.org/blitz/examples/fft.html
7. T. Veldhuizen. Using C++ template metaprograms, C++ Report, Vol. 7 No. 4 (May 1995), pp. 36-43. http://osl.iu.edu/~tveldhui/papers/Template-Metaprograms/meta-art.html


Vlodymyr teaches at Brandenburg University of Technology, Cottbus, Germany. He can be reached at myrnyy@math.tu-cottbus.de.

Read Part I of this article here


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

C/C++ Recent Articles

Video

Upcoming Events



Most Recent Premium Content