 Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

# A Simple and Efficient FFT Implementation in C++: Part II

In Part One of this article, I introduced a new efficient implementation of the Cooley-Tukey Fast Fourier Transform (FFT) algorithm, discussing recursion and the elimination of trigonometric function calls. In Part Two, I discuss the specialization of short FFTs, FFT selection at runtime, and present some comparative benchmarks and conclusions.

### Specialization of Short FFTs

The implemented template class recursion has P levels. Every FFT calculation process runs from level P to level 1, while the level 1 is empty (Listing Three, in Part One of this article). Some comprehensive books on FFT, for example , show that short length FFTs (P=1,2,3,4) could use fewer operations than the general algorithm. Such particular cases can be simply incorporated into the new implementation using partial specialization of the template class `DanielsonLanczos`. Listing Six represents those specializations for N=4 and N=2. Since every FFT goes down to the first specialized one, these additional specializations lead to a small overall performance improvement of about 1-5 percent.

Listing Six

```>
template<typename T>
class DanielsonLanczos<4,T> {
public:
void apply(T* data) {
T tr = data;
T ti = data;
data = data-tr;
data = data-ti;
data += tr;
data += ti;
tr = data;
ti = data;
data = data-ti;
data = tr-data;
data += tr;
data += ti;

tr = data;
ti = data;
data = data-tr;
data = data-ti;
data += tr;
data += ti;
tr = data;
ti = data;
data = data-tr;
data = data-ti;
data += tr;
data += ti;
}
};

template<typename T>
class DanielsonLanczos<2,T> {
public:
void apply(T* data) {
T tr = data;
T ti = data;
data = data-tr;
data = data-ti;
data += tr;
data += ti;
}
};
```

You might ask why, when programming in C++, am I still using a C-style array instead of `std::complex<T>` or even `std::vector<T>`? Because the C++ standard library is not suited to high-performance computing, at least its open source distribution that I have. A simple trace into the sources of the header complex makes clear, that a simple operation on complex numbers like x=a+b written in C++ generates a temporary object of type `std::complex<T>` resulting in poor performance demonstrated by the dotted line 'CGFFT' on Figure 1 (see Part One of this article). This is an example where the expression templates technique in the header {complex} could be very helpful, but was not applied. Application of `std::complex<T>` could result in some shorter code, but the computational performance, which I try to maximize, would be lost.

### More Insights To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

## Featured Reports ## Featured Whitepapers 