Policy-Driven Design & the Intel IPP Library

Shehrzad turns to generic programming techniques when using Intel's Integrated Performance Primitives Library to build a C++-based signal-processing application.


January 01, 2004
URL:http://www.drdobbs.com/policy-driven-design-the-intel-ipp-libr/184401744

The Intel Integrated Performance Primitives Library (IPP) is an API that provides functionality for developing signal, image, speech, graphics, and audio processing applications, as well as vector manipulation and matrix math. Available for both Windows and Linux (http://developer.intel.com/software/products/ipp/ipp30/), the library is divided into three domains: signal processing, image processing, and matrix operations. IPP is a C library whose data-processing functions operate on arrays (one-dimensional in the signal-processing domain, two-dimensional in the image-processing domain) of many primitive types (short, float, and so on). Recently, I worked on a C++ project where I needed to implement a suite of core signal-processing algorithms that had to support both float and double types. In this article, I describe how I used generic-programming techniques to wrap the IPP signal-processing functions to make my code base generic, at no cost to runtime performance.

Problem Domain

The system I implemented read data from an analog-to-digital (A/D) PCI card, then processed this data using standard signal-processing algorithms such as the Fast Fourier Transform (FFT). This A/D hardware returns 8-bit samples, which the software converts to floating-point voltage values after the data has been DMAed into PC memory. Most of the time it is advantageous to store these samples in a single-precision floating-point format (the C/C++ float type), both for memory footprint and performance reasons. However, in this application there was a large, existing suite of rather complicated algorithms, whose classes took pointers to double-precision floating-point (the C/C++ double type) arrays. Depending on the scenario, the application software would sometimes perform its processing functionality purely in the IPP domain, whereas in other cases the software leveraged this existing code base to perform its duties. In this latter case, you don't want to waste precious time and resources converting to/from single-precision and double-precision floating-point arrays. Moreover, it is desirable to use the single-precision data type whenever possible.

Example 1 lists a few function signatures from the IPP header files. The selected functions range from simple (the memory allocation routines in 1(a)) to fairly complicated (the real-valued FFT routines in 1(c)), but the general structure of the library should be clear. Intel uses a well-defined naming scheme to differentiate between families of similar functions that operate on different data types. For example, the IPP library employs the suffixes 32f and 64f to denote single-precision and double-precision versions of the same function. The Ipp32f and Ipp64f types are aliased to the C/C++ float and double native types, respectively. In fact, there are many more supported types, including variants of integer and complex types. In this article, I focus on the single- and double-precision floating-point variants of the IPP library functions.

Generic Programming

My goal was to have a wrapper class that could handle both floating-point types. However, the conundrum I faced was that the underlying library was a C-based API using this naming scheme to differentiate amongst the different types. Such a situation cries out for a template-based solution — one that uses a policy and traits-driven approach providing the advantage of static binding, flexibility, and overall elegance.

Perhaps the most common approach in software engineering is layered indirection. Andrei Alexandrescu discusses how you can use policy classes to encapsulate different behavioral aspects of a problem domain, with one of the huge benefits being that there is no runtime cost with this technique — we rely on the compiler to deal with the type-dependent behavior by instantiating the appropriate template classes, functions, and methods at compile time [1].

Listing 1 is a simple template class, intended to represent a signal captured by an A/D device (or any one-dimensional signal, for that matter). At this point, it looks like the beginnings of an std::vector<> clone. However, consider the implementation of the constructor. To take full advantage of the IPP Library, Intel recommends using its allocation routines; see Example 1(a), which ensures that the data array allocated is properly aligned so that performance is maximized. But how can the template class in Listing 1 handle the fact that for the Ipp32f (float) type it should call ippsMalloc_32f(), while for the Ipp64f (double) type it should call ippsMalloc_64f()?

The solution lies in implementing a template policy class, such as that shown in Listing 2. In this policy-based class design, I have specialized two template classes such that they call into the appropriate IPP routines. If you were interested in merely allocating the array's memory for optimal performance, and subsequently calling into Intel's API directly, then you could theoretically define a custom STL allocator [2] and specialize the vector class using this allocator. (Of course, if that were all that was needed, I would not be writing this article.) Listing 3 is the complete Signal<> header file for the sample program that accompanies this article (available at http://www.cuj.com/code/). This template class handles memory allocation and deallocation, has a public method to return the magnitude of the FFT of the signal data, and also exposes methods to return the minimum/maximum values in the signal, along with where in the signal these points lie. Again, you could use the STL min_element() and max_element() functions to find the minimum and maximum values, but then you would forgo the performance increases the IPP library has to offer. In Alexandrescu's terms, the Signal<> template class is the policy host class, because it aggregates an instance of the policy template IntelIppPolicy<>, which is really nothing more than a set of static functions (policies) and typedefs (traits). This policy-based design shines when using some of the more sophisticated and extensive IPP functions, such as the FFT routines in the fftMagnitude() method.
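The dispatch mechanism can be tried without IPP installed. The following is a minimal sketch, not the sample program's code: fakeMalloc_32f(), fakeMalloc_64f(), and fakeFree() are hypothetical stand-ins for ippsMalloc_32f(), ippsMalloc_64f(), and ippsFree(), and the names AllocPolicy and Buffer are invented for illustration. It shows the host class calling a single name while the compiler selects the type-specific routine.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for ippsMalloc_32f/ippsMalloc_64f/ippsFree so that
// the sketch compiles without IPP. The counters record which one was called.
static int g_calls32 = 0;
static int g_calls64 = 0;

float *fakeMalloc_32f(int len)
{
    ++g_calls32;
    return static_cast<float *>(std::malloc(len * sizeof(float)));
}
double *fakeMalloc_64f(int len)
{
    ++g_calls64;
    return static_cast<double *>(std::malloc(len * sizeof(double)));
}
void fakeFree(void *ptr) { std::free(ptr); }

// Primary template is declared but not defined, so instantiating the host
// with an unsupported element type fails at compile time.
template <typename T> struct AllocPolicy;

template <> struct AllocPolicy<float> {
    static float *alloc(int len) { return fakeMalloc_32f(len); }
    static void release(void *ptr) { fakeFree(ptr); }
};
template <> struct AllocPolicy<double> {
    static double *alloc(int len) { return fakeMalloc_64f(len); }
    static void release(void *ptr) { fakeFree(ptr); }
};

// Host class: written once, bound statically to the matching policy.
template <typename T>
class Buffer {
public:
    typedef AllocPolicy<T> Policy;
    explicit Buffer(int n) : m_p(Policy::alloc(n)), m_n(n) { assert(m_p != 0); }
    ~Buffer() { Policy::release(m_p); }
    T *data() { return m_p; }
    int size() const { return m_n; }
private:
    Buffer(const Buffer &);            // non-copyable: owns raw memory
    Buffer &operator=(const Buffer &);
    T *m_p;
    int m_n;
};
```

Constructing a Buffer<float> increments only g_calls32, and a Buffer<double> only g_calls64. No virtual call or runtime branch is involved, because the choice is made when the template is instantiated.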

The main function (see Listing 4) instantiates a Signal<> object and uses this object to read in a text file containing two seconds' worth of musical notes. Using the Intel IPP functions via Signal<>'s public interface, the program computes the magnitude of the FFT of this waveform and saves this data to another file, which can be plotted in Excel. Figure 1(a) shows the first quarter-second of the input waveform, and Figure 1(b) shows the interesting portion of the FFT magnitude of the entire musical sequence. The FFT clearly exhibits peaks at 440, 493, 587, and 659 Hz. The American Standard Pitch frequencies for the notes A, B, D, and E are 440.00 Hz, 493.88 Hz, 587.33 Hz, and 659.26 Hz, respectively [3]. Therefore, the input waveform consists of some combination (in this case, sequential) of those musical notes. Changing the DataType typedef to float or double causes a different Signal<> instantiation to be generated, with the corresponding IPP functions invoked.
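Reading a peak off the FFT magnitude relies on the relation between bin index and frequency: bin k of an N-point transform of data sampled at fs Hz is centered at k * fs / N Hz. The sketch below illustrates this under stated assumptions. The sample rate of the article's data file is not given, so fs = 8000 Hz and N = 800 are made-up values, the tone is synthesized rather than read from the file, and a naive DFT stands in for the IPP FFT. The helper names (dftMagnitude, makeTone, peakFrequencyHz) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT magnitude for bins 0..N/2 (N/2 + 1 values), the same
// output length that Signal<>::fftMagnitude() produces. This is not an FFT;
// it is only a readable stand-in so the sketch runs without IPP.
template <typename T>
std::vector<T> dftMagnitude(const std::vector<T> &x)
{
    const std::size_t N = x.size();
    const double twoPi = 6.283185307179586;
    std::vector<T> mag(N / 2 + 1);
    for (std::size_t k = 0; k <= N / 2; ++k)
    {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < N; ++n)
        {
            // reduce k*n modulo N so the angle stays within one turn
            const double angle = twoPi * static_cast<double>((k * n) % N)
                                       / static_cast<double>(N);
            re += static_cast<double>(x[n]) * std::cos(angle);
            im -= static_cast<double>(x[n]) * std::sin(angle);
        }
        mag[k] = static_cast<T>(std::sqrt(re * re + im * im));
    }
    return mag;
}

// Synthesizes N samples of a unit-amplitude sine at freqHz.
template <typename T>
std::vector<T> makeTone(double freqHz, double sampleRateHz, std::size_t N)
{
    const double twoPi = 6.283185307179586;
    std::vector<T> x(N);
    for (std::size_t n = 0; n < N; ++n)
        x[n] = static_cast<T>(
            std::sin(twoPi * freqHz * static_cast<double>(n) / sampleRateHz));
    return x;
}

// Converts the index of the largest magnitude bin to a frequency in Hz.
// N is the length of the time-domain signal, not of the magnitude array.
template <typename T>
double peakFrequencyHz(const std::vector<T> &mag, double sampleRateHz, std::size_t N)
{
    assert(N > 0 && !mag.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k < mag.size(); ++k)
        if (mag[k] > mag[best])
            best = k;
    return static_cast<double>(best) * sampleRateHz / static_cast<double>(N);
}
```

With these assumed values the bin spacing is 10 Hz, so a 440 Hz tone lands exactly on bin 44. Real recordings rarely align this neatly, which is why the peaks in Figure 1(b) are read to the nearest hertz.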

This technique is not without its problems, however. To simplify the host class implementation, it is necessary to facilitate parallel interfaces amongst the different data types, and hence the IPP calls must look the same. Consider the seemingly innocuous IPP summation function, where the Ipp32f and Ipp64f variants are almost, but not quite, the same. You can see in Example 2 that the single-precision version of the IPP summation function takes in an additional "hint" argument, which steers the library either towards accuracy or speed, whereas the double-precision version does not need such an input argument. A reasonable solution to this problem is to not give the host class the opportunity to pass in the hint (since the Ipp64f version doesn't accept that parameter), and have the Ipp32f template specialization always use a hard-coded hint parameter. Another possible remedy is to give the host class the opportunity to pass in the hint argument, which is ignored in the Ipp64f case (and also any other type where the hint argument is not specified, if such policies were implemented). Other, more elegant, solutions surely exist, but if the typed API functions you are wrapping more or less mirror each other, then the proposed strategies are sufficient.
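Both remedies for the mismatched summation signatures can be sketched side by side. As before, this is an illustration under assumptions rather than the sample program's code: fakeSum_32f() and fakeSum_64f() are hypothetical stand-ins shaped like the two functions in Example 2, FakeHint stands in for IppHintAlgorithm, and SumPolicy, total(), and totalWithHint() are invented names.

```cpp
#include <cassert>

// Hypothetical stand-ins shaped like ippsSum_32f/ippsSum_64f (Example 2).
// They are NOT the IPP functions; a return value of 0 plays the role of a
// success status.
enum FakeHint { fakeHintNone, fakeHintFast, fakeHintAccurate };
static FakeHint g_lastHint = fakeHintNone; // records what the float path received

int fakeSum_32f(const float *pSrc, int len, float *pSum, FakeHint hint)
{
    g_lastHint = hint;
    float acc = 0.0f;
    for (int i = 0; i < len; ++i) acc += pSrc[i];
    *pSum = acc;
    return 0;
}
int fakeSum_64f(const double *pSrc, int len, double *pSum)
{
    double acc = 0.0;
    for (int i = 0; i < len; ++i) acc += pSrc[i];
    *pSum = acc;
    return 0;
}

template <typename T> struct SumPolicy;

template <> struct SumPolicy<float> {
    // Remedy 1: the host never sees the hint; a fixed value is used.
    static int sum(const float *pSrc, int len, float *pSum) {
        return fakeSum_32f(pSrc, len, pSum, fakeHintAccurate);
    }
    // Remedy 2: the host may pass a hint, which is forwarded.
    static int sum(const float *pSrc, int len, float *pSum, FakeHint hint) {
        return fakeSum_32f(pSrc, len, pSum, hint);
    }
};
template <> struct SumPolicy<double> {
    static int sum(const double *pSrc, int len, double *pSum) {
        return fakeSum_64f(pSrc, len, pSum);
    }
    // Remedy 2: same signature as the float policy, but the hint is ignored.
    static int sum(const double *pSrc, int len, double *pSum, FakeHint /*ignored*/) {
        return fakeSum_64f(pSrc, len, pSum);
    }
};

// Host-side code is identical for both element types.
template <typename T>
T total(const T *pSrc, int len)
{
    T result = T();
    int sts = SumPolicy<T>::sum(pSrc, len, &result);
    assert(sts == 0);
    (void)sts; // silence the unused-variable warning when NDEBUG is defined
    return result;
}
template <typename T>
T totalWithHint(const T *pSrc, int len, FakeHint hint)
{
    T result = T();
    int sts = SumPolicy<T>::sum(pSrc, len, &result, hint);
    assert(sts == 0);
    (void)sts;
    return result;
}
```

The first remedy keeps the host interface smallest. The second preserves the caller's control over the accuracy/speed trade-off for float at the price of a parameter that is silently dropped for double.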

Conclusion

The IPP library supports 12 data types, and it is not entirely out of the realm of possibility that clients of this library may need to interchange various data types, depending on the problem at hand. In this article, I have described a policy-driven design that gives you the means to parameterize these disparate (but obviously closely related) types, despite the fact that the underlying C API is not template friendly, at least at first glance. Again, the IPP library also includes image processing and matrix operations, both of which follow the same approach in that multiple data types are supported through this function-naming technique. The design presented here could be easily extended to incorporate these other portions of the library. In certain machine vision or medical imaging applications, it is common practice to acquire digital (typically between 8 and 16 bits-per-pixel) images from a frame grabber or video-streaming device, then to process these images in floating point to reduce the effects of rounding error. When the data is presented to users (or saved to persistent storage), this image data needs to be converted back into integer-valued pixels, so that graphics APIs such as OpenGL or Microsoft's GDI/GDI+ can deal with them. The design concepts presented here are tailor-made for this situation. Furthermore, policy-driven design comes at no cost to runtime performance, which in the data processing realm is almost always of the utmost importance. Alexandrescu was not kidding when he stated, "The power of policies comes from their ability to mix and match."

To build and run my sample program, you must download the trial version of IPP. Visual Studio .NET 2003 support is included with the distribution, and you must point the solution to the appropriate include and linker directories. For most machines, assuming that you select the defaults when running Intel's installer, that is C:\Program Files\intel\IPP\include and C:\Program Files\intel\IPP\lib\.

References

[1] Alexandrescu, Andrei. Modern C++ Design: Generic Programming and Design Patterns Applied, Addison-Wesley, 2001.

[2] Musser, David and Atul Saini. STL Tutorial and Reference Guide, Addison-Wesley, 2002.

[3] Search for "What are the freq. of the piano keys?" in the rec.music.makers.piano usenet newsgroup.


Shehrzad Qureshi is an engineer at Labcyte Inc., where he works primarily on DSP software and algorithms. He can be contacted at [email protected].


January 04:

Example 1: Sample single- and double-precision IPP API calls.


(a) 
Ipp32f *ippsMalloc_32f(int len);
Ipp64f *ippsMalloc_64f(int len);

(b) 
IppStatus ippsMaxIndx_32f(const Ipp32f *pSrc, int len, Ipp32f *pMax, int *pIndx);
IppStatus ippsMaxIndx_64f(const Ipp64f *pSrc, int len, Ipp64f *pMax, int *pIndx);

(c) 
IppStatus ippsFFTFwd_RToCCS_32f(const Ipp32f *pSrc, 
                                   Ipp32f *pDst, 
                                   const IppsFFTSpec_R_32f *pFFTSpec,
                                   Ipp8u *pBuffer);
IppStatus ippsFFTFwd_RToCCS_64f(const Ipp64f *pSrc, 
                                   Ipp64f *pDst, 
                                   const IppsFFTSpec_R_64f *pFFTSpec,
                                   Ipp8u *pBuffer);


Example 2: Single- and double-precision variants of the IPP vector summation API.


(a) 
IppStatus ippsSum_32f(const Ipp32f *pSrc, 
                         int len, 
                         Ipp32f* pSum, 
                         IppHintAlgorithm hint);
(b) 
IppStatus ippsSum_64f(const Ipp64f *pSrc, int len, Ipp64f* pSum);


Figure 1: (a) First quarter-second of the input waveform; (b) portion of the FFT magnitude of the entire musical sequence.


Listing 1: Simple template class.

template <typename T>
class Signal {
public:
    Signal(int nSamples); // allocate underlying storage
    ~Signal();
    operator T *() { return m_pSamples; }
private:
    T *m_pSamples;
};





Listing 2: Template policy class.

// just a placeholder
template <typename T>
struct IntelIppPolicy {
    typedef T elem_type;
};
// specialization for float type
template<>
struct IntelIppPolicy<Ipp32f> {
    // 'traits'
    typedef Ipp32f elem_type;
    static Ipp32f *malloc(int len) { return ippsMalloc_32f(len); }
    static void free(void *ptr) { return ippsFree(ptr); }
    static IppStatus maxIndx(const Ipp32f *pSrc, int len, Ipp32f *pMax, int *pIndx) { 
        return ippsMaxIndx_32f(pSrc, len, pMax, pIndx); 
    }
    static IppStatus minIndx(const Ipp32f *pSrc, int len, Ipp32f *pMin, int *pIndx) { 
        return ippsMinIndx_32f(pSrc, len, pMin, pIndx); 
    }
};
// specialization for double type
template<>
struct IntelIppPolicy<Ipp64f> {
    // 'traits'
    typedef Ipp64f elem_type;
    static Ipp64f *malloc(int len) { return ippsMalloc_64f(len); }
    static void free(void *ptr) { return ippsFree(ptr); }
    static IppStatus maxIndx(const Ipp64f *pSrc, int len, Ipp64f *pMax, int *pIndx) { 
        return ippsMaxIndx_64f(pSrc, len, pMax, pIndx); 
    }
    static IppStatus minIndx(const Ipp64f *pSrc, int len, Ipp64f *pMin, int *pIndx) { 
        return ippsMinIndx_64f(pSrc, len, pMin, pIndx); 
    }
};





Listing 3: Signal<> header file.


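#include <cmath>      // std::log
#include <cstddef>    // NULL
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::runtime_error
#include "IPPPolicy.h"  // include the policy class that wraps Intel's IPP library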
template <typename T>
class Signal {
public:
    typedef IntelIppPolicy<T> IPP;
    Signal(int N=0) : m_nSamples(N), m_pSamples(NULL) 
    { 
        if (N)
            m_pSamples = IPP::malloc(N); 
    }
    ~Signal() 
    { 
        if (m_pSamples)
            IPP::free(m_pSamples); 
    }
    operator T *() {
        return m_pSamples;
    }
    int getNumSamples() {
        return m_nSamples;
    }
    // adjust the length of the array
    void resize(int N)
    {
        if (m_pSamples)
            IPP::free(m_pSamples);
        m_nSamples = N;
        m_pSamples = IPP::malloc(N);
    }
    // Returns the minimum value; *pIndx receives the index where it occurs.
    T min(int *pIndx) 
    {
        T minVal;
        IppStatus sts = IPP::minIndx(m_pSamples, m_nSamples, &minVal, pIndx);
        if (ippStsOk != sts)
            throw std::runtime_error((char*)ippGetStatusString(sts));
        return minVal;
    }
    // Returns the maximum value; *pIndx receives the index where it occurs.
    T max(int *pIndx) 
    {
        T maxVal;
        IppStatus sts = IPP::maxIndx(m_pSamples, m_nSamples, &maxVal, pIndx);
        if (ippStsOk != sts)
            throw std::runtime_error((char*)ippGetStatusString(sts));
        return maxVal;
    }
    // Computes and returns the magnitude of the FFT of the signal. The FFT 
    // of a real-valued signal is a symmetric complex signal: hence the 
    // length of the return signal will be half (plus 1) of the input signal.
    // There isn't enough error checking performed in this method, for 
    // example one should really verify that the length of the input vector
    // is a power-of-2 length.
    // Finally, IPP library does provide a higher-level API function that
    // computes power spectrum of a signal--the equivalent to this method.
    void fftMagnitude(Signal<T> *pMagFFT)
    {
        typename IPP::cmplx_type *pFFT = NULL; // 'typename' is required for a dependent type
        // order of the FFT defined to be log2(length of input)
        int orderFFT = (int)(std::log((double)m_nSamples) / std::log(2.0));
        // this is somewhat inefficient, in a real implementation we'd likely
        // initialize this just once and then cache it away.
        typename IPP::fft_spec_type *pFFTSpec = NULL;
        IppStatus sts = IPP::allocFFTSpec(&pFFTSpec, orderFFT, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone);
        if (ippStsOk != sts)
            throw std::runtime_error((char*)ippGetStatusString(sts));
        // length of the return signal
        int lenFFT = (1<<(orderFFT-1)) + 1;
        if (NULL == (pFFT = IPP::cmplxMalloc(lenFFT)))
            throw std::bad_alloc(); // standard std::bad_alloc has no constructor taking a message
        // ready to compute the FFT now
        if (ippStsOk != (sts = IPP::fwdFFT(m_pSamples, (typename IPP::elem_type *)pFFT, pFFTSpec)))
            throw std::runtime_error((char*)ippGetStatusString(sts));

        // magnitude of the FFT is the complex modulus
        pMagFFT->resize(lenFFT);
        if (ippStsOk != (sts = IPP::cmplxModulus(pFFT, pMagFFT->m_pSamples, lenFFT)))
            throw std::runtime_error((char*)ippGetStatusString(sts));
        // clean up
        IPP::free(pFFT);
        IPP::freeFFTSpec(pFFTSpec);
    }
private:
    // the underlying array
    T *m_pSamples;
    // how many data points in the m_pSamples array
    int m_nSamples;
};
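The comments in fftMagnitude() note that the input length should really be verified to be a power of two, and the order is computed there with floating-point logarithms. A possible way to handle both with integer arithmetic is sketched below. The helper names (isPowerOfTwo, log2Floor, halfSpectrumLength) are hypothetical and do not appear in the sample program.

```cpp
#include <cassert>

// True when n is a positive power of two (1, 2, 4, 8, ...).
// A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
inline bool isPowerOfTwo(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

// floor(log2(n)) for n > 0, computed by shifting rather than with std::log,
// which avoids any dependence on floating-point rounding.
inline int log2Floor(int n)
{
    assert(n > 0);
    int order = 0;
    while (n > 1)
    {
        n >>= 1;
        ++order;
    }
    return order;
}

// Number of magnitude values for a real FFT of the given order: N/2 + 1,
// the same expression used for lenFFT in Listing 3.
inline int halfSpectrumLength(int order)
{
    assert(order >= 1);
    return (1 << (order - 1)) + 1;
}
```

A caller could reject the input when isPowerOfTwo(m_nSamples) is false, then pass log2Floor(m_nSamples) as the FFT order. For a 1024-sample signal this yields an order of 10 and 513 magnitude values.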


Listing 4: Instantiating a Signal<> object.

int _tmain(int argc, _TCHAR* argv[])
{
    typedef double DataType; // this can be float or double
    typedef Signal<DataType> SigType;
    // instantiate signal object and read in the data from the CSV file ...
    SigType signal(NUM_SAMPLES);
    ifstream ifstr(DATA_FILE);
    if (!ifstr)
    {
        cerr << "Failed to open " << DATA_FILE << "\n";
        return -1;
    }
    DataType *pData = signal;
    for (int ii=0; ii<NUM_SAMPLES; ++ii)
    {
        if (!(ifstr >> pData[ii]))
        {
            cerr << "Expected " << NUM_SAMPLES << " data points in the file " << DATA_FILE << "\n";
            return -1;
        }
    }
    // get min and max values in this signal ...
    int iiMin, iiMax;
    DataType minVal = signal.min(&iiMin),
             maxVal = signal.max(&iiMax);
    cout << "Min value is " << minVal << "\nMax val is " << maxVal << "\n";
    // compute magnitude of the FFT ...
    SigType magFFT;
    signal.fftMagnitude(&magFFT);
    // save it to a file ...
    pData = magFFT;
    ofstream ofstr(FFT_FILE);
    for (int ii=0; ii<magFFT.getNumSamples(); ++ii)
        ofstr << pData[ii] << "\n";
    ofstr.close();
    return 0;
}


        
