### Intel IPP and DMIP Libraries

As you can imagine, performing deferred mode optimization by hand is a time-consuming and tedious task. It is also very hardware-dependent, since cache sizes vary with the CPU model. However, software libraries such as Intel's Integrated Performance Primitives (IPP) 6.0 handle this optimization transparently.

IPP is a software library of highly optimized, multicore-ready functions for multimedia data processing and communications applications. IPP covers many application domains, one of them being image processing and computer vision. The list of supported functions is extensive: it covers virtually all low-level primitives (from convolution and filtering, through morphological operators, to logical 2D operations) as well as quite a few high-level vision algorithms (3D reconstruction is one example).

DMIP, short for "Deferred Mode Image Processing", is a framework built on top of IPP 6.0. As its name suggests, DMIP provides the means to apply deferred mode optimization to image processing algorithms. It is a relatively simple-to-use, extensible, object-oriented framework. Because it is layered on top of the IPP library, DMIP functions call IPP functions internally and benefit from all of IPP's low-level optimizations (such as SSE, MMX, and the like). The library encapsulates the deferred mode optimization: it automatically slices the image according to the CPU cache size detected at runtime, applies the algorithm to each slice, and finally combines the results.
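The slice-process-combine idea can be illustrated with a minimal sketch in plain C++. This is not DMIP's API: `RowsPerSlice` and `ProcessDeferred` are hypothetical names, the two pipeline stages (square, then add one) are stand-ins for real image operations, and the cache size is passed in by the caller rather than detected at runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of rows per slice such that one source slice
// and one destination slice of a row-major float image fit in cacheBytes.
// Always returns at least one row.
std::size_t RowsPerSlice(std::size_t width, std::size_t cacheBytes) {
    const std::size_t bytesPerRow = 2 * width * sizeof(float);  // src + dst
    return std::max<std::size_t>(1, cacheBytes / bytesPerRow);
}

// Runs a two-stage pointwise pipeline slice by slice, so each slice goes
// through every stage before the next slice is touched.
std::vector<float> ProcessDeferred(const std::vector<float>& src,
                                   std::size_t width, std::size_t height,
                                   std::size_t cacheBytes) {
    std::vector<float> dst(src.size());
    const std::size_t step = RowsPerSlice(width, cacheBytes);
    for (std::size_t y0 = 0; y0 < height; y0 += step) {
        const std::size_t y1 = std::min(height, y0 + step);
        const std::size_t begin = y0 * width;
        const std::size_t end = y1 * width;
        // Stage 1: square each pixel of the slice.
        for (std::size_t i = begin; i < end; ++i) dst[i] = src[i] * src[i];
        // Stage 2: add a constant, reading the slice written by stage 1.
        for (std::size_t i = begin; i < end; ++i) dst[i] += 1.0f;
    }
    return dst;
}
```

Because both stages here are pointwise, slices can be processed independently. Neighborhood operations such as convolution additionally need a few overlapping rows at each slice boundary, which this sketch does not handle.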

### Benchmarking DMIP vs. IPP

To investigate the benefits of DMIP over non-deferred mode, I compared a DMIP implementation of an algorithm with an IPP implementation of the same algorithm. As a test case, I chose a typical convolution-based edge detection algorithm: first compute the image derivatives in two orthogonal directions by convolving the image with Sobel kernels, then combine the two derivatives into a single gradient amplitude, and finally threshold the gradient amplitude to obtain a binary image indicating pixels with strong edges.
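For reference, the three steps can be sketched in plain C++ without any library. This is an illustrative version only, not the IPP or DMIP implementation: the function name `SobelEdges` is hypothetical, borders are handled by replicating the nearest pixel, and edge pixels are marked with 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sobel gradient-amplitude edge detector on a row-major 8-bit image.
// Returns 1 where the amplitude is at least thresh, 0 elsewhere.
std::vector<std::uint8_t> SobelEdges(const std::vector<std::uint8_t>& img,
                                     int width, int height, float thresh) {
    // Pixel lookup with replicated borders.
    auto at = [&](int x, int y) -> float {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return static_cast<float>(img[static_cast<std::size_t>(y) * width + x]);
    };
    std::vector<std::uint8_t> edges(img.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Horizontal derivative: right column minus left column.
            const float dx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1))
                           - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1));
            // Vertical derivative: bottom row minus top row.
            const float dy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1))
                           - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1));
            // Gradient amplitude, then threshold.
            const float amplitude = std::sqrt(dx * dx + dy * dy);
            edges[static_cast<std::size_t>(y) * width + x] = amplitude >= thresh ? 1 : 0;
        }
    }
    return edges;
}
```

The sign convention of the kernels (convolution versus correlation) does not matter here, since only the squared derivatives are used.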

The following equation represents this edge detector's logic, where **I** is the input image, thresh is the gradient threshold, and **B** is the resulting binary image marking strong edges:
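One way to write this logic, reconstructed from the description above, is the following. The symbols $S_x$ and $S_y$ (the two Sobel kernels) and $*$ (2D convolution) are notation assumed here for illustration.

```latex
B(x, y) =
\begin{cases}
1 & \text{if } \sqrt{\big((I * S_x)(x, y)\big)^2 + \big((I * S_y)(x, y)\big)^2} \ge \text{thresh} \\
0 & \text{otherwise}
\end{cases}
```

Note that Listing One zeroes the pixels below the threshold but leaves the surviving pixels at their 8-bit gradient value rather than setting them to 1.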

Figure 1 illustrates the pipeline.

Listing One is the "standard" implementation of this edge detector, using IPP.

    unsigned char* EdgeDetectIpp::DetectEdges(unsigned char* pImage, int imageHeight,
                                              int imageWidth, int* ansStep)
    {
        // Allocate a float image and convert 8-bit to 32-bit floating point
        int srcStep;
        IppiSize roiSize = {imageWidth, imageHeight};
        Ipp32f* pSrc = ippiMalloc_32f_C1(imageWidth, imageHeight, &srcStep);
        ippiConvert_8u32f_C1R(pImage, imageWidth, pSrc, srcStep, roiSize);

        // Prepare a replicated image (required so that the convolution
        // result has the same size as the original image)
        int replicatedStep;
        Ipp32f* pReplicated = ippiMalloc_32f_C1(roiSize.width + 2, roiSize.height + 2,
                                                &replicatedStep);
        // Skip one row (the step is in bytes) and one pixel to reach the ROI.
        // Byte-pointer arithmetic avoids truncating the address on 64-bit builds.
        Ipp32f* pReplicatedRoi = (Ipp32f*)((Ipp8u*)pReplicated + replicatedStep) + 1;
        IppiSize replicatedSize = {imageWidth + 2, imageHeight + 2};
        ippiCopyReplicateBorder_32f_C1R(pSrc, srcStep, roiSize, pReplicated,
                                        replicatedStep, replicatedSize, 1, 1);

        // Compute DX = I (conv) Sobel_X
        int dstStep;
        Ipp32f* pDx = ippiMalloc_32f_C1(imageWidth, imageHeight, &dstStep);
        ippiFilterSobelVert_32f_C1R(pReplicatedRoi, replicatedStep, pDx, dstStep, roiSize);

        // Compute DX*DX - in place
        ippiSqr_32f_C1IR(pDx, dstStep, roiSize);

        // Compute DY = I (conv) Sobel_Y
        Ipp32f* pDy = ippiMalloc_32f_C1(imageWidth, imageHeight, &dstStep);
        ippiFilterSobelHoriz_32f_C1R(pReplicatedRoi, replicatedStep, pDy, dstStep, roiSize);

        // Compute DY*DY - in place
        ippiSqr_32f_C1IR(pDy, dstStep, roiSize);

        // Sum DX^2 + DY^2 - in place (result in pDy)
        ippiAdd_32f_C1IR(pDx, dstStep, pDy, dstStep, roiSize);

        // Sqrt - in place
        ippiSqrt_32f_C1IR(pDy, dstStep, roiSize);

        // Cast to 8 bit
        unsigned char* pOutput = ippiMalloc_8u_C1(imageWidth, imageHeight, ansStep);
        ippiConvert_32f8u_C1R(pDy, dstStep, pOutput, *ansStep, roiSize, ippRndFinancial);

        // Threshold - in place (pixels below the threshold are set to 0)
        ippiThreshold_LTVal_8u_C1IR(pOutput, *ansStep, roiSize, this->threshold, 0);

        // Clean up the mess
        ippiFree(pDx);
        ippiFree(pDy);
        ippiFree(pSrc);
        ippiFree(pReplicated);

        return pOutput;
    }