Deferred Mode Optimization

Intel IPP and DMIP Libraries

As you can imagine, performing deferred mode optimization by hand is time-consuming and tedious. It is also very hardware dependent, because cache sizes vary with the CPU model. However, software libraries such as Intel's Integrated Performance Primitives (IPP) 6.0 handle this optimization transparently.

IPP is a software library of highly optimized, multicore-ready functions for multimedia data processing and communication applications. IPP covers many application domains, among them image processing and computer vision. The list of supported functions is extensive and covers virtually all low-level primitives (from convolution and filtering, through morphological operators, to 2D logical operations) as well as quite a few high-level vision algorithms (3D reconstruction is one example).

DMIP, short for "Deferred Mode Image Processing", is a framework built on top of IPP 6.0. As its name suggests, DMIP provides the means for applying deferred mode optimization to image processing algorithms. It is a relatively simple, extensible, object-oriented framework. Because it is layered on top of the IPP library, DMIP functions call IPP functions internally and benefit from all of IPP's low-level optimizations (such as SSE, MMX, and the like). The library encapsulates the deferred mode optimization: it automatically slices the image according to the CPU cache size detected at runtime, applies the algorithm to each slice, and finally combines the results.
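To make the slicing idea concrete, here is a minimal sketch in plain C++ that does not use IPP or DMIP. It computes a gradient magnitude from two derivative images in two ways: stage by stage over the whole image, and slice by slice with all stages applied to one slice before moving on. The function names and the cacheBytes parameter are hypothetical; a real framework would query the cache size from the CPU rather than take it as an argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Immediate mode: every stage sweeps the whole image and writes a
// full-size temporary, so a large image is evicted between stages.
std::vector<float> magnitudeImmediate(const std::vector<float>& a,
                                      const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> a2(n), b2(n), sum(n), out(n);
    for (std::size_t i = 0; i < n; ++i) a2[i] = a[i] * a[i];
    for (std::size_t i = 0; i < n; ++i) b2[i] = b[i] * b[i];
    for (std::size_t i = 0; i < n; ++i) sum[i] = a2[i] + b2[i];
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(sum[i]);
    return out;
}

// Deferred mode: all stages run on one slice of rows before the next
// slice is touched. cacheBytes is an assumed budget, not a queried value.
std::vector<float> magnitudeDeferred(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t width, std::size_t height,
                                     std::size_t cacheBytes) {
    assert(a.size() == width * height && b.size() == a.size());
    // Working set per row: two inputs, two temporaries, one output.
    const std::size_t bytesPerRow =
        std::max<std::size_t>(1, 5 * width * sizeof(float));
    const std::size_t rowsPerSlice =
        std::max<std::size_t>(1, cacheBytes / bytesPerRow);

    std::vector<float> out(a.size());
    std::vector<float> t0(rowsPerSlice * width), t1(rowsPerSlice * width);
    for (std::size_t row = 0; row < height; row += rowsPerSlice) {
        const std::size_t rows = std::min(rowsPerSlice, height - row);
        const std::size_t begin = row * width;
        const std::size_t count = rows * width;
        for (std::size_t i = 0; i < count; ++i) t0[i] = a[begin + i] * a[begin + i];
        for (std::size_t i = 0; i < count; ++i) t1[i] = b[begin + i] * b[begin + i];
        for (std::size_t i = 0; i < count; ++i) t0[i] += t1[i];
        for (std::size_t i = 0; i < count; ++i) out[begin + i] = std::sqrt(t0[i]);
    }
    return out;
}
```

Both functions produce the same values. The difference is that the deferred version only needs slice-sized temporaries, which can stay cache-resident across stages. The stages here are pointwise, so slices need no overlap; a neighborhood operation such as a convolution would also need a few extra rows on each side of a slice.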

Benchmarking DMIP vs. IPP

To investigate the benefits of DMIP compared to non-deferred mode, I chose to compare a DMIP-implemented algorithm with an IPP-implemented one. As a test case, I implemented a typical convolution-based edge detection algorithm: first compute the image derivatives in two orthogonal directions by convolving the image with Sobel kernels, then combine the two derivatives into a single gradient amplitude, and finally threshold the gradient amplitude to obtain a binary image indicating the pixels with strong edges.

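The following equation represents this edge detector's logic, where I is the input image, K_V and K_H are the vertical and horizontal Sobel kernels, * denotes convolution, thresh is the gradient threshold, and B is the resulting binary image representing strong edges:

\[
B(x, y) =
\begin{cases}
1, & \sqrt{\big((I * K_V)(x, y)\big)^2 + \big((I * K_H)(x, y)\big)^2} \ge \mathit{thresh} \\
0, & \text{otherwise}
\end{cases}
\]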

Figure 1 illustrates the pipeline.

Figure 1: Edge detection example.
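As a point of reference for the pipeline, the following is a minimal plain C++ sketch of the same edge detector, written without IPP. The function names are hypothetical, the image is assumed to be a tightly packed 8-bit buffer, and borders are handled by replicating the edge pixels. It marks strong edges with 1 rather than keeping the gradient value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads pixel (x, y); out-of-range coordinates are clamped, which
// replicates the border pixels.
static float pixelAt(const std::vector<std::uint8_t>& img,
                     int width, int height, int x, int y) {
    x = std::min(std::max(x, 0), width - 1);
    y = std::min(std::max(y, 0), height - 1);
    return static_cast<float>(img[static_cast<std::size_t>(y) * width + x]);
}

// Returns a binary image: 1 where the Sobel gradient magnitude is at
// least thresh, 0 elsewhere.
std::vector<std::uint8_t> detectEdgesReference(
        const std::vector<std::uint8_t>& img,
        int width, int height, float thresh) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> edges(img.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            auto p = [&](int dx, int dy) {
                return pixelAt(img, width, height, x + dx, y + dy);
            };
            // Derivative along x (responds to vertical edges).
            const float gx = (p(1, -1) + 2 * p(1, 0) + p(1, 1))
                           - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1));
            // Derivative along y (responds to horizontal edges).
            const float gy = (p(-1, 1) + 2 * p(0, 1) + p(1, 1))
                           - (p(-1, -1) + 2 * p(0, -1) + p(1, -1));
            const float magnitude = std::sqrt(gx * gx + gy * gy);
            edges[static_cast<std::size_t>(y) * width + x] =
                magnitude >= thresh ? 1 : 0;
        }
    }
    return edges;
}
```

For an image whose left half is 0 and right half is 100, only the two columns on either side of the step exceed a threshold of 50.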

Listing One is the "standard" implementation of this edge detector, using IPP.

unsigned char* EdgeDetectIpp::DetectEdges(unsigned char* pImage, int imageHeight, 
int imageWidth, int* ansStep)
{
   // Allocate float image and convert 8-bit to 32-bit floating point
   // (the input is assumed to be tightly packed: step == imageWidth)
   int srcStep;

   IppiSize roiSize = {imageWidth, imageHeight};
   Ipp32f* pSrc = ippiMalloc_32f_C1(imageWidth, imageHeight, &srcStep);
   ippiConvert_8u32f_C1R(pImage, imageWidth, pSrc, srcStep, roiSize);

   // Prepare replicated image (required in order to allow convolution 
   //   result with same size as original image)
   int replicatedStep;
   Ipp32f* pReplicated = ippiMalloc_32f_C1(roiSize.width + 2,
      roiSize.height + 2, &replicatedStep);
   // Skip one row (the step is in bytes) and one pixel to reach the
   //   interior; byte-pointer arithmetic is also safe on 64-bit builds
   Ipp32f* pReplicatedRoi = (Ipp32f*)((Ipp8u*)pReplicated +
     replicatedStep) + 1;
   IppiSize replicatedSize = {imageWidth + 2, imageHeight + 2};
   ippiCopyReplicateBorder_32f_C1R(pSrc, srcStep, roiSize, pReplicated,
     replicatedStep, replicatedSize, 1, 1);

   // Compute DX = I (conv) Sobel_X
   int dstStep;
   Ipp32f* pDx = ippiMalloc_32f_C1(imageWidth, imageHeight, &dstStep);
   ippiFilterSobelVert_32f_C1R(pReplicatedRoi, replicatedStep, pDx,
      dstStep, roiSize);

   // Compute DX*DX - in place
   ippiSqr_32f_C1IR(pDx, dstStep, roiSize);

   // Compute DY = I (conv) Sobel_Y
   Ipp32f* pDy = ippiMalloc_32f_C1(imageWidth, imageHeight, &dstStep);
   ippiFilterSobelHoriz_32f_C1R(pReplicatedRoi, replicatedStep, pDy,
      dstStep, roiSize);

   // Compute DY*DY - in place
   ippiSqr_32f_C1IR(pDy, dstStep, roiSize);

   // Sum DX^2 + DY^2 - in place (result is stored in pDy)
   ippiAdd_32f_C1IR(pDx, dstStep, pDy, dstStep, roiSize);

   // Sqrt - in place
   ippiSqrt_32f_C1IR(pDy, dstStep, roiSize);

   // Cast to 8 bit
   unsigned char* pOutput = ippiMalloc_8u_C1(imageWidth, imageHeight,
      ansStep);
   // The rounding-mode argument is cut off in the source listing;
   //   ippRndNear is an assumption
   ippiConvert_32f8u_C1R(pDy, dstStep, pOutput, *ansStep, roiSize,
      ippRndNear);

   // Threshold - in place
   ippiThreshold_LTVal_8u_C1IR(pOutput, *ansStep, roiSize, 
      this->threshold, 0);

   // Clean up the mess (the caller releases pOutput with ippiFree)
   ippiFree(pSrc);
   ippiFree(pReplicated);
   ippiFree(pDx);
   ippiFree(pDy);
   return pOutput;
}
Listing One: IPP implementation of edge detector.

As you can see in the code, we have to take care of memory allocation and deallocation explicitly, and each operator invocation takes a few lines of code to prepare the input arguments and call the operator's function. For each operator, we also need to consider whether an in-place operation can be used to increase performance. For anyone with experience in implementing algorithms, this sounds pretty reasonable, and it takes roughly 30 lines of C++ code. Now, have a look at the DMIP implementation in Listing Two.

unsigned char* EdgeDetectDmip::DetectEdges(unsigned char* pImage, int 
imageHeight, int imageWidth, int* outputStride)
{
   // Allocate output image
   Ipp8u* pOutput = ippiMalloc_8u_C1(imageWidth, imageHeight,
      outputStride);

   // Prepare the images, convolution kernels and graph
   IppiSize roi = {imageWidth, imageHeight};
   Image Input(pImage, ipp8u, ippC1, roi, imageWidth);
   Image Output(pOutput, ipp8u, ippC1, roi, *outputStride);
   Kernel KH(idmFilterSobelHoriz);
   Kernel KV(idmFilterSobelVert);
   Graph G = To32f(Input);

   // Perform the computation
     this->threshold, 0);
   return pOutput;
}

Listing Two: DMIP implementation of edge detector.

As you can see, it takes fewer than 10 lines of C++ code. Moreover, the algorithm logic itself is clear and intuitive, and it is expressed in a single statement:

      this->threshold, 0);

Notice how DMIP overloads the * operator to serve as the convolution operator, and how the data type conversions from 8-bit integer to 32-bit floating point and back to 8-bit integer are handled easily and elegantly by DMIP functions.
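To illustrate how an overloaded * operator can describe a convolution without running it immediately, here is a minimal self-contained sketch. It is not DMIP's implementation: the Expr and Kernel3 types and the source function are hypothetical, and it works on 1D signals with a 3-tap kernel to stay short. The point is only that the operator records the work, and nothing is computed until eval is called.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Signal = std::vector<float>;

// A deferred expression: holds the computation, not its result.
struct Expr {
    std::function<Signal()> eval;
};

// Hypothetical 3-tap kernel type.
struct Kernel3 {
    float k[3];
};

// Wraps existing data as a leaf of the expression.
Expr source(Signal data) {
    return Expr{[d = std::move(data)] { return d; }};
}

// Overloaded * builds a convolution node. The convolution is "same"
// size with replicated borders and only runs when eval() is called.
Expr operator*(const Expr& input, const Kernel3& kernel) {
    assert(input.eval && "input expression must be initialized");
    return Expr{[input, kernel] {
        const Signal in = input.eval();
        Signal out(in.size());
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
        for (std::ptrdiff_t i = 0; i < n; ++i) {
            float sum = 0.0f;
            for (std::ptrdiff_t m = 0; m < 3; ++m) {
                // out[i] = sum over m of k[m] * in[i + 1 - m], index clamped
                const std::ptrdiff_t src =
                    std::min(std::max<std::ptrdiff_t>(i + 1 - m, 0), n - 1);
                sum += kernel.k[m] * in[static_cast<std::size_t>(src)];
            }
            out[static_cast<std::size_t>(i)] = sum;
        }
        return out;
    }};
}
```

With these definitions, writing source(data) * kernel only builds the expression; calling eval on the result performs the convolution. A framework that sees the whole expression before running it is free to decide how to slice the data, which is what makes the deferred approach possible.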
