
A Memory-Constrained Image-Processing Architecture


By Mayur Patel, Dr. Dobb's Journal, July 1997

Mayur is a member of the software development staff at Cinesite Digital Film Center, a full-service post-production and visual effects facility. He can be contacted at [email protected].


Sidebar: A Macro Technique for Managing Types

Image-processing applications are hungry for both processing time and memory. CPU demand typically does not pose a problem, since most operating systems' schedulers can adequately balance the needs of competing applications. However, virtual memory is the only service provided by the operating system to deal with the memory load. Allowing an image-processing application to swap memory pages to disk impairs all the activity on the machine.

A good image-processing library should deal with this memory problem with minimal supervision from you. In this article, I present an image-processing architecture that gives you a reasonable degree of control over memory consumption.

Tile On Demand

Tile-based processing consists of dividing an image into small, rectangular pieces called "tiles," processing each tile, and reassembling the image. The principle of locality applies, so an image-processing operation usually requires input tiles of approximately the same area as the output tile. Clearly, processing each tile requires less memory than processing the entire image.

Many existing image-processing libraries support scan-line-based rather than tile-based image processing. Scan-line libraries are easy to implement and often perform efficiently because most image file formats store data in scan-line chunks. However, a tile-based approach retains the same advantages (scan-lines are simply tiles with a height of one pixel) and is more general.

One case where you benefit directly from flexible tile size is with transform-based image compression. Many popular Discrete Cosine Transform-based compression algorithms require 8×8 tiles. With scan-line-based libraries, you must construct 8×8 units by writing eight scan lines into a buffer and moving data back and forth from the buffer accordingly. This is unnecessary busy work; it is better to provide tiles in whatever proportions you request.
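
To make this concrete, here is a sketch of how a DCT stage might pull its 8×8 units directly. It uses the Area, newTile(), and deleteTile() interfaces that appear in the listings at the end of this article; the Area constructor and the dct8x8() routine are assumptions made for illustration.

void compressChannel( InputImage *pIn, const Area &full )
{
   // Walk the image in 8x8 blocks; each block arrives as one tile,
   // with no scan-line buffering or reshuffling.
   for ( long y = full.y; y < full.y + full.height; y += 8 )
      for ( long x = full.x; x < full.x + full.width; x += 8 )
      {
         ImageTile *pBlock = pIn->newTile( Area( x, y, 8, 8 ) );
         if ( pBlock )
         {
            dct8x8( (float *) pBlock->getBuffer() );  // assumed DCT routine
            pBlock->deleteTile();
         }
      }
}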

Another convenient feature is lazy evaluation. Processing on demand reduces the expense of image processing and promotes interactivity. For example, users may reorder the sequence of operations several times before explicitly requesting an output image. Users may also require only a small portion of the output image or may extract data somewhere in the middle of the pipeline. Such requests can be serviced in less time than it takes to produce the entire output. Demand-driven processing helps to minimize wasted computation in these situations.

In this framework, data requests propagate from the output toward the input (see Figure 1). The input satisfies those requests and triggers the appropriate computation, and the result flows back toward the output. Along the way, operations are applied to the data according to the dependency information already supplied through the application. By the time the data reaches the output, all the specified operations have been performed.
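
The pull model is easy to see in miniature. The following fragment is a standalone illustration, not the library's classes; every name in it is invented for the example. Demand flows backward through the chain, and data flows forward on the return path:

#include <stdio.h>

// Each node computes a value only when one is demanded of it.
struct Node {
   Node *pInput;
   Node( Node *pIn = 0 ) : pInput( pIn ) {}
   virtual ~Node() {}
   virtual float demand( int x, int y )      // demand propagates backward...
   {
      return pInput ? pInput->demand( x, y ) : 0.5f;
   }                                         // ...results flow forward
};

struct Brighten : Node {
   Brighten( Node *pIn ) : Node( pIn ) {}
   float demand( int x, int y )
   {
      return pInput->demand( x, y ) + 0.1f;  // computed only on request
   }
};

int main( void )
{
   Node source;
   Brighten out( &source );
   // Only this one sample's dependency chain is ever evaluated:
   printf( "%g\n", out.demand( 10, 20 ) );
   return 0;
}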

Coupling demand-driven execution with tile-based processing yields an environment with great potential. Certainly, these features carry with them nontrivial overhead. However, they also work together to reduce the total amount of memory required to process images. The extra cooking time is often worth the wait.

Object Interaction

This tile-based image-processing architecture is inherently object oriented. The interaction between three fundamental classes -- Image, ImageTile, and ImageMan -- is the foundation of the system.

Tiles constitute the interface between the library and the application. You need not be aware of the internal representation of the image as long as the tile implementation is sufficiently general to handle the widest range of cases.

Applications request pixels in two ways. The more convenient interface is Image::newTile(), which allocates a new tile and fills it with image data. The other interface is Image::fillTile(), which accepts an existing tile for processing and overwrites its contents. The API is as simple as that.
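
In outline, the two entry points look something like this. This is a sketch consistent with the calls made in the listings; the real declarations may carry additional parameters:

class Image {
public:
   // Allocate a new tile covering "area" and fill it with image data.
   virtual ImageTile *newTile( const Area &area ) = 0;
   // Overwrite the contents of an existing, caller-supplied tile.
   virtual int fillTile( ImageTile *pWriteHere ) = 0;
};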

Behind the scenes, the image interacts with the image manager (see Figure 2). Image::newTile() requests are routed through the image manager, which lets it provide a host of services, including caching. The manager eventually comes back to the image and calls its Image::fillTile() method to compute the data.

In the obvious implementation of this interaction pattern, there is one inefficiency. When the system receives a newTile() request, it immediately allocates memory. Image::fillTile() then pulls input data into scope for local processing. The fillTile() method most likely uses the newTile() interface to request data, so memory is allocated at every step as requests propagate backward. These allocations aren't actually needed until the data begins to flow forward.

A more constrained approach allocates memory only after the input data is available, but I prefer the former behavior for two reasons:

  • It supports a flexible interface between the image and image manager.
  • It supports a simple developer interface for the implementation of real image processing operators.

Implementation

The tiles in my architecture use the (X,Y,Z,C) image-space model. The X component represents the width, Y the height, Z the frame number, and C the color channel. This model lets you describe images of any number of channels. It also lets you describe a series of images as a sequence, which is useful for processing digital video or film, since an animation is expected to consist of several frames, each with a uniform width, height, and channel count.

The coordinate (x,y,z,c) is sufficient to specify the smallest datum. In addition to these coordinates, my tile structure (see Figure 3) contains width and height elements that make it possible to represent a one-channel area from an image. You can access pixel data using methods provided by the tile class.
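
One plausible layout for the structure in Figure 3 follows. This is a hypothetical record; the real class hides its fields behind accessor methods such as getArea() and getBuffer():

struct TileRecord {
   long  x, y;      // origin of the tile in image space
   long  z;         // frame number within the sequence
   long  c;         // color channel
   long  width;     // extent in X
   long  height;    // extent in Y
   Type  type;      // pixel precision of the buffer
   void *pBuffer;   // width * height samples of one channel
};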

Different applications will require pixels of different precision. For example, 8 bits per color channel commonly represents image data. For higher quality work, however, 16 bits per channel is more useful. In some cases, particularly in scientific computing, 32-bit floating-point values (0.0 to 1.0) best represent each plane. This implementation supports all three options. A method allows typecasting the data between supported types.

Granted, using macros in C++ (like cute macro tricks in C) is generally considered poor programming style. However, I have used both to encapsulate data-type support; any technique, with disciplined execution, has its place. In this scheme, a new data type may be incorporated by adding only one function, with no changes to existing code. You can judge the case for yourself. The approach is discussed in more detail in the accompanying sidebar, "A Macro Technique for Managing Types."

The tile class inherits the properties of a referenced object. Smart pointers may be used to organize tiles at the application level. The method Tile::deleteTile() is included to complement the Image::newTile() method. Image::newTile() provides construction services; Tile::deleteTile() provides dereferencing and destruction services.
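
A minimal referenced-object base might look like the following. The method names match the registerReference() and unregisterReference() calls visible in Listing Three; the implementation is my sketch:

class Referenced {
public:
   Referenced( void ) : _lCount( 0 ) {}
   void registerReference( void )   { _lCount++; }
   void unregisterReference( void ) { if ( --_lCount == 0 ) delete this; }
protected:
   virtual ~Referenced( void ) {}   // destroyed only via unregisterReference()
private:
   long _lCount;
};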

My image manager implementation is generic, providing the minimum set of services required to meet its obligations. At most one image manager may exist at any time; an auxiliary class regulates access. The Singleton design pattern, which guarantees that only one instance of a class exists at any time, applies well to the image manager. Blending the referenced-object pattern with the Singleton pattern also ensures that the image manager is deallocated when it is no longer required. This is useful if image processing represents only a small portion of the total application. Attaching an image-manager reference to images through the base class ensures that the image manager is instantiated during the lifetime of any of its potential clients.
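
The access pattern can be sketched as a reference-counted Singleton built on the Referenced base above; the names are illustrative rather than the library's:

class ImageMan : public Referenced {
public:
   static ImageMan *instance( void )        // the only way to obtain one
   {
      if ( !_pInstance )
         _pInstance = new ImageMan;
      _pInstance->registerReference();
      return _pInstance;
   }
protected:
   ImageMan( void ) {}
   ~ImageMan( void ) { _pInstance = 0; }    // a later client re-creates it
private:
   static ImageMan *_pInstance;
};
ImageMan *ImageMan::_pInstance = 0;

Each image acquires the manager through instance() in its base-class constructor and releases it with unregisterReference() in its destructor, so the manager lives exactly as long as it has clients.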

I provide two base classes for images: InputImage and SampledImage. InputImage best suits images (described as mathematical functions) that may have variable resolution or unlimited range. SampledImage best suits images of finite resolution and range.

SampledImage uses the Strategy design pattern to deal with out-of-range behavior. The Strategy design pattern encapsulates implementations of an algorithm so that the algorithm isn't hardwired into a class. A SampledImage may receive a request during a convolution calculation for data beyond its range of valid samples. Fill-strategy objects may be attached to SampledImage to determine how this area is validated before the tile is returned to the client.
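
A fill strategy reduces to a one-method abstract class; the names here are again illustrative:

class FillStrategy {
public:
   virtual ~FillStrategy( void ) {}
   // Validate the samples of pTile that fall outside "valid".
   virtual void fill( ImageTile *pTile, const Area &valid ) = 0;
};

// One possible policy: pad out-of-range samples with a constant (e.g., black).
class ConstantFill : public FillStrategy {
public:
   ConstantFill( float rValue ) : _rValue( rValue ) {}
   void fill( ImageTile *pTile, const Area &valid );
private:
   float _rValue;
};

A SampledImage with an attached strategy invokes it after computing the in-range samples, so new out-of-range policies never require changes to the image class itself.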

I don't use a base class for the output image. Since the output can flow to the screen or printer as easily as to a file, the output mechanism will often depend upon application-specific code. Additionally, the demand-driven character of the architecture makes it unnecessary to place constraints on the output.

Example Applications

Listing One is a test program that enhances the contrast of an SGI RGB file. The program first specifies data dependencies using class constructors. The first object opens an SGI RGB-format file. The second object executes the contrast-enhancement operation. The last object writes the data to an output file, possibly converting the data type. Once these dependencies are defined, instantiating the output image triggers processing of this simple chain.

SGI RGB files are both powerful and flexible, supporting 8- or 16-bit integer channels and any number of color channels. As you can see from the Contraster class (see Listings Two and Three), implementing new operators is straightforward. Contraster consists of a constructor, a destructor, and a fillTile() implementation. Developers of new image-processing operators are well shielded from the image manager and other system complexities.

The Contraster object executes the contrast-enhancement operation using floating-point math. I did this for two reasons. First, it demonstrates how easy it is to typecast the data in tiles. (But be warned: Typecasting is as expensive as any pixel-processing operation.) Second, using floating-point math makes it easy to deal with over- and under-exposure problems. Integer values may overflow or underflow, whereas floating-point values retain data integrity. There are ways around integer saturation, but I show floating-point in action for the sake of example.

Extending the Library

Since my image-manager implementation doesn't take full advantage of the flexibility of the architecture, it is a good place to start extending my implementation. One of the first features that should be incorporated into the image manager is a tile-caching mechanism. Imagine a chain of convolution operations executed using the current implementation. As the requests for tiles propagate toward the input, the size of the requested area becomes larger. Over time, these requests will result in excessively redundant calculations.

A cache minimizes redundancy by storing data that is perceived to be frequently used. Although the cache requires more RAM -- one of the constraints you're trying to overcome -- you can control how much RAM it uses. Giving some RAM back to the processing engine for caching is a guard against extremes in the space-time tradeoff. Remember, caching operator data saves more than the computation at that node in the pipeline. It saves computations at every node up to and including that operation.

How you implement the caching mechanism depends on the execution characteristics that you want the library to have. You can cache individual tiles, color channels, or entire images. You may need predictive caching or reactive caching. Matching your needs with a cache implementation is well worth the development time.
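
As one concrete possibility, here is a reactive least-recently-used cache with a hard byte budget. Everything in it is hypothetical rather than part of the accompanying source; only ImageTile and deleteTile() are the library's:

#include <list>
#include <map>

// Hypothetical key: enough coordinates to identify a cached tile uniquely.
struct TileKey {
   const void *pImage;
   long x, y, z, c, width, height;
   bool operator<( const TileKey &o ) const
   {
      if ( pImage != o.pImage ) return pImage < o.pImage;
      if ( x != o.x ) return x < o.x;
      if ( y != o.y ) return y < o.y;
      if ( z != o.z ) return z < o.z;
      if ( c != o.c ) return c < o.c;
      if ( width != o.width ) return width < o.width;
      return height < o.height;
   }
};

class TileCache {
public:
   TileCache( unsigned long lBudget ) : _lBudget( lBudget ), _lUsed( 0 ) {}
   ImageTile *lookup( const TileKey &key )
   {
      std::map<TileKey, Entry>::iterator it = _index.find( key );
      if ( it == _index.end() )
         return 0;
      _lru.splice( _lru.begin(), _lru, it->second.pos );  // mark most recent
      return it->second.pTile;
   }
   // Assumes the caller checked lookup() first, so key is not yet present.
   void insert( const TileKey &key, ImageTile *pTile, unsigned long lBytes )
   {
      while ( _lUsed + lBytes > _lBudget && !_lru.empty() )
         evictOldest();                      // stay within the RAM budget
      _lru.push_front( key );
      Entry e = { pTile, lBytes, _lru.begin() };
      _index[ key ] = e;
      _lUsed += lBytes;
   }
private:
   struct Entry {
      ImageTile    *pTile;
      unsigned long lBytes;
      std::list<TileKey>::iterator pos;
   };
   void evictOldest( void )
   {
      std::map<TileKey, Entry>::iterator it = _index.find( _lru.back() );
      _lUsed -= it->second.lBytes;
      it->second.pTile->deleteTile();        // release the tile's reference
      _index.erase( it );
      _lru.pop_back();
   }
   unsigned long      _lBudget, _lUsed;
   std::list<TileKey> _lru;                  // most recent at the front
   std::map<TileKey, Entry> _index;
};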

Earlier, I described why I have two image base classes. The InputImage encapsulates infinite-range or infinite-resolution data. The SampledImage encapsulates traditional image representations. If the caching technique you prefer depends on the image's resolution or range, you will not be able to cache InputImage images. Otherwise, you may add InputImage support by making InputImage's newTile() method defer to the image manager. This gives you more flexibility in experimenting with new caches.

Once you have implemented a cache, it would be nice to offer forced caching. Some image operators, such as a Fast Fourier Transform (FFT), are not localized. To execute an FFT, it is easiest to process the entire image at once. And it isn't cheap. Forced caching lets you identify and address hot spots for the caching system, such as these nonlocalized operators. Without forced caching, or if there isn't enough room to cache the entire FFT image, each tile request arriving at the FFT image will likely result in recalculation of the entire image.
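
Forced caching can be as simple as a pinning interface on the manager; the names below are hypothetical:

#include <set>

class CachingImageMan {
public:
   void pinImage( Image *pImg )    { _pinned.insert( pImg ); }
   void unpinImage( Image *pImg )  { _pinned.erase( pImg ); }
   bool isPinned( Image *pImg )    { return _pinned.count( pImg ) != 0; }
private:
   std::set<Image *> _pinned;      // tiles from these images are never evicted
};

An FFT node would pin itself before its first fillTile() call, pay the whole-image cost once, and serve every later tile request from the cache.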

Another characteristic of the current image manager is that tile sizes are not regulated. Currently, the application is on its honor not to request more data than it actually needs at one time. Poorly designed applications may request the entire image in one call. The current image manager will propagate this request. This defeats the purpose of having an architecture designed to constrain memory demand.

Request decomposition allows the image manager to decompose requests for large tiles into several requests for smaller tiles. Processing each smaller tile requires less memory than the whole. Another benefit of request decomposition is that the requests for the many small tiles may be executed in parallel. Giving the image manager the power to dispatch concurrent requests empowers the entire system with concurrent execution behavior. The developer of image-processing operators doesn't need to worry about writing parallel code.
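
Decomposition amounts to a blocked loop over the requested area. In this sketch, copyIntoRegion() is an assumed blit helper and the maximum tile dimensions are the manager's policy limits:

int decomposeAndFill( Image *pImg, ImageTile *pBig, long lMaxW, long lMaxH )
{
   Area whole = pBig->getArea();
   for ( long y = 0; y < whole.height; y += lMaxH )
      for ( long x = 0; x < whole.width; x += lMaxW )
      {
         long w = ( whole.width  - x < lMaxW ) ? ( whole.width  - x ) : lMaxW;
         long h = ( whole.height - y < lMaxH ) ? ( whole.height - y ) : lMaxH;
         // Each piece could equally well be dispatched to a worker thread:
         ImageTile *pSub =
            pImg->newTile( Area( whole.x + x, whole.y + y, w, h ) );
         if ( !pSub )
            return -1;
         copyIntoRegion( pBig, pSub );  // assumed helper: blit pSub into pBig
         pSub->deleteTile();
      }
   return 0;
}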

Conclusion

The source code accompanying this article (see "Availability," page 3) implements the framework of this architecture. It is written in C++ and should be fairly easy to port. I encourage you to develop the architecture into an image-processing library custom fitted to your specific environment. For small systems, it should allow processing pipelines of considerable complexity. For multiprocessor systems, it should allow multiple renders to coexist without malicious contention for resources.

The Silicon Graphics ImageVision Library provides a superset of all the features I describe here, and I recommend that IRIX users evaluate it. However, it is not available for other platforms.

Acknowledgment

Thanks to Bob Amen and Cinesite Digital Film Center in Hollywood, California, for making equipment available for testing.


Listing One

#include <SGIReader.h>
#include <Contraster.h>
#include <SGIWriter.h>
int main ( int argc, char *argv[] )
{
   SGIReader in( "test.rgb" );     
   Contraster contrasted( &in, 1.3 );   
   SGIWriter 
      out( "out.rgb", (InputImage *)&contrasted, in.getArea(), UChar);
   return ( 0 );
}


Listing Two

// File: Contraster.h -- (c) 1996 Mayur Patel

#ifndef Contraster_CLASS
#define Contraster_CLASS


#include <InputImage.h>
#include <Type.h>


class Contraster : public InputImage {
   public:
      Contraster( InputImage *pIn, float rLumFactor, float rGreyPoint = 0.5 );
      ~Contraster( void );
      int
      fillTile( ImageTile *pWriteHere );
   protected:
      InputImage    *_pHost;
      float     _rLumFactor;
      float     _rGrey;
};
#endif


Listing Three

// File: Contraster.C -- (c) 1996 Mayur Patel

#include <Contraster.h>
#include <assert.h>


Contraster::Contraster( InputImage *pIn, float rLumFactor, float rGrey )
:  InputImage( Real )
{
   _pHost = pIn;
   if ( _pHost )
      _pHost->registerReference();
   _rLumFactor = rLumFactor;
   _rGrey = rGrey;
}
Contraster::~Contraster( void )
{
   if ( _pHost )
      _pHost->unregisterReference();
   return;
}
int
Contraster::fillTile( ImageTile *pWriteHere )
{
   int          iRet = 0;
   ImageTile        *pTile;
   unsigned long    lLoop;   
   register float   *pSrc;
   register float   *pDest;
   if ( _pHost && pWriteHere )
   {
      pTile = _pHost->newTile( pWriteHere->getArea() );
      if ( pTile )
      {
         assert( getType() == pWriteHere->getType() );   
         pTile->typecast( Real );
         pSrc = (float *) pTile->getBuffer();
         pDest = (float *) pWriteHere->getBuffer();
         lLoop = 
            pWriteHere->getArea().width *
            pWriteHere->getArea().height;
         while ( lLoop )
         {
            lLoop--;
            // contrast enhance:
            *pDest = _rGrey + (( *pSrc - _rGrey ) * _rLumFactor );
            // clamp over-exposure & under-exposure:
            *pDest = ( *pDest < 0.0 ) ? ( 0.0 ) : ( *pDest );
            *pDest = ( *pDest > 1.0 ) ? ( 1.0 ) : ( *pDest );
            pDest++;
            pSrc++;
         }
         pTile->deleteTile();
      }      
   }
   return ( iRet );
}


Listing Four

   #ifndef TYPESEPER
   #define TYPESEPER ;
   #endif


   #ifndef TYPEENDOFLIST
   #define TYPEENDOFLIST ;
   #endif


   TYPEMACRO( float, Float )
   TYPESEPER
   TYPEMACRO( int, Int )
   TYPEENDOFLIST


   #ifdef TYPESEPER
   #undef TYPESEPER
   #endif


   #ifdef TYPEENDOFLIST
   #undef TYPEENDOFLIST
   #endif


   #ifdef TYPEMACRO
   #undef TYPEMACRO
   #endif


Listing Five

enum Type {

#define TYPESEPER ,
#define TYPEENDOFLIST
#define TYPEMACRO( type, label ) label
#include <Type.m>
};


Listing Six

void addition( Type opType, void *pDest, void *pOp1, void *pOp2 )
{
   switch( opType )
   {
   #define TYPEMACRO( type, label ) \
         case( label ) : \
            *(( type * ) pDest ) = \
               *(( type * ) pOp1 ) + *(( type * ) pOp2 ); \
            break;
   #include <Type.m>
      default:
         break;
   };
   return;
}


DDJ


Copyright © 1997, Dr. Dobb's Journal

