Shameem Akhter is a platform architect at Intel, focusing on single socket multi-core architecture and performance analysis. Jason Roberts is a senior software engineer at Intel. They are the authors of Multi-Core Programming.
Error diffusion in image and graphics processing programs is an excellent example of a problem that at first seems resistant to a multi-threaded approach but, on closer inspection, lends itself well to parallel speedup. Using the Floyd-Steinberg algorithm for error diffusion, the trick lies in considering not the destinations of the calculated error, but its sources. Viewed this way, the algorithm and code become fairly straightforward.
To see how you might apply the aforementioned methods to a practical computing problem, consider the error diffusion algorithm that is used in many computer graphics and image processing programs. Originally proposed by Floyd and Steinberg, error diffusion is a technique for displaying continuous-tone digital images on devices that have limited color (tone) range. Printing an 8-bit grayscale image to a black-and-white printer is problematic. The printer, being a bi-level device, cannot print the 8-bit image natively. It must simulate multiple shades of gray by using an approximation technique. An example of an image before and after the error diffusion process is shown in Figure 1. The original image, Figure 1(a), is composed of 8-bit grayscale pixels. Figure 1(b) is the 1-bit result of processing the image with the error diffusion algorithm. The output image is composed of pixels of only two colors: black and white. At the resolution of this printing, they look similar. Figures 1(c) and 1(d) are the same images as Figures 1(a) and 1(b), but zoomed to 400 percent and cropped to 25 percent to show pixel detail. Now you can clearly see the 1-bit black-and-white rendering and the 8-bit grayscale.
The basic error diffusion algorithm does its work in a simple three-step process:
- Determine the output value given the input value of the current pixel. This step often uses quantization, or in the binary case, thresholding. For an 8-bit grayscale image that is displayed on a 1-bit output device, all input values in the range [0, 127] are displayed as a 0 and all input values in the range [128, 255] are displayed as a 1 on the output device.
- Once the output value is determined, the code computes the error between what should be displayed on the output device and what is actually displayed. As an example, assume that the current input pixel value is 168. Given that it is greater than our threshold value (128), we determine that the output value will be a 1. This value is stored in the output array. To compute the error, the program must first normalize the output so that it is on the same scale as the input value. That is, for the purposes of computing the display error, the output pixel is treated as 0 if its value is 0, or as 255 if its value is 1. In this case, the display error is the difference between the value that should have been displayed (168) and the normalized output value (255), which is -87.
- Finally, the error value is distributed on a fractional basis to the neighboring pixels in the region, as in Figure 2.
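The first two steps above can be sketched in C as follows. The helper names (`quantize`, `display_error`) are illustrative, not taken from Listing 1:

```c
/* Step 1: threshold an 8-bit input value to a 1-bit output value.
   Values in [0, 127] map to 0; values in [128, 255] map to 1. */
int quantize(int input)
{
    return (input < 128) ? 0 : 1;
}

/* Step 2: compute the display error. The 1-bit output is first
   normalized back to the input scale (0 or 255), then subtracted
   from the input value. */
int display_error(int input, int output)
{
    return input - (output ? 255 : 0);
}
```

For the worked example in the text, `quantize(168)` yields 1, and `display_error(168, 1)` yields 168 - 255 = -87.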
This example uses the Floyd-Steinberg error weights to propagate errors to neighboring pixels. 7/16ths of the error is computed and added to the pixel to the right of the current pixel that is being processed. 5/16ths of the error is added to the pixel in the next row, directly below the current pixel. The remaining errors propagate in a similar fashion. While you can use other error weighting schemes, all error diffusion algorithms follow this general method.
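A sketch of the distribution step appears below, using the standard Floyd-Steinberg weights (7/16 right, 3/16 below-left, 5/16 below, 1/16 below-right). The row-major buffer layout is an assumption, and boundary checks at the image edges are omitted for brevity:

```c
/* Distribute one pixel's display error into its unprocessed
   neighbors using the Floyd-Steinberg weights. `data` is assumed
   to be a row-major buffer of width `w`; edge handling omitted. */
void distribute_error(int *data, int w, int x, int y, int err)
{
    data[y * w + (x + 1)]       += (err * 7) / 16; /* right          */
    data[(y + 1) * w + (x - 1)] += (err * 3) / 16; /* below-left     */
    data[(y + 1) * w + x]       += (err * 5) / 16; /* directly below */
    data[(y + 1) * w + (x + 1)] += (err * 1) / 16; /* below-right    */
}
```

Note that the four weights sum to 16/16, so (up to integer rounding) the full error is conserved as it spreads to the neighbors.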
The three-step process is applied to all pixels in the image. Listing 1 shows a simple C implementation of the error diffusion algorithm, using Floyd-Steinberg error weights.
Analysis of the Error Diffusion Algorithm
At first glance, you might think that the error diffusion algorithm is an inherently serial process. The conventional approach distributes errors to neighboring pixels as they are computed. As a result, the previous pixel's error must be known in order to compute the value of the next pixel. This interdependency implies that the code can only process one pixel at a time. It is not that difficult, however, to recast the problem in a way that is more amenable to a multi-threaded approach.