
Image Segmentation for Image Recognition


Dr. Dobb's Journal July 1998: Algorithm Alley

Lee is a software engineer at CompuCyte Corporation. He can be contacted at [email protected].


Good ideas reappear in surprising places. When Lee first told me about his algorithm, I thought he was discussing an old flood-fill algorithm -- one that I remembered seeing long before it appeared in Kent Porter's "Graphics Programming" column (DDJ, June 1989). There are good reasons for the similarity: Both algorithms rely on a fundamental insight that applies to many data-analysis tasks -- where possible, work with groups of data, not with single elements.

Rather than studying a single pixel at a time, Lee's technique immediately bundles pixels into larger structures. As in many graphics problems, the natural unit is a horizontal line of pixels. By quickly moving the problem from one involving pixels to one involving lines, Lee reduces his work by an order of magnitude.

-- Tim Kientzle

Subtle variations of color and texture in photographic images make it difficult to clearly identify a foreground and a background. It's considerably simpler with "artificial" images, such as scanned text. The relatively high contrast makes it easier to separate pixels, but you still have to somehow identify regions of pixels and extract relevant information. In this article, I'll describe an algorithm that quickly reduces high-contrast images to a data set that lends itself to image recognition and analysis. This algorithm could be used to analyze black text on a white background, calibration marks on a surface, red Legos in a toybox, or a slide containing blots of a drug being tested by a pharmaceutical company.

The initial stage of image analysis usually consists of two steps -- thresholding and segmentation. Thresholding algorithms classify pixels as either foreground or background, or assign them to one of several foreground classes (green, red, and blue classes, for instance, or skin texture, hair texture, and clothing texture). Thresholding can extend past analyzing the raw brightness or color of a pixel. Some algorithms involve linear transforms such as convolution filters and FFTs; others rely on nonlinear transforms such as image dilation.

Segmentation algorithms group similar pixels together into coherent units. Segmentation is often regarded as a backwater of image analysis. However, a good segmentation algorithm can yield easily analyzable, abstract objects with a minimum of processing. The algorithm I describe here has good performance. It visits each pixel exactly once, processing the pixels in their raster order, which maintains cache coherency, improves bus bandwidth and eliminates disk thrashing. It yields data structures representing connected groups of pixels; the data structures can be manipulated to extract information without revisiting each pixel.

The algorithm groups pixels into run-length encoded lines. It then groups the lines into structures I call "blobs." The blobs contain groups of lines that overlap. This yields an object that describes a set of four-connected (north, south, east, and west) pixels, none of which are connected to any pixel outside of the set. You can quickly find a blob's extent, position, center, or area. You can analyze the blob's topology and shape. You can revisit the pixels within the blob to determine their brightness, coloration, or texture. These techniques lend themselves to the tasks of optical-character recognition (OCR), scene recognition (where the scene consists of artificial, high-contrast objects), and automated chemical analysis.
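
A minimal sketch of the two structures might look like the following. The names here are my own, chosen for illustration; Listing One uses CLine and CBlob with m_-prefixed members.

```cpp
// Hypothetical sketch of the algorithm's two data structures.
struct Blob;

// A run of consecutive foreground pixels on one raster.
struct Line {
    int   xStart;       // first foreground column of the run
    int   xEnd;         // last foreground column of the run
    int   y;            // raster (row) index
    Line *nextInBlob;   // next run in the same blob; null at the tail
    Blob *blob;         // owning blob; null while the run is unattached
};

// A maximal set of four-connected foreground pixels, stored as a
// linked list of its runs in raster order.
struct Blob {
    Line  *firstLine;   // head of the run list
    Line **lastLink;    // address of the tail's nextInBlob, for O(1) append
    int    openLines;   // runs on the previous raster still open for attachment
    int    loops;       // times the blob rejoined itself (e.g. bottom of an "O")
};
```

Because a blob stores only its runs, extent and area queries walk a short list rather than the pixel data.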

The Segmentation Algorithm

Suppose you're trying to analyze black text against a white background. In this context, the segmentation algorithm looks at one raster at a time and breaks it down into a series of black lines. Each line is compared to the lines on the previous raster (the "old line list") and is either added to an existing blob or a new blob is created.

If a blob has lines on the old line list, those lines are open lines. I maintain a count of open lines for each blob. When the count falls to zero, that blob is finished and need no longer be considered. The bulk of the algorithm, then, consists of walking down the old line list and comparing new lines to see how to attach them to blobs.

The algorithm starts with the genesis of a new line. The application typically maintains a threshold and scans until it finds a pixel whose value is over the threshold. It then starts a line and scans until it hits a pixel under threshold. The application then builds a line consisting of the X start, X end, and the current Y. The line also contains a pointer to its blob (null signals that the line is unattached) and a pointer to the next line in the blob.
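
As a sketch of this line-genesis scan (assuming 8-bit grayscale input and a fixed threshold, both simplifications of my own; Listing One works on packed 1-bit rasters), one raster can be reduced to runs like this:

```cpp
#include <cstdint>
#include <vector>

// A run of consecutive foreground pixels; illustrative type.
struct Run { int xStart, xEnd, y; };

// Extract the foreground runs of one raster: scan to a pixel at or over
// threshold, then scan to one under it, and record the extent.
std::vector<Run> ExtractRuns(const std::uint8_t *raster, int width, int y,
                             std::uint8_t threshold) {
    std::vector<Run> runs;
    int x = 0;
    while (x < width) {
        while (x < width && raster[x] < threshold) ++x;   // skip background
        if (x == width) break;
        int xStart = x;
        while (x < width && raster[x] >= threshold) ++x;  // consume the run
        runs.push_back({xStart, x - 1, y});
    }
    return runs;
}
```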

There are three possible relationships between the current old line and new line, as illustrated in Figure 1.

  • The old line might end before the new line begins; see Figure 1(a).
  • The new line might end before the old line begins; see Figure 1(b).
  • The two lines might overlap; see Figure 1(c).
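
A small helper (hypothetical names) can classify the three cases; under four-connectivity, two runs on adjacent rasters connect only when they share at least one column:

```cpp
enum class Relation { OldBeforeNew, NewBeforeOld, Overlap };

// Classify an old-raster run against a new-raster run by comparing
// their X extents, mirroring the three cases of Figure 1.
Relation Classify(int oldStart, int oldEnd, int newStart, int newEnd) {
    if (oldEnd < newStart) return Relation::OldBeforeNew;  // Figure 1(a)
    if (newEnd < oldStart) return Relation::NewBeforeOld;  // Figure 1(b)
    return Relation::Overlap;                              // Figure 1(c)
}
```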

If the old line is wholly before the new line, it is finished. I decrement its blob's open line count and, if the open line count has reached zero, add the blob to the list of finished blobs. I then advance to the next old line in the list. If the new line is wholly before the old line, then we have finished attaching the new line. If no previous old line overlaps this new line, then the new line starts a new blob.

The third case involves overlapping old and new lines. Its implementation requires two parts: attaching and ending. You attach the new line to an old line in one of three ways. If the new line is not yet linked to a blob, you link it to the old line's blob. If the new line is already linked to the old line's blob, then you increment the blob's loop count (this signals that the blob split into two halves and those halves have been rejoined by the new line, as at the bottom of the letter "O"). If the new line has already been linked to a blob other than the old line's blob, then the new line joins two blobs (as at the bottom of the letter "V"). You merge the two blobs' line lists and discard one of the blobs.

Merging blobs is an important feature of any segmentation algorithm. Figure 2 illustrates a particularly difficult example, in which three new lines combine four separate blobs. You need to compare the ends of the new and old lines after attaching the new line. If the old line ends first, you can move on to the next old line. If the new line ends first, then you have to increment the blob's open line count (lines can now be attached to both the old and the new line). You then move on to create the next new line.
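
Listing One calls CBlob::Merge but does not show its body; one plausible sketch, with hypothetical structures, splices the absorbed blob's run list onto the survivor, re-points every stolen run, and folds in the counters:

```cpp
struct Blob;

// Minimal run type for illustration.
struct Line {
    Line *nextInBlob = nullptr;
    Blob *blob       = nullptr;
};

struct Blob {
    Line  *firstLine = nullptr;
    Line **lastLink  = nullptr;  // address of the tail's nextInBlob pointer
    int    openLines = 0;
    int    loops     = 0;

    // Absorb `other` into this blob: append its run list, re-point its
    // runs at the surviving blob, and add its counters. The caller can
    // then delete `other`, as Listing One does.
    void Merge(Blob *other) {
        *lastLink = other->firstLine;
        lastLink  = other->lastLink;
        for (Line *l = other->firstLine; l != nullptr; l = l->nextInBlob)
            l->blob = this;
        openLines += other->openLines;
        loops     += other->loops;
    }
};
```

Re-pointing every stolen run keeps the old line list's blob pointers valid after the absorbed blob is deleted.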

The algorithm has one additional minor complexity: At the end of a raster, you run out of either new lines or old lines. If you run out of new lines, you run through a loop that ends the remaining old lines, decrementing the open line counts and finishing any blobs whose open line count falls to zero. If you run out of old lines first, you create a new blob for each remaining new line until the end of the raster.

Additional Features

It's possible to augment this algorithm to collect additional information about each blob. I've already included the loop counts, since those are so easy to extract. The demonstration program makes use of the loop count to distinguish the loops in bullseyes: The bullseye rings must be closed to be recognized. You might also use the loop count to help in distinguishing letters; uppercase "B" and lowercase "g" are the only letters with two loops (assuming no malformations).

Another group of useful features is the image tops and bottoms. The tops are those lines that were not attached to any line above them. The total number of tops is one more than the number of merges required to produce the blob. The bottoms are those lines that had no line attached below them. The algorithm can find these by setting a flag when first considering an old line and clearing the flag if a new line attaches to the old line. You check the flag when finishing the old line; if it is still set, you have a bottom.

Numbers of tops and bottoms by themselves can be used to distinguish shapes. For instance, the letter "w" has three tops and two bottoms in both serif and sans-serif fonts, while the small "m" has three tops and three bottoms. Positions of tops and bottoms are perhaps more telling; a top directly above a bottom may indicate a vertical line (in character recognition, this is typically the leftmost stroke of the letter). It's quite possible to build a neural-network character recognizer that uses the number of loops, top positions, and bottom positions as its input features; such a recognition engine is efficient and provides a confidence output along with its judgment.

Positional information can be recorded during segmentation. The algorithm can update the enclosing rectangle for the blob as it adds lines (the top of the blob is always one of its starting lines, the bottom is always the old line at which the line count drops to zero). The algorithm can also compute the blob centroid (if each pixel weighed the same, the centroid would be the blob's center of mass) and area. It may be more efficient, however, to compute these during a quick trip through the blob's line list after it has been finished.
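
A post-pass over a finished blob's run list (a sketch using a Run type of my own) can produce area and centroid without revisiting a single pixel, since each run of width w contributes w pixels whose x-coordinates sum to w*(xStart + xEnd)/2:

```cpp
#include <vector>

struct Run { int xStart, xEnd, y; };

// Compute the area and centroid of a blob from its run list alone.
void AreaAndCentroid(const std::vector<Run> &runs,
                     double &area, double &cx, double &cy) {
    double a = 0, sx = 0, sy = 0;
    for (const Run &r : runs) {
        double w = r.xEnd - r.xStart + 1;     // pixels in this run
        a  += w;
        sx += w * (r.xStart + r.xEnd) / 2.0;  // sum of x over the run
        sy += w * r.y;                        // sum of y over the run
    }
    area = a;
    cx = sx / a;
    cy = sy / a;
}
```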

Optimizations

Ironically, the two most expensive parts of the segmentation algorithm are outside of the algorithm proper. Thresholding is the most expensive, because it requires examining each individual pixel. Using table lookups or other techniques that examine several pixels at a time can help speed this part up. A quick optimization is to check whole bytes for all ones or all zeros. Typically, though, thresholding is sped up through hardware assist: The data is thresholded, run-length encoded, and transmitted in encoded form.
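
A table-lookup threshold is one such technique; the per-pixel comparison becomes an indexed load, and the table could equally encode any per-value classification. This is a sketch of the idea rather than the hardware-assisted path the article recommends:

```cpp
#include <cstddef>
#include <cstdint>

// Threshold a raster of 8-bit pixels into a 0/1 map using a 256-entry
// lookup table, replacing the per-pixel branch with an indexed load.
void ThresholdLUT(const std::uint8_t *src, std::uint8_t *dst, std::size_t n,
                  std::uint8_t threshold) {
    std::uint8_t lut[256];
    for (int v = 0; v < 256; ++v)
        lut[v] = (v >= threshold) ? 1 : 0;   // any classification fits here
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = lut[src[i]];
}
```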

The second most expensive operation is allocation. Artifacts of image capture can often cause a jagged edge at the top of an object, and the algorithm may initially create a blob for each of these jagged lines. Blob creation can be an expensive operation, even if the blobs are unlinked from a private heap. It's possible to modify the algorithm so that blobs are created only when a previously unattached new line is attached to a previously unattached old line. This delays blob creation past typical aliasing effects. A single new line joining many unattached old lines will then create only a single blob.

My implementation maintains an array of pointers to lines. This indirection can be eliminated by allocating all lines from a single array. You can then set an array index to the first line on the old line list and fetch subsequent old lines by incrementing the index. It's possible in many operating systems to reserve a large block of memory without actually allocating the backing store necessary to maintain the virtual memory image on disk (this is the VirtualAlloc function in Windows 95/NT). The algorithm can reserve the theoretical maximum size memory required for the worst case (which is half the number of pixels in the image times the size of a line). The algorithm can perform the actual allocation at the beginning of the raster, enlarging the array of lines to prepare for the worst case. This eliminates the need to check for "out of memory" when allocating lines during the course of a scan.

Finally, many processors have a sweet spot at 16 bytes when dealing with arrays of structures. (Typically, the bus is 32 bits wide and bus bursts are four words long; a 16-byte structure makes optimum use of one burst.) The line structure consists of an X start, X end, Y coordinate, pointer to the next line in the blob, and pointer to the line's blob. That's 20 bytes if the algorithm uses 32-bit integers and pointers. Packing the X start and X end into two 16-bit words may provide a significant boost in speed.

Applications

The structures yielded by the segmentation algorithm lend themselves to a variety of uses. The lines provide compression for a number of tasks. First among these are tasks involving the relations of the blobs to each other and to the background. It's an easy matter to relate the blobs to their neighbors once they have been segmented; you can form agglomerations (such as lines of print or areas of half-tone dot images) that provide useful feature information. My most recent use of the algorithm has been a bullseye recognizer that detects a microscopic bullseye deposited at an exact spot on a microscope slide by photolithography. The segmentation algorithm quickly identifies the two rings and target of the bullseye by searching for the only three objects that have centroids that are close together (the middle of each ring corresponds with the middle of the bullseye).

It's possible to perform template matches using the algorithm's data structures. (I've included a template matcher in Listing One.) A template matcher can match blob against blob, relying on the line order within the blobs to efficiently find which parts of the image overlap the template. The resulting score gives a confidence measure that the template is the same shape as the image. Template matching can be used in OCR to discriminate between letters that have similar overall shapes (such as "A" and "R," which both have one loop, one top, and two bottoms).

The segmentation algorithm can be used to find and outline objects deposited on a substrate. For instance, a drug manufacturer might test thousands of variants of a drug at once using a robotic deposition tool that places the results of different reactions at different spots on a dish. The spots can be found using the segmentation algorithm and the chemical reaction can then be analyzed by measuring the intensity and color of the reactions in the spot. The algorithm can be used in other aspects of robotics; a few marks on an object can be used to track the object in real time and position it. Again, these algorithms work best in an artificial environment in which lighting and contrast are controlled to maximize the difference between foreground and background.

Conclusion

This segmentation algorithm rapidly separates the boundaries of contiguous objects from their background. It makes efficient use of modern CPU resources and yields objects with useful data structures. I believe that it could serve as the core of a number of interesting image-processing applications. I would like to see it used in novel ways, such as in identifying -- in real time -- high-contrast objects placed in video scenes as markers for animated characters. It might also be used as the base segmentation algorithm for real-world imaging using sophisticated image-processing techniques that enhance the foreground. The algorithm can be extended to three dimensions by matching against two old line lists: one in the X-Y plane and one in the X-Z plane. This adaptation might be used in computer-aided tomography to identify tumors. In any case, the algorithm is dependable and useful and should serve you well in appropriate applications.

DDJ

Listing One

void CMonochromeBitmap::BuildBlobList() {
  CLineVectorIterator clviNew,clviEnd;
  AllocateLineBlock(clviNew,clviEnd);
  m_pBlob.reserve(eBlobsPerBlock);


  // Create two vectors of pointers to lines to hold the new line list
  // and the old line list. We initialize them to hold Width()/2 members.
  // This is the maximum possible number of lines.
  CLinePointerVector clpvRaster[2];
  clpvRaster[0].reserve(Width()/2);
  clpvRaster[1].reserve(Width()/2);
  int nOldLineIndex = 0;
  int nNewLineIndex = 1;
  // At the start, the old line list has no members.
  CLinePointerVectorIterator clpviOldEnd = clpvRaster[nOldLineIndex].begin();
  for (int nY = 0; nY < Height(); nY++) {
    CLinePointerVector &clpvOld = clpvRaster[nOldLineIndex];
    CLinePointerVector &clpvNew = clpvRaster[nNewLineIndex];
    nOldLineIndex = nOldLineIndex ^ 1;
    nNewLineIndex = nNewLineIndex ^ 1;
    CLinePointerVectorIterator clpviOld = clpvOld.begin();
    CLine *pLineOld;
    if (clpviOld == clpviOldEnd)
      pLineOld = NULL;
    else
      pLineOld = *clpviOld++;


    CLinePointerVectorIterator clpviNew = clpvNew.begin();
    // Get the raster to process and set up for the first byte
    const BYTE *pRaster = GetRaster(nY);
    BYTE bBit = 0x80;
    BYTE bByte = *pRaster++;
    //  Set up the X extents.
    int nX = 0;
    const int nXEnd = Width();


    while(nX < nXEnd) {
      CBlob *pBlobOld;
      // Search for the start of a line.
      while((bByte & bBit) == 0) {
        if (++nX == nXEnd) goto linedone;
        bBit = bBit >> 1;
        if (bBit == 0) {
          bBit = 0x80;
          bByte = *pRaster++;
        }
      }
      // Start a line.
      if (clviNew == clviEnd) {
        // need more memory for lines.
        AllocateLineBlock(clviNew,clviEnd);
      }
      CLine &lineNew = *clviNew++;
      lineNew.m_nXStart = nX;
      lineNew.m_pNextLineSameBlob = 0;
      lineNew.m_nY = nY;


      // Put it on the new list.
      *clpviNew++ = &lineNew;
      // Find the extent of the line.
      do {
        if (nX == nXEnd) break;
        bBit = bBit >> 1;
        if (bBit == 0) {
          bBit = 0x80;
          bByte = *pRaster++;
        }
        nX++;
      } while (bBit & bByte);
      lineNew.m_nXEnd = nX - 1;
      // Now finish all old lines wholly before our new line
      if (pLineOld) {
        while (pLineOld->m_nXEnd < lineNew.m_nXStart) {
          pBlobOld = pLineOld->m_pBlob;
          if (--(pBlobOld->m_nOpenLines) == 0) {
            AddBlob(pBlobOld);
          }
          if (clpviOld == clpviOldEnd) {
            pLineOld = NULL;
            break;
          } else {
            pLineOld = *clpviOld++;
          }
        }
      }
      // Do the first line that overlaps our new line
      if (pLineOld && pLineOld->m_nXStart <= lineNew.m_nXEnd) {
        pBlobOld = pLineOld->m_pBlob;
        lineNew.m_pBlob = pBlobOld;
        *(pBlobOld->m_ppLastLine) = &lineNew;
        pBlobOld->m_ppLastLine = &(lineNew.m_pNextLineSameBlob);
        // See if the new line ends the old line or vice-versa.
        if (pLineOld->m_nXEnd > lineNew.m_nXEnd) {
          // the old line extends past the new line.
          // the new line gives the old line's blob another open line.
          pBlobOld->m_nOpenLines++;
          continue;
        } else {
          // we close the old line and extend the blob at the
          // same time. We continue to close old lines.
          if (clpviOld == clpviOldEnd) {
            pLineOld = NULL;
          } else {
            pLineOld = *clpviOld++;
            while(pLineOld->m_nXEnd <= lineNew.m_nXEnd) {
              // End this line too. We have two cases:
              CBlob *pBlobOther;
              if ((pBlobOther = pLineOld->m_pBlob) == pBlobOld) {
                // the old line is part of the new line's blob.
                // This is just a loop.
                pBlobOld->m_nLoops++;
                pBlobOld->m_nOpenLines--;
              } else {
                // This is the merge case
                pBlobOld->Merge(pBlobOther);
                delete pBlobOther;
                pBlobOld->m_nOpenLines--;
              }
              if (clpviOld == clpviOldEnd) {
                pLineOld = NULL;
                break;
              } else {
                pLineOld = *clpviOld++;
              }
            }
            if (pLineOld && pLineOld->m_nXStart <= lineNew.m_nXEnd) {
              // This last old line overlaps and ends the new line.
              CBlob *pBlobOther = pLineOld->m_pBlob;
              if (pBlobOther == pBlobOld) {
                pBlobOld->m_nLoops++;
              } else {
                pBlobOld->Merge(pBlobOther);
                delete pBlobOther;
              }
            }
          }
        }
      } else {
        // This is the case where no line overlaps the new line.
        // Start a blob with this line.
        CBlob *pBlobNew = lineNew.m_pBlob = new CBlob;
        pBlobNew->m_nOpenLines = 1;
        pBlobNew->m_pFirstLine = &lineNew;
        pBlobNew->m_ppLastLine = &(lineNew.m_pNextLineSameBlob);
        pBlobNew->m_nLoops = 0;
      }
    }
linedone:
    if (pLineOld) {
      while (TRUE) {
        // Finish all old lines.
        CBlob *pBlobOld = pLineOld->m_pBlob;
        if (--(pBlobOld->m_nOpenLines) == 0) {
          AddBlob(pBlobOld);
        }
        if (clpviOld == clpviOldEnd) break;
        pLineOld = *clpviOld++;
      }
    }
    // Finally, record the end of the new list as the
    // end of the old list to be.
    clpviOldEnd = clpviNew;
  }
  // At the end, we've got one last raster of lines to finish.
  CLinePointerVector &clpvLast = clpvRaster[nOldLineIndex];
  for (CLinePointerVectorIterator clpvi = clpvLast.begin();
  clpvi != clpviOldEnd;) {
    // Finish all old lines.
    CBlob *pBlobOld = (*clpvi++)->m_pBlob;
    if (--(pBlobOld->m_nOpenLines) == 0) {
      AddBlob(pBlobOld);
    }
  }
  // and finally, segmentation is complete
}



Copyright © 1998, Dr. Dobb's Journal
