April 06, 2007
New Algorithms for Image Searching
In an effort to make images themselves searchable, rather than just the text captions that describe them, engineers at the University of California, San Diego have developed the Supervised Multiclass Labeling (SML) system. SML calculates the probability that each of the objects it has been trained to recognize is present in an image, then labels the image accordingly. After labeling, images can be retrieved via keyword searches. The SML system also splits up images based on content; for example, the system can separate a landscape photo into mountain, sky, and lake regions.
The term "supervised" refers to the fact that the users train the image labeling system to identify classes of objects, such as "tigers," "mountains" and "blossoms," by exposing the system to different pictures of tigers, mountains and blossoms. The supervised approach lets the system differentiate between similar visual concepts -- such as polar bears and grizzly bears. In contrast, "unsupervised" approaches to the same technical challenges do not permit such fine-grained distinctions. "Multiclass" means that the training process can be repeated for many visual concepts. The same system can be trained to identify lions, tigers, trees, cars, rivers, mountains, sky or any concrete object. This is in contrast to systems that can answer just one question at a time, such as "Is there a horse in this picture?" "Labeling" refers to the process of linking specific features within images directly to words that describe these features.
SML starts with the training process, which involves showing the system many different pictures of the same visual concept or "class," such as a mountain. When training the system to recognize mountains, the location of the mountains within the photos does not need to be specified. This makes it relatively easy to collect training examples. After seeing enough different pictures that include mountains, the system can identify images in which there is a high probability that mountains are present.
During training, the system splits images into 8x8 pixel squares and extracts some information from them. The information extracted from each of these squares is called a "localized feature." Localized features for an image are collectively known as a "bag of features."
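The splitting step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say what information is extracted from each square, so the mean and variance of pixel intensity stand in as a placeholder feature, the squares are taken to be non-overlapping, and the function name `bag_of_features` is hypothetical.

```python
from statistics import mean, pvariance

BLOCK = 8  # square size in pixels, as stated in the article


def bag_of_features(image, block=BLOCK):
    """Return one localized feature per non-overlapping block x block square.

    `image` is a list of rows of grayscale values. Partial squares at the
    right and bottom edges are dropped (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    bag = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            # Placeholder feature (assumption): mean and variance of intensity.
            bag.append((mean(pixels), pvariance(pixels)))
    return bag


# A synthetic 16x32 grayscale image contains 2 x 4 = 8 full squares.
image = [[(x + y) % 256 for x in range(32)] for y in range(16)]
bag = bag_of_features(image)
print(len(bag), "localized features")
```

The bag is an unordered collection here: nothing records where in the image each square came from, which matches the idea that training does not need object locations.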
Next, the researchers pool together the bags of features for a particular visual concept, such as mountains. This pooled information summarizes the important information about each of the individual mountains. Pooling yields a density estimate that retains the critical details of all the different mountains without having to keep track of every 8x8 pixel square from each of the mountain training images.
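As a rough illustration of pooling, the sketch below merges the bags from several training images of one class and summarizes them with a handful of numbers. The article does not say which density model SML uses, so a single Gaussian with independent dimensions stands in here as an assumption; the names `fit_class_density` and `log_density` and the sample feature values are hypothetical.

```python
import math


def fit_class_density(bags):
    """Pool the features from all training images of one class and
    summarize them as a per-dimension mean and variance."""
    pooled = [feature for bag in bags for feature in bag]
    count, dims = len(pooled), len(pooled[0])
    means = [sum(f[d] for f in pooled) / count for d in range(dims)]
    # A small floor keeps the variance strictly positive.
    variances = [max(sum((f[d] - means[d]) ** 2 for f in pooled) / count, 1e-6)
                 for d in range(dims)]
    return means, variances


def log_density(feature, model):
    """Log of the Gaussian density of one feature under a class model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, means, variances))


# Two made-up "mountain" training images, already reduced to bags of features.
mountain_bags = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]
mountain_model = fit_class_density(mountain_bags)
print(mountain_model)
```

Once the model is fitted, the individual squares can be discarded: only the means and variances are kept, which is the space saving the paragraph above describes.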
Once trained, the system annotates pictures it has never encountered. The visual concepts that are most likely to be in a photo are labeled as such. Given, say, a tiger photo, the SML system processes the image and concludes that "cat, tiger, plants, leaf and grass" are the most likely items in the photograph.
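The annotation step can be sketched as a ranking: score the new image's features under every trained class, then keep the best-scoring labels. The sketch below assumes each class is summarized by a Gaussian with independent dimensions and that all classes are equally likely beforehand; neither detail comes from the article, and the class models and feature values are invented for illustration.

```python
import math


def log_density(feature, model):
    """Log of the Gaussian density of one feature under a class model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, means, variances))


def annotate(bag, models, top_k=5):
    """Rank every class by the total log-likelihood of the image's features
    and return the top_k labels, most likely first."""
    scores = {label: sum(log_density(feature, model) for feature in bag)
              for label, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical trained models: label -> (means, variances).
models = {
    "sky": ([200.0, 5.0], [100.0, 4.0]),
    "grass": ([90.0, 40.0], [100.0, 25.0]),
    "tiger": ([140.0, 80.0], [100.0, 25.0]),
}

# Features from an unseen image: two tiger-like squares, one grass-like.
bag = [(138.0, 78.0), (142.0, 83.0), (95.0, 42.0)]
print(annotate(bag, models, top_k=2))
```

Every class is scored against the same image, so the labels compete directly and a single pass produces the whole ranked list.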
"At annotation time, all the trained classes directly compete for the image. The image is labeled with the classes that are most likely to actually be in the image," said Nuno Vasconcelos, a professor of electrical engineering at the UCSD Jacobs School of Engineering and the senior researcher on the project. Others involved in the project include Gustavo Carneiro, a UCSD postdoctoral researcher now at Siemens Corporate Research, UCSD doctoral candidate Antoni Chan, and Google researcher Pedro Moreno.
The SML system can also split up a single image into its different regions, a process known as "image segmentation." When the system annotates an image, it assigns the most likely label to each group of pixels or localized feature, segmenting the image into its most likely parts as a regular part of the annotation process. "Automated segmentation is one of the really hard problems in computer vision, but we’re starting to get some interesting results," said Vasconcelos.
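Segmentation as described above amounts to asking, for each localized feature, which class explains it best. Here is a minimal sketch, again assuming each class is a Gaussian with independent dimensions (the article does not specify the model) and using invented models and feature values; `segment` is a hypothetical name.

```python
import math


def log_density(feature, model):
    """Log of the Gaussian density of one feature under a class model."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, means, variances))


def segment(feature_grid, models):
    """Give each square the label of the class under which it is most likely.

    `feature_grid` holds one feature per square, arranged in image order.
    """
    return [[max(models, key=lambda label: log_density(feature, models[label]))
             for feature in row]
            for row in feature_grid]


# Hypothetical trained models: label -> (means, variances).
models = {
    "sky": ([200.0, 5.0], [100.0, 4.0]),
    "grass": ([90.0, 40.0], [100.0, 25.0]),
}

# A 2 x 2 grid of squares: bright, smooth squares above darker, textured ones.
feature_grid = [
    [(198.0, 6.0), (203.0, 4.0)],
    [(92.0, 38.0), (88.0, 43.0)],
]
label_map = segment(feature_grid, models)
print(label_map)
```

Unlike whole-image annotation, this keeps the grid layout of the squares, so the result is a label map with the same shape as the grid.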
Posted by Jon Erickson at 12:55 PM