Our image-matching approach combines both the GPS information and the image content. More explicitly, we restrict the database of candidate-matching images to those within a predefined search radius around the GPS value at the location of interest, i.e., the user's actual location or, in the case of pre-caching, the destination. Next, we use computer vision techniques to match the query image to the set of images that is near a certain GPS location. After the GPS filters the information, we downscale the database from 500,000 images to a much smaller candidate image set (10- ≈ 100 images) to increase the likelihood of the image-based matching algorithm being successful.
To recognize the top matching database images, our algorithm inspects each database image and compares it with the query image. More specifically, for each SURF keypoint in the query image, the algorithm finds the nearest and the second nearest neighbor keypoints in the database image, based on the city block (L1) distance between the keypoints' descriptors. Next, it computes the ratio of the nearest distance to the second nearest distance. If the ratio is smaller than a threshold, the algorithm decides that the nearest point is a "true" match of the query keypoint, since it is significantly closer to it, in L1 distance, than any other keypoint, as in . After the algorithm detects the matching pairs between the query image and each database image, it ranks these images in descending order according to the number of such pairs. In our system, we return the top five images to the user.
Finally, if the number of nearby candidates is reasonable, i.e., in the range of a few images (≈ 10), we perform brute-force image matching, based on the ratio of the distances between the nearest and second-nearest neighbor image features as explained previously. Otherwise, for scenarios with a high density of nearby images or with no GPS location available, the number of candidate images is large, and we use Fast Approximate Nearest Neighbor (FLANN) indexing as proposed in .
Building a More Realistic Database
In , we showed quantitative results on the standard ZuBuD test set , with 1005 database images and 115 query images, together with qualitative results for a small benchmark of images of iconic landmarks. Although the results on the ZuBuD set are informative, they do not precisely reflect the nature of photos taken by people under realistic conditions. To better simulate the performance of our MAR system, we selected 10 landmarks from around the world: the Arc de Triomphe (France), the Leaning Tower of Pisa (Italy), the Petronas towers (Malaysia), the Colosseum (Rome, Italy), the National Palace Museum (Taiwan), the Golden Gate Bridge (California, USA), the Chiang Kai Shek Memorial Hall (Taiwan), the Capitol (Washington D.C., USA), the Palais de Justice (Paris, France), and the Brandenburg Gate (Berlin, Germany). For each landmark, we tested the recognition performance of the MAR system. First, we reduced the 500,000-image database on the server to a smaller set that included only the images within a predefined radius. Next, we randomly picked one of the landmark images in the set as a query image, and we kept the others as database images. This provided one query image and a database for each landmark. In total, we had 10 separate databases, with a total of 777 images. The characteristics of these databases vary between sites. For instance, the size of the database is only 5 for the National Palace Museum, while it is 200 for the Palais de Justice. Also, the database corresponding to the Golden Gate Bridge has 12 relevant images of the bridge out of 19 images, while the one corresponding to the Palace of Justice has only one relevant image out of a total of 200 images. Also, images labeled by the same landmark might look different depending on the lighting conditions, the viewpoint, and the amount of clutter. To take advantage of these databases, we manually annotate them. Simulating MAR on the final databases, we obtain representative results for MAR's performance, since these databases are sampled from the Wikipedia data on our server, i.e., the actual database queried by MAR.
Matching Enhancement by Duplicates Removal
Web images have varying sizes and visual features density; therefore, many practical problems arise with realistic data that are not present in standard databases. Running MAR on the data just discussed in the "Building a More Realistic Database" section, we identified one problem in the matching algorithm that degrades MAR's performance: there are multiple keypoints in the query image matching one keypoint in the database image. Even when these matches are false, they are highly amplified by their multiplicity, which eventually affects the overall retrieval performance. We adjusted the algorithm to remove duplicate matches and to improve matching accuracy at no computational cost, as shown next.
We believe that the duplicate matches' problem particularly arises in cases where a strong imbalance exists between the number of keypoints in the database and query images, i.e., there are much fewer keypoints in the database image than in the query image. For each keypoint in the query image, the algorithm shown in our "Image Matching Algorithm" section detects the matching keypoint in the database image. Therefore, the imbalance forces many keypoints in the query image to match one single point in the database image. The standard ZuBuD data set does not have this issue, since its images have comparable numbers of keypoints. Contrarily, Wikipedia and web images in general have high variance, which emphasizes the necessity of having representative sample databases as mentioned earlier. To solve this problem, our adjusted algorithm prohibits duplicate matches. Whenever a keypoint in the database image has multiple matches in the query image, we pick only the keypoint with the closest descriptor. We recognize that duplicate matches may still be correct, e.g., repetitive structures. However, as we show in the "Experimental Results Extension" section of this article, substituting the "best" matching descriptor, i.e., the nearest, for the duplicates, improves the overall performance of our system.
Matching Enhancement by Histograms of Distances
Originally, we match the query image to each database image one at a time, by using the ratio of keypoints descriptors distances (see the "Image Matching Algorithm" section). However, we intuitively expect the distances between the descriptors of the query and a mismatching image to be larger than those between the query and a matching image. Hence, instead of merely relying on the distances ratio, we propose further exploring other statistics, based on descriptors' distances. More explicitly, we take the top ten matching database images obtained from the original algorithm shown in the "Image Matching Algorithm" section, and for each (query image, database image) pair, we build the histogram of descriptors distances. Our purpose is to extract more information from these distances about the similarity or dissimilarity between the query and the database images. For instance, by filtering out database images with large average distance, we remove obviously mismatching images. The cost of this approach is not high, since the distances are already computed, and hence we are only leveraging their availability. We applied this approach to ten query images and their corresponding GPS-constrained databases. Results are shown in the "Experimental" section of this article.
Our image-matching algorithm works this way: Let Q be the query image. We first apply our existing matching algorithm and we retrieve the top 10 database images D1, D2, …, D10. Next, we consider each pair (Q,Di) at a time and build the histogram Hi of minimum distances from keypoints in the query image Q to the database image Di. There is no additional overhead, since these distances were already calculated by the existing matching algorithm. For each histogram Hi, we obtain its empirical mean Mi and skewness Si:
The smaller the skewness, the closer Hi is to symmetric. Our main assumptions are these:
- If Mi is large, then many of the descriptors pairs between Q and Di are quite distant and hence are highly likely to be mismatches. Therefore, image Di must not be considered a match for image Q.
- If Si is small (close to zero), then the histogram Hi is almost symmetric. Having many descriptors in Q and Di that are "randomly" related, i.e., not necessarily matching, would result in this symmetry. We expect this scenario when the two images Q and Di don't match and hence there is no reason for the histogram to be biased.
Based on these assumptions, we remove database images that have very low skew Si. We also cluster the images based on the means M1, M2,..., M10 into two clusters (we used k-means). We remove the images that belong to the cluster with the higher mean. For the experimental results, please refer to the "Matching Enhancement through Histograms Distances" section of this article. Figure 3 shows a block diagram of our adjusted matching algorithm.
Matching Enhancement by Database Extension
As mentioned before, the images on our server are extracted from Wikipedia. Although Wikipedia is a good source for information in general, it is not the best source for images. For example, in the geographically restricted database of Wikipedia images in the surroundings of the Palais de Justice, only one out of 200 images in the database matches the building itself. However, not only the matching algorithm, but also the number of matching images in the database, affects the performance accuracy. Moreover, the query image might be taken from different views and under different light conditions. Having a small number of matching images, e.g., one image in the daylight for the Palais de Justice, would limit the performance of MAR. For this reason, it is preferred that the database has more images matching the query. We extend the database by combining GPS-location, geo-tagged Wikipedia pages, text in these pages, and the large resources of Web images. For each Wikipedia page, we don't only download the images on the page itself, but we also use the title as a text query on Google's image search engine. Next, we pick the first few retrieved images and associate them with the same Wikipedia page. The improvement in MAR's performance is shown in the results of the "Matching Enhancement through Database Extension" section in this article.