Experimental Results Extension
In this section, we look at how matching can be enhanced by removing duplicate matches, by analyzing the histograms of distances, and by adding images to the landmark's database.
Duplicates Removal. To test the impact of removing duplicate matches, we ran both the original and the new matching algorithms (see "Image Matching Algorithm" and "Matching Enhancement by Duplicates Removal") on our sample databases for various SURF thresholds. In part (a) of Figure 4, we display the top 1 precision (the precision of the top matched image), the top 5 precision (the precision of the top 5 matched images), and the average precision, each averaged over the 10 databases. Removing the duplicate matches clearly improves the performance.
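As a rough illustration of the three metrics reported in part (a) of Figure 4, the sketch below computes top-k precision and average precision from a ranked list of retrieval results. The function names are hypothetical, and the exact definition of average precision is not given in this article; the sketch assumes the common form that averages the precision at each rank where a correct image appears.

```python
from typing import List, Sequence


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the first k retrieved images that show the query landmark.

    `relevant[i]` is True when the image at rank i+1 is a correct match.
    Ranks missing from a short result list count as incorrect.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant[:k]) / k


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@i over the ranks i that hold a correct image.

    Assumed definition; the article does not spell out its formula.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_over_databases(values: List[float]) -> float:
    """Average one metric over all databases (10 in the experiments)."""
    return sum(values) / len(values)


# Ranked results for one query: ranks 1, 3 and 4 are correct.
ranking = [True, False, True, True, False]
top1 = precision_at_k(ranking, 1)
top5 = precision_at_k(ranking, 5)
ap = average_precision(ranking)
```

For the example ranking, the top 1 precision is 1.0 and the top 5 precision is 0.6, since three of the five retrieved images are correct.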
Matching Enhancement through Histograms Distances. In part (b) of Figure 4, we display the top 10 images retrieved for the Chiang Kai Shek Memorial Hall in Taiwan and the Capitol in Washington DC from the database mentioned in the section "Building a More Realistic Database," using the updated matching algorithm with duplicates removal. We also used the histograms of distances to further refine the match, as explained previously. Based on the histogram analysis, images labeled with a red square in the figure are rejected and those tagged with a blue square are retained. These experiments show that portraits and statues of people can be clearly distinguished and rejected when the queries are buildings, as most of ours are. Such images generally have a very symmetric histogram, and they may also have a large mean. We observed similar results for the other queries mentioned previously.
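The rejection rule itself is described in an earlier section, so the following is only a sketch of how the two cues named above (a symmetric histogram and a large mean) could be tested on the distances between matched keypoints. Using sample skewness as the symmetry measure, combining the two cues with "or", and the thresholds `skew_tol` and `mean_max` are all assumptions made for illustration; they are not taken from the article.

```python
from typing import Sequence, Tuple


def distance_stats(distances: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, skewness) of the matched-keypoint distances."""
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two distances")
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    if variance == 0.0:
        return mean, 0.0
    third_moment = sum((d - mean) ** 3 for d in distances) / n
    return mean, third_moment / variance ** 1.5


def looks_like_mismatch(
    distances: Sequence[float],
    skew_tol: float = 0.2,   # hypothetical: |skewness| below this counts as symmetric
    mean_max: float = 0.35,  # hypothetical: mean above this counts as large
) -> bool:
    """Flag a candidate image whose distance histogram is symmetric or has a large mean."""
    mean, skewness = distance_stats(distances)
    return abs(skewness) < skew_tol or mean > mean_max


# Many small distances plus a short tail of large ones: skewed, small mean.
building_candidate = [0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 0.30, 0.45]
# Distances spread evenly around a large value: symmetric, large mean.
portrait_candidate = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
```

With these illustrative thresholds, `portrait_candidate` is flagged for rejection and `building_candidate` is retained.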
Matching Enhancement through Database Extension. For each of our 10 landmarks, we downloaded the top 21 images returned by Google image search when queried with the title of the landmark's Wikipedia page, and we added them to the corresponding landmark database. Next, we applied our matching algorithm to the extended databases; the improved results are shown in part (c) of Figure 4. The top 1 and top 5 precision values increase for almost all of the 10 databases mentioned in the section entitled "Building a More Realistic Database."
Implementation and Optimization
In this section, we discuss how we optimized the algorithms to achieve a reasonable response time on the MID and how we took advantage of the features of the Intel Atom processor.
Feature Extraction and Image Match. As we have described in the previous sections, image feature extraction and image match (if the precached data were available on the MID) are performed on the MID client device. To perform these tasks, the MID requires significant computing power and memory storage.
Software Optimization. Software optimization includes the following components.
- Image feature extraction. The original SURF-based image feature extraction code is based on the OpenCV implementation, as described earlier in this article. Using the VTune analyzer, we identified two hotspots: keypoint detection and keypoint descriptor generation. We applied several optimization techniques to these hotspots to speed up the image feature extraction. We multi-threaded keypoint detection and keypoint descriptor generation by using OpenMP and achieved a 1.6X speedup over the single-threaded version on an Intel Atom processor. Converting keypoint detection from floating-point to integer arithmetic provided an additional 15% speedup. We also quantized each keypoint descriptor from float (32 bit) to char (8 bit), which resulted in a 4X reduction in the data storage requirements. The integer operations improved performance without significantly degrading the quality of the results.
- Image match. We again used the VTune analyzer and identified the distance calculations as the hotspot of image match. We multi-threaded the image match by using OpenMP and achieved a 1.7X speedup over the single-threaded version on an Intel Atom processor. We also vectorized the distance calculation by using SSE intrinsics to take advantage of the 4-way SIMD vector units in the Intel Atom processor, which provided a 2X speedup over the non-vectorized image match code.
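The descriptor quantization described in the list above can be sketched as follows. This is a minimal illustration, not the optimized implementation: it assumes 64-dimensional descriptors whose components lie in [-1, 1] (as they would for unit-length descriptors), and the scale factor of 127 and the function names are hypothetical choices.

```python
from array import array
from typing import Sequence


def quantize_descriptor(descriptor: Sequence[float]) -> array:
    """Map each component from [-1.0, 1.0] to a signed 8-bit integer."""
    quantized = array("b")  # "b" = signed char, one byte per component
    for value in descriptor:
        clipped = max(-1.0, min(1.0, value))
        quantized.append(int(round(clipped * 127)))
    return quantized


def squared_distance_int(a: array, b: array) -> int:
    """Squared Euclidean distance computed entirely in integer arithmetic."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# A 64-component example descriptor (assumed dimension).
example = [((i % 9) - 4) / 16.0 for i in range(64)]
as_float = array("f", example)      # 32-bit storage
as_char = quantize_descriptor(example)  # 8-bit storage

float_bytes = len(as_float) * as_float.itemsize
char_bytes = len(as_char) * as_char.itemsize
```

On platforms where a C float occupies four bytes, `float_bytes` is four times `char_bytes`, which matches the 4X storage reduction reported above. Working on 8-bit values also lets a SIMD register of fixed width hold more components per instruction than 32-bit floats would.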
Performance on a Platform Based on the Atom Processor
We analyzed the software implementation on a single-core, hyper-threaded Intel Atom system (800 MHz, 256 MB RAM, 512 KB L2 cache) running Linux. We conducted our performance analysis on four datasets with different sizes and resolutions: 10 QVGA images, 10 VGA images, 100 QVGA images, and 100 VGA images. We chose these datasets for two reasons: a) most current MID devices take QVGA or VGA video input; b) for a given GPS location, the number of visible landmarks normally ranges from 10 to 100. Hence, with the help of GPS localization, we need to pre-cache 10-100 landmarks for database comparison. Figure 5(a) shows the total runtime for each dataset, comparing pair-wise matching with FLANN indexing. Figure 5(b) lists the runtime breakdown for each component: keypoint detection, descriptor generation, and image match. The number of keypoints per image is around 800 at VGA resolution and around 350 at QVGA resolution. The figures show that the pair-wise matching runtime increases linearly with the database size, whereas with FLANN indexing the runtime scales very well as the database size increases. When a query image needs to be compared against a database of more than 10 images (which is the common case), FLANN indexing should be considered instead of pair-wise matching to get a faster response. Overall, the execution time is about one second for querying a VGA image against a database of 100 VGA images.
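The linear growth of pair-wise matching can be made concrete with a small sketch. It assumes that pair-wise matching is a brute-force nearest-neighbor search, in which every query descriptor is compared with every descriptor of every database image, and that database images carry roughly as many keypoints as the query (about 800 for VGA and 350 for QVGA, as stated above). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def pairwise_distance_count(query_keypoints: int, db_images: int,
                            keypoints_per_db_image: int) -> int:
    """Number of descriptor distances a brute-force pair-wise match evaluates."""
    return query_keypoints * db_images * keypoints_per_db_image


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_match(query: List[Descriptor],
                   database: List[List[Descriptor]]) -> Tuple[List[List[float]], int]:
    """Brute-force match of one query image against every database image.

    Returns the nearest-neighbor distance of each query descriptor in each
    database image, together with the number of distance evaluations.
    """
    evaluations = 0
    nearest_per_image = []
    for image_descriptors in database:
        nearest = []
        for q in query:
            best = float("inf")
            for d in image_descriptors:
                evaluations += 1
                best = min(best, squared_distance(q, d))
            nearest.append(best)
        nearest_per_image.append(nearest)
    return nearest_per_image, evaluations


vga_10 = pairwise_distance_count(800, 10, 800)     # 6,400,000
vga_100 = pairwise_distance_count(800, 100, 800)   # 64,000,000
qvga_100 = pairwise_distance_count(350, 100, 350)  # 12,250,000
```

Under these assumptions, growing the database from 10 to 100 VGA images multiplies the number of distance evaluations by 10, which is consistent with the linear runtime growth of pair-wise matching. An approximate nearest-neighbor index such as FLANN avoids visiting every database descriptor for each query descriptor.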
Tracking. The tracking algorithm explained previously has been optimized by using:
- A simplified multi-resolution pyramid construction with simple 3-tap filters
- A reduced linear system with gradients from only 200 pixels in the image instead of from all the pixels in the images
- SSE instructions for the pyramid construction and the linear system solving
- Only the coarsest levels of the pyramid to estimate the alignment.
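The reduced linear system in the list above can be illustrated for the pure displacement model. The sketch assumes a gradient-based (Lucas-Kanade style) formulation, in which each sampled pixel contributes one brightness-constancy equation Ix*dx + Iy*dy + It = 0 and the displacement is the least-squares solution of the resulting 2x2 normal equations. How the 200 pixels are chosen is not stated in this article; uniform random sampling and all function names are assumptions.

```python
import random
from typing import Iterable, List, Tuple


def sample_pixels(width: int, height: int, count: int = 200,
                  seed: int = 0) -> List[Tuple[int, int]]:
    """Pick `count` distinct pixel coordinates (assumed: uniform random sampling)."""
    rng = random.Random(seed)
    flat = rng.sample(range(width * height), count)
    return [(i % width, i // width) for i in flat]


def estimate_displacement(
    samples: Iterable[Tuple[float, float, float]]
) -> Tuple[float, float]:
    """Least-squares displacement from (Ix, Iy, It) triples.

    Ix, Iy are the spatial gradients and It the temporal difference at each
    sampled pixel. Solves (A^T A) d = -A^T It for d = (dx, dy).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for ix, iy, it in samples:
        a11 += ix * ix
        a12 += ix * iy
        a22 += iy * iy
        b1 -= ix * it
        b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gradients do not constrain the displacement")
    dx = (a22 * b1 - a12 * b2) / det
    dy = (a11 * b2 - a12 * b1) / det
    return dx, dy


# Synthetic check: gradients consistent with a known displacement.
true_dx, true_dy = 0.5, -0.25
rng = random.Random(1)
triples = []
for _ in sample_pixels(640, 480):
    ix, iy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    triples.append((ix, iy, -(ix * true_dx + iy * true_dy)))
dx, dy = estimate_displacement(triples)
```

The sums accumulate over 200 sampled pixels instead of the 307,200 pixels of a full VGA frame, so the cost of building the system no longer depends on the image resolution.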
Performance was measured on the same Intel Atom system by using VGA (640x480) input video and different options. The results are shown in Figure 6, which displays the measured frames per second (fps) for different models (displacement/camera rotation), estimation methods (robust/non-robust), resolution levels, and numbers of iterations per level. For the pure displacement model, using non-robust estimation and running five iterations in Levels 3 and 4 of the multi-resolution pyramid (Level 1 being the original resolution), the performance is over 80 fps.
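To show why restricting the estimation to Levels 3 and 4 is inexpensive, the sketch below builds a multi-resolution pyramid with a separable 3-tap filter followed by subsampling by two. The filter taps [1, 2, 1]/4, the edge replication at the borders, and the function names are assumptions; the article only states that simple 3-tap filters are used.

```python
from typing import List, Tuple

Image = List[List[float]]


def smooth_3tap(row: List[float]) -> List[float]:
    """Apply the assumed [1, 2, 1]/4 filter with edge replication."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def downsample(image: Image) -> Image:
    """Smooth along rows, then along columns, then keep every second pixel."""
    rows = [smooth_3tap(r) for r in image]
    cols = [smooth_3tap(list(c)) for c in zip(*rows)]
    smoothed = [list(r) for r in zip(*cols)]
    return [r[::2] for r in smoothed[::2]]


def build_pyramid(image: Image, levels: int) -> List[Image]:
    """Level 1 (index 0) is the original image; each further level halves it."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Image size at a pyramid level, with Level 1 being the original."""
    for _ in range(level - 1):
        width, height = (width + 1) // 2, (height + 1) // 2
    return width, height


small = [[10.0] * 16 for _ in range(8)]
pyramid = build_pyramid(small, 3)
level3 = level_size(640, 480, 3)  # (160, 120)
level4 = level_size(640, 480, 4)  # (80, 60)
```

For VGA input, Level 3 holds 160x120 = 19,200 pixels and Level 4 holds 80x60 = 4,800 pixels, compared with 307,200 pixels at the original resolution.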
In earlier work, we presented MAR with a fully functional prototype on a MID device with an Intel Atom processor inside. In this article, we described new improvements to the matching and tracking algorithms in addition to the design of the system and its database. We also presented the code optimization benchmark results for the Intel Atom processor. With all these improvements, MAR demonstrates the powerful capabilities of future mobile devices derived from location sensors, network connectivity, and computational power.
The authors would like to thank Igor Kozintsev, Oscar Nestares, and Horst Haussecker for their significant contributions to this work.
 S. Pradhan, C. Brignone, J.H. Cui, A. McReynolds, and M.T. Smith. "Websigns: hyperlinking physical locations to the web." Computer, Volume 34, pages 42-48, 2001.
 J. Lim, J. Chevallet, and S. N. Merah. "SnapToTell: Ubiquitous Information Access from Cameras." In Mobile and Ubiquitous Information Access (MUIA04) Workshop, 2004.
 Y. Zhou, X. Fan, X. Xie, Y. Gong, and W.Y. Ma. "Inquiring of the Sights from the Web via Camera Mobiles." In Multimedia and Expo, 2006 IEEE International Conference, pages 661-664, 2006.
 G. Takacs, V. Chandrasekhar, N. Gelfand, Y. Xiong, W.C. Chen, T. Bismpigiannis, R. Grzeszczuk, K. Pulli, and B. Girod. "Outdoors augmented reality on mobile phone using loxel-based visual feature organization." ACM International Multimedia Conference, 2008.
 T. Quack, B. Leibe, and L. Van Gool. "World-scale mining of objects and events from community photo collections." In Proceedings of the 2008 international conference on Content-based image and video retrieval, pages 47-56, ACM New York, NY, USA, 2008.
 D.G. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, 60(2):91-110, 2004.
 H. Bay, T. Tuytelaars, and L. Van Gool. "SURF: Speeded Up Robust Features." Lecture Notes in Computer Science, 3951:404, 2006.
 D. Gray, I. Kozintsev, Y. Wu, and H. Haussecker. "WikiReality: augmenting reality with community driven websites." International Conference on Multimedia Expo (ICME), 2009.
 M. El Choubassi, O. Nestares, Y. Wu, I. Kozintsev, and H. Haussecker. "An Augmented reality tourist guide on your mobile devices." 16th International Multimedia Modeling Conference, Chongqing, China, January 2010.
 B.D. Lucas and T. Kanade. "An iterative image registration technique with an application to stereo vision." International Joint Conference on Artificial Intelligence, pages 674-679, 1981.
 O. Nestares and D.J. Heeger. "Robust multiresolution alignment of MRI brain volumes." Magnetic Resonance in Medicine, pages 705-715, 2000.
 Marius Muja and David G. Lowe. "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration." In International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009.
 H. Shao, T. Svoboda, and L. Van Gool. "ZuBuD: Zurich Buildings Database for Image Based Recognition." Technical report No. 260, Swiss Federal Institute of Technology, 2003.
Maha El Choubassi and Yi Wu are Senior Research Scientists at the Vision and Image Processing Research group/FTR at Intel labs.
This article and more on similar subjects may be found in the Intel Technology Journal, Volume 14, Issue 1, "Essential Computing: Simplifying And Enriching Our Work And Daily Life". More information can be found at http://intel.com/technology/itj.