An Introduction to Geocoding



Yuri Diomin (yuri.diomin@yurisw.com) is the president of Yuri Software, Inc., a software development firm in San Diego, California. He builds applications for several vertical industries, including address correction and geocoding tools.


In this article, I'll introduce the theoretical aspects of how geocoding engines work. In the next installment, I will go into the practical aspects of geocoding in real-life scenarios.

"Hello World" Geocoder

To begin with, let's try to build our own basic geocoder and figure out what it might look like. In the tradition of "Hello World" programs, it will be a bare-minimum tool, made just to illustrate the principles.

As input, our geocoder will take a postal address in the form of character strings. Let's say these are Line 1 and Line 2 of a common address format, such as:

506 4th Ave

Asbury Park, NJ 07712

As output, it will return the latitude and longitude of the location as floating-point numbers. (In real life, geocoders often return a plethora of other information about the address, but we will limit ourselves to just the coordinates in this example.)

The first step in implementing our geocoder is to build a database of reference addresses and their locations, usually known as the "street network dataset." In our Hello World case, the street network dataset will conveniently consist of just one record:

Address: 506 4th Ave, Asbury Park, NJ 07712

Latitude: 40.223571

Longitude: -74.005973

The execution flow in our geocoder is then obvious. We simply:

  • Receive the input address.
  • Perform a database lookup by direct string comparison and find the corresponding reference record.
  • Return the latitude and longitude from the record as the output.

Mission accomplished!
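The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the names `STREET_NETWORK` and `geocode` are hypothetical, and an in-memory dictionary stands in for the database.

```python
from typing import Optional, Tuple

# Street network dataset with a single reference record, keyed by
# Line 1 and Line 2 of the address exactly as written above.
STREET_NETWORK = {
    ("506 4th Ave", "Asbury Park, NJ 07712"): (40.223571, -74.005973),
}


def geocode(line1: str, line2: str) -> Optional[Tuple[float, float]]:
    """Look the address up by direct string comparison.

    Returns (latitude, longitude) on a hit, or None on a miss.
    """
    return STREET_NETWORK.get((line1, line2))


print(geocode("506 4th Ave", "Asbury Park, NJ 07712"))
# (40.223571, -74.005973)
```

The dictionary lookup succeeds only when both lines are character-for-character identical to the reference record; any other input returns `None`.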

Address Matching

In real life, the process is much more complicated. As the very first obstacle, we will come across the issue of address matching.

Let's say that our input address is not in the neat form of:

506 4th Ave

Asbury Park, NJ 07712

but rather:

506 Fourth Avenue Apt 1

Asbury Prk, New Jersey

Note the multitude of character differences between the two addresses.

Since addresses come from all sorts of sources, such as forms filled out by customers or dictation over the phone, we cannot expect them to always be neatly formatted and standardized. A human looking at the two addresses above will easily see that they are one and the same. If we put either of them as a "mail to" address on a letter, we would expect the letter to be delivered without any problems. But if the computer performs a simple string comparison of the two addresses, they will not match, and our geocoder will get a "miss" instead of a "hit."
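A short Python sketch makes the miss concrete. The `normalize` helper is a hypothetical addition, not part of any geocoding library; it shows that even trivial cleanup (lowercasing and collapsing whitespace) does not bring the two addresses together.

```python
reference = ("506 4th Ave", "Asbury Park, NJ 07712")
incoming = ("506 Fourth Avenue Apt 1", "Asbury Prk, New Jersey")


def normalize(line: str) -> str:
    """Lowercase the line and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(line.lower().split())


# Direct string comparison, as in the Hello World geocoder.
exact_hit = incoming == reference

# Comparison after trivial cleanup of case and spacing.
normalized_hit = tuple(map(normalize, incoming)) == tuple(map(normalize, reference))

print(exact_hit)       # False
print(normalized_hit)  # False
```

Both comparisons fail because the differences are in the content of the address (spelled-out words, an abbreviation, a misspelling, an apartment number, and a missing ZIP code), not in its case or spacing.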

