Address information is one of the most commonly collected forms of data for companies across the world. It is also data that can easily be collected and stored in inaccurate or incomplete form.
Street names might be misspelled. ZIP codes could be left out when addresses are entered. Multiple customers could have the same name, creating uncertainty about which addresses map to which people. These are just some of the data quality errors that could appear in address information.
Fortunately, there is an easy way for developers to clean up address data, without having to purchase complex data quality tools or get Ph.D.s in data engineering. That solution is the TomTom Search API, which provides a structured geocoding call that can clean up address data. It also returns accurate latitude and longitude coordinates that can be stored alongside, or in place of, the unformatted raw data.
This article explains how to get started with the TomTom Search API for address validation and data cleanup. We’ll discuss the benefits of geocoding and properly formatting your address data, then walk through a sample address cleanup program that leverages the structured geocoding API call from TomTom.
The Benefits of Geocoding and Structuring Your Address Data
As described above, data collected through an online form often has validity problems, and customer address data is no exception. Misspellings, incorrect capitalization and missing fields can all cause issues. Imagine a company that reads from this database to send a mailing to every customer who provided an address. The company wants each piece to reach the correct destination, but misspelled addresses could result in delivery failures. Structuring the address data properly fixes this issue.
Using the Search API product from TomTom, an organization can pass the address data it has collected into the structured geocoding API call. If the data is sufficient to narrow down the search, the call returns the actual (properly formatted) address, which allows the mailing to go out without any issues. In addition, the address will be geocoded, so the organization can store the latitude and longitude along with the associated address in its database.
Geocoding provides several benefits, many of which stem from an organization’s ability to analyze its customer base on a geographical level. Instead of simply staring at lines of address data on a page, analysts can look at a map of locations that may provide valuable insight into the market. Maybe the business is more successful in some parts of a particular city than others, and mapping data can help discover where to focus efforts. Or maybe the business has cornered the market in one portion of a city, but there are neighborhoods with critical similarities where the market remains untapped. Geocoding can help analyze customer data in both of these cases.
Getting Started with the TomTom Search API
The first step towards utilizing structured geocoding from TomTom is to get set up with the TomTom Search API. Visiting the TomTom for Developers website and registering will bring you to a dashboard where you can select the option to add a new application. Providing an application name and selecting the Search API product will provide you with an API key for use with the Search API.
From the dashboard, you can see that you are allowed 2,500 API transactions per day for free. Should the 2,500 transactions not be enough to support your application, you can purchase credits for additional transactions by simply navigating to the My Credits tab of the dashboard and selecting the option to add credits. In addition, clicking the tab labeled “My API Transactions” will direct you to a page for tracking your API transaction usage by day.
After you have added an application approved for use of the Search API product and have received your API key, you are ready to develop an application that uses the structured geocoding method to clean up and geocode addresses. An invaluable resource throughout development is the online documentation for the TomTom Search API.
A Simple Java Implementation
So let’s take a look at a sample program that uses the structured geocoding API call from the TomTom Search API. To demonstrate the capability of this API call, I’ve developed a simple Java implementation where a CSV input file acts as our unformatted address database, and a CSV output file acts as our formatted address database. From the input file we will read in each address, line by line, and provide the input as part of the structured geocoding request. Each address in the input file contains misspellings, missing fields, or both. No latitude and longitude are stored for any address in the input file.
Below, you will see a screenshot from FormatAddresses.java. Class variables are set up for the API key as well as the CSV field separator (comma delimited), the new line separator (for use in writing to the output file), the headers for output file formatting and the input and output filenames with the path. For the sake of simplicity, I have written the functionality for the sample program right in the main method for the class, and we can simply run our program from our IDE.
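As a rough illustration of the class variables described above, here is a minimal sketch. The class name FormatAddressesSketch, the placeholder key, the file paths and the output column names are all assumptions made for this example; they are not taken from the original FormatAddresses.java.

```java
/**
 * Illustrative sketch only: every name and value below is an assumption,
 * not the contents of the original FormatAddresses.java.
 */
class FormatAddressesSketch {
    // Placeholder; substitute the key from your TomTom developer dashboard.
    static final String API_KEY = "YOUR_API_KEY";

    // CSV field separator (comma delimited) and the line terminator for output.
    static final String CSV_SEPARATOR = ",";
    static final String NEW_LINE = "\n";

    // Hypothetical output columns for the "clean" address table.
    static final String[] OUTPUT_HEADERS = {"freeformAddress", "latitude", "longitude"};

    // Hypothetical input and output locations.
    static final String INPUT_FILE = "data/unformatted_addresses.csv";
    static final String OUTPUT_FILE = "data/formatted_addresses.csv";

    /** Builds the first line of the output file, including the trailing new line. */
    static String headerLine() {
        return String.join(CSV_SEPARATOR, OUTPUT_HEADERS) + NEW_LINE;
    }
}
```

Keeping the separator and headers in one place means the code that writes the header row and the code that writes each address row cannot drift apart.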
The first thing we do in the main method is to create and open our output file and write our headers to the first line of the output file. Once we’ve appended our new line character at the end of the first line, we can ensure that we are now ready to write a formatted address to our output, which is simulating a formatted address database table. The next step is to read in our first line and instantiate a string array splitting on the comma delimiter. This will form an array where each position holds a field from the line in the input file. After this, we pass our string array to the constructor for UnformattedAddress.java, where we create an object that organizes our unformatted fields into attributes of a Java object which can then be passed to the HTTP GET request (our API call).
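The line-splitting and object-construction step can be sketched as follows. The column order (street number, street name, municipality, county, state, postal code) and the fromCsvLine helper are assumptions for illustration, since the contents of UnformattedAddress.java are not shown here. A plain split on commas also does not handle quoted fields that themselves contain commas.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an unformatted address record.
 * The column order is an assumption, not the author's actual layout.
 */
class UnformattedAddress {
    private static final int FIELD_COUNT = 6;

    final String streetNumber;
    final String streetName;
    final String municipality;
    final String countrySecondarySubdivision; // county
    final String countrySubdivision;          // state
    final String postalCode;

    UnformattedAddress(String[] fields) {
        // Pad short rows so a missing trailing column becomes an empty field.
        String[] f = Arrays.copyOf(fields, FIELD_COUNT);
        streetNumber = clean(f[0]);
        streetName = clean(f[1]);
        municipality = clean(f[2]);
        countrySecondarySubdivision = clean(f[3]);
        countrySubdivision = clean(f[4]);
        postalCode = clean(f[5]);
    }

    /** Splits one input line on the comma delimiter and wraps the result. */
    static UnformattedAddress fromCsvLine(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return new UnformattedAddress(line.split(",", -1));
    }

    private static String clean(String value) {
        return value == null ? "" : value.trim();
    }
}
```

With this layout, a line such as `4,Yawkey,Boston,,,2215` produces an object whose county and state fields are empty strings rather than null, which keeps the later URL-building step free of null checks.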
Once we have instantiated our unformatted address object, we can create our URL that we will use for the request. As you can see in the documentation for the API call, there are several parameters required to perform the structured geocoding call, and many more that are optional.
Those that are required include the following, which we provide when we build our URL object:
- Base URL: api.tomtom.com/search/
- Version Number: 2
- Response format: I have chosen JSON for this particular example. JSONP, JS and XML are also valid options.
- Country Code: In my example, I’ll be using US for the country code.
- API Key: You can insert the API key provided to you when you added your application through the TomTom developer dashboard.
The following are the optional request parameters I provided in my particular example:
- Street Number: The street number of the address, if provided in the input file, will be added to the API request. If not, then I will pass an empty string to the call.
- Street Name
- Municipality: City or town, if provided.
- Country Secondary Subdivision: County, if provided.
- Country Subdivision: The state in which the address is located.
- Postal Code: ZIP code, if provided.
As shown in the code above, the URL for the request is built using the above parameters. (For more information on additional request parameters that may be leveraged, please visit the documentation.)
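A minimal sketch of building such a URL is shown here. It assumes the endpoint path `structuredGeocode` and the query parameter names `countryCode`, `streetNumber`, `streetName`, `municipality`, `countrySecondarySubdivision`, `countrySubdivision`, `postalCode` and `key`, which should be checked against the TomTom documentation. The class and method names are hypothetical, and unlike the approach described above, this sketch leaves blank fields out of the URL rather than sending empty strings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper; endpoint path and parameter names are assumptions. */
class StructuredGeocodeUrl {
    private static final String BASE = "https://api.tomtom.com/search/2/structuredGeocode.json";

    /**
     * Builds the request URL from the required country code and API key
     * plus any optional address fields (parameter name mapped to value).
     */
    static String build(String apiKey, String countryCode, Map<String, String> optionalFields) {
        StringBuilder url = new StringBuilder(BASE);
        url.append("?countryCode=").append(encode(countryCode));
        for (Map.Entry<String, String> field : optionalFields.entrySet()) {
            String value = field.getValue();
            // Skip blank fields so they do not appear as empty parameters.
            if (value != null && !value.trim().isEmpty()) {
                url.append('&').append(field.getKey()).append('=').append(encode(value.trim()));
            }
        }
        url.append("&key=").append(encode(apiKey));
        return url.toString();
    }

    // URLEncoder applies form encoding, so a space becomes '+'.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Passing the optional fields in a LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL easier to compare against the documentation when debugging.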
The next step in my example was to send the HTTP request, parse the JSON response and build a formatted address object, which we can then leverage to write to our output file (simulated clean database). Please see the code below to see how I put these words into action:
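One way to handle the sending half of this step is sketched here, using the java.net.http client that ships with Java 11 and later. The class name, the 10-second timeout and the treatment of any non-200 status as an error are assumptions of this sketch rather than details of the original program.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper for issuing the geocoding GET request. */
class GeocodeHttp {

    /** Builds a GET request for the given URL without sending it. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10)) // assumed timeout
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body as a string. */
    static String fetchBody(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Treat anything other than 200 OK as a failure for this sketch.
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Separating request construction from sending makes it possible to inspect or log the request without touching the network.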
Take the following input, for example, taken from the first line in my input file:
- Street Number: 4
- Street Name: Yawkey
- Municipality: Boston
- Postal Code: 2215
This is, of course, the address for Fenway Park. Yet it is incomplete. After providing these fields from the input to the request for the appropriate parameters, I am met with the following JSON response:
"query":"4 02215 yawkey boston",
"streetName":"Yawkey Way, Jersey St",
"municipality":"Boston, Boston University, Kenmore",
"country":"United States Of America",
"freeformAddress":"4 Yawkey Way, Boston, MA 02215",
Because I am only interested in saving certain fields to my output “database,” I retrieve only the position and address data located at the root level of each result in the result array. As you can see, that provides me with full address data as well as a freeform address field, in addition to the geocoding data (latitude and longitude) for Fenway Park. I simply write this address and position data to my output file and move on to the next record in the input file. Another field that may be of interest for geocoding is the entry points array provided for each result. It gives the precise location of the main entryway, which could be invaluable depending on what you hope to achieve with this data.
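Pulling the freeform address and position out of the response and turning them into an output row might look like the sketch below. It assumes the response contains a `freeformAddress` string and a position object with numeric `lat` and `lon` fields, and it uses regular expressions only to stay dependency-free. That approach is fragile: it takes the first match of each field and does not unescape JSON strings, so a proper JSON library is the better choice in real code. The class name and output column order are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the cleaned-up fields written to the output file. */
class FormattedAddress {
    final String freeformAddress;
    final double latitude;
    final double longitude;

    FormattedAddress(String freeformAddress, double latitude, double longitude) {
        this.freeformAddress = freeformAddress;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    /** Extracts the first freeformAddress, lat and lon found in the response text. */
    static Optional<FormattedAddress> fromJson(String json) {
        Optional<String> address =
                firstGroup(json, "\"freeformAddress\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Optional<String> lat = firstGroup(json, "\"lat\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");
        Optional<String> lon = firstGroup(json, "\"lon\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");
        if (address.isEmpty() || lat.isEmpty() || lon.isEmpty()) {
            return Optional.empty(); // no usable result for this input address
        }
        return Optional.of(new FormattedAddress(
                address.get(), Double.parseDouble(lat.get()), Double.parseDouble(lon.get())));
    }

    /** Formats the record as one CSV row: address, latitude, longitude. */
    String toCsvRow() {
        return csvField(freeformAddress) + "," + latitude + "," + longitude;
    }

    // Quote fields containing commas, quotes or line breaks (RFC 4180 style).
    private static String csvField(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    private static Optional<String> firstGroup(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

The quoting step matters here because a freeform address such as "4 Yawkey Way, Boston, MA 02215" contains commas, which would otherwise split one address across several columns of the comma-delimited output file.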
Unformatted address data is a common challenge. TomTom Search API’s geocoding request feature offers an easy-to-use solution for cleaning address data and building a database of geocoded locations—and then makes it available via simple HTTP GET requests.