Click here to Skip to main content
Email Password   helpLost your password?

Introduction

Extreme Optimization can lead to remarkable improvements in both speed and size. In this article, we will see an example of how we can modify a standard algorithm using our knowledge of the problem at hand to achieve significant performance gains.

Background

We build on an article written by R. Reyes called Optimized IP to ISO3166 Country Code Mapping in C#. The objective is to perform fast lookup of the country code corresponding to an IP address in a table of over 52,000 entries while keeping memory usage to a minimum. We are only interested in the current IPv4 addresses, consisting of a sequence of 4 byte values in the range 0-255.

The existing solution

Mr. Reyes introduces the PATRICIA trie structure (pronounced �try� according to Donald Knuth) to efficiently perform the lookups. We will not repeat the details of the implementation here. They can be found in the original article.

The Extreme Optimization

Extreme Optimization� begins with a thorough understanding of the problem we are solving. We combine this understanding with in-depth knowledge of existing algorithms

Storing IP addresses more efficiently

The original algorithm uses a BitVector class to store the keys. Keys are represented internally as a Byte array. Only the distinctive portion of the key is stored, so many vectors will actually be only one byte long. However, a lot of shifting of bits is involved in isolating the relevant portion of the key and to check for matches.

A first observation is that we do not need a general BitVector class to store our data. Our keys are sequences of up to 32 bits. They fit perfectly into a 32 bit unsigned integer. This will eliminate the overhead of managing the array object.

Furthermore, we can eliminate the excessive shifting by storing the entire key in every node and using the bitwise Xor operator. The boolean Xor (exclusive OR) operator returns true if its operands are different and false if they are the same. The bitwise version compares identically positioned bits in two numbers. It assigns 0 to the corresponding bit in the result when the bits are the same, and 1 when they are different.

To find the shared portion of two keys, we Xor the two values. The key length is the number of leading zeros in the result. We can find this number by comparing the result to powers of two. For a key length of 32 bits, if the result is smaller than 2^n, we know that the first 32-n bits of the two keys are the same.

Here is an example. We want to find the common key length of two IP addresses, IP1= 207.176.130.34 and IP2= 207.176.151.11.

IP Address Binary value Numerical value

IP1 (207.176.130.34)

11001111.10110000.10000010.00100010

411,303,936

IP2 (207.176.151.11)

11001111.10110000.10010111.00001011

411,336,704

IP1 xor IP2

00000000.00000000.00010101.00101001

5,417

The result, 5417, is smaller than 8192 = 2^13. Therefore, the two IP�s have the first 32 - 13 = 19 bits in common.

When comparing a key value to the key values of the children of a node, we can start counting up from the key length of the parent node.

Optimizing Patricia tries for binary keys

Secondly, we are dealing with binary data. The acronym PATRICIA is short for Practical Algorithm To Retrieve Information Coded In Alphanumeric. This name suggests that Patricia tries are best suited for alphanumerical data, where every node can have many children. The keys in our problem are sequences of 0�s and 1�s. This means that any node can have at most two children. One child will have a prefix starting with a 0. The other will have a prefix starting with a 1.

This means we can eliminate the ArrayList we have been using to store the references to the child nodes. We replace it with two references, one each for the child whose key prefix starts with a 0 or a 1, respectively. If either child is not present, the corresponding reference is null.

An ArrayList is a relatively expensive structure to store very short lists. In addition to the usual overhead associated with reference types, it also allocates space for unused entries. Eliminating both the overhead and the unused space should significantly reduce our memory requirements.

When looking up values in the trie, we can simply use the value of the first bit after the current key length to select the next child node. We can then quickly check that our value matches the full length of the child node.

Further Optimizations

A trie can work with one root element. The first few steps of the lookup will be spent working through the first byte or 8 bits of the IP address. If we create a separate root for each possible value of the first bits in the key, we eliminate 6 or 7 unnecessary recursive steps. We can do even better by pre-selecting on more leading bits. We will call this number of bits the index length.

We have to be somewhat careful, however. Every root node spans a certain range of IP addresses. Some entries in our input tables will span a range that covers more than one root node. We will have to add these entries to all the appropriate root nodes, adjusting the range to fit that root nodes' range. This will come at a small penalty in the memory footprint, but the speed gain is significant.

If we use an index length of at least 1, we can use Int32 values for the keys instead of the non CLS-compliant UInt32 without any complications resulting from using signed numbers. A key passed to a node will always have at least the leading bit in common with the node's key. The leading bit of the result of the XOR operations will therefore always be zero, indicating a positive value.

Looking at the code

The new project contains three classes.

BinaryTrie has two constructors. The first constructor takes one argument, indexLength, which is the number of bits to use in pre-selection. It creates the _roots array which will contain the root nodes. The default constructor uses an index length of 1.

The Add method adds a key to the trie and returns the BinaryTrieNode corresponding to the key. The AddInternal method does the actual work of adding a key to the trie. If no root node exists for the given index, it creates one. If the root node exists, the request is passed on to the root node, which does the bulk of the work.

BinaryTrieNode represents an entry in the trie. It contains the following fields:

BinaryTrieNode has one internal constructor that is used by BinaryTrie to create a root node. It also contains two private constructors. One creates a shallow copy of a node. The other creates a node using the data that was passed to the Add method. It calculates the key length based on the number of addresses available to the network.

Most of the hard work of building the trie is done in the AddInternal() method. We first calculate the common key length of the current node and the new entry. If the new key length is smaller than the common key length, the new entry becomes the parent of the current node. If the match is complete, i.e. the common key length is at least the key length of the current node, the new entry is passed on to a child node. Which child we work with depends on the first bit after the current key.

If the match is incomplete, the current node is replaced with a new node with the common portion of the key as its key. It has two children: (a copy of) the current node, and a node for the new entry. In both cases, the bit at position _keyLength+1 determines which of the two child nodes gets to go in the _zero and _one slots, respectively.

The lookup is passed on from IPCountryTable straight to the correct root BinaryTrieNode, if it exists. If the lookup key matches one of the child node keys, the request is passed on to that child node. Otherwise, we have found the best match.

Benchmark results

Using the same data set and test code as the original article, we found the following results. The Optimized version #1 refers to the original version of the code as posted here on May 13, 2003. The Optimized version #1.1 refers to the latest code, which includes the further optimizations.

Original version

Optimized version #1

Optimized version #1.1

Result

Result

Change

Result

Change

Memory usage

9,409 KB

2,578 KB

- 72.6%

2,963 KB

- 68.5%

Load time

7.361s

2.864s

- 61.1%

1.858s

- 74.8%

25,000 lookups

3.475s

0.2567s

- 92.6%

0.1997s

- 94.3%

Lookups/second

7,194

97,048

+ 1,249%

125,190

+ 1,640%

Here is a graphical representation of the results:

Graphic representation of the benchmark results.

Possible improvements

Is there room for even more optimization? Possibly.

A single country code will have many nodes referencing it. We can eliminate a node if, for each of the IP addresses in its range, the best matching node in the resulting trie has the same country code as the node we eliminated. This will happen if the node's parent node has the same country code.

After the trie has been built, we can go through all the nodes iteratively and eliminate unnecessary nodes. If we can eliminate a significant number of nodes, the memory usage will decrease and the lookup speed will increase accordingly. This possibility will be explored in a future update of this article.

Conclusion

The two main objectives were minimizing memory usage and maximizing speed. We have further optimized an already optimized solution to produce dramatic improvements in both areas. We originally reduced memory usage by 3/4, and increased the lookup speed by a factor of more than 13.

With the latest version, we give up a little bit of memory for an additional 30% increase in performance. We also untangled the binary trie code from the IP lookup code. It can now be used independently.

References

Update

APNIC (the Asia Pacific Network Information Centre), one of four organizations that publishes the data used in this article, recently changed the format of its statistics file to conform to the format used by the other three organizations. The size field in the database now contains the total size of the segment rather than the number of bits. To use files in this new format, you will need to make a change in the source code. In the call to LoadStatisticsFile which loads the APNIC data file, set the calculateKeyLength parameter to true.

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
GeneralTnx for great article
pekica
6:46 26 Jul '05  
Thanks both to Reyes and you for great solution onto this common problem during developing web site.
GeneralSmall Bug
Kentamanos
16:29 12 Jun '03  
I found a problem when I updated all of the database files and started to use your implementation to verify and benchmark mine (mine uses an ArrayList and uses a slightly modified binary search). I compared your results to mine and started to look at all of the differences.

There seems to be an error when the length in the database files is not a power of two (I think that's the reason). For instance, take a look at the newest ARIN file (ftp://ftp.arin.net/pub/stats/arin/arin.20030601).

Line 64 in that file reads:
arin|US|ipv4|24.92.160.0|221184|0|allocated

In your test file (iplist.txt), the entry on line 245 is:
24.94.224.164

According to line 64 of arin.20030601, this IP address is in the US. In other words the range 24.92.160.0 -> 24.95.255.255 is in the US. The value above (24.94.224.164) returns a null in your application.

I'm sure this problem occurs due to ARIN (in this case) changing their format. It appears they're actually trying to combine ranges for optimization purposes.

I also wonder what your attitude on "false positives" is. For instance, using the original files that you packaged with your source code (all in the ZIP file from your website), we can see what should "technically" happen when we search for the IP address 66.28.71.23 (also in iplist.txt). Line 582 of the original ARIN file reads:

arin|US|ipv4|66.28.0.0|8192|2000-09-12|allocated

This is the closest match we have for 66.28.71.23, but technically speaking, it's not a match. 66.28.0.0|8192 should encompass 66.28.0.0->66.28.31.255, right? In your system this comes back as US.

I'm not sure if the false positives are a "bug" or not. They're probably a pretty good guess for one thing Smile.

In any event, thanks for the article. It's really good stuff.
GeneralRe: Small Bug
Jeffrey Sax
21:45 8 Jul '03  
Hi Kent,

You are correct in both cases.

The original assumption that the length of a range in a database record is a power of two is no longer valid. The code in LoadStatisticsFile needs a few changes. Since our trie nodes can only have lengths that are powers of two, things are a little less elegant than we'd like.

As for the false positives, these are the result of trie nodes being split and the new parent node not being assigned the correct country code. The minimization algorithm which removes unnecessary nodes (as suggested in the 'Possible improvements' section) takes care of this.

I'll be updating this article in the near future. The updated version, and source code, is already available from my web site: http://www.extremeoptimization.com/[^]

Thanks for your good work!

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/
GeneralUpdate: 30% faster!
Jeffrey Sax
10:43 28 May '03  
The latest update to this article and its associated source code is available on the Extreme Optimization web site[^].

It boasts another 30% increase in speed. This makes this code more than 17 times faster than the original. I've also cleaned up the interfaces somewhat so the binary trie is now reuseable.

The updated article will be available on the CodeProject soon.

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/
GeneralTrie pronounced "Try" not "Tree"
Blake Coverett
18:44 27 May '03  
Vol 3, 6.3 Digital Searching:


...a trie structure; this name was suggested by E. Fredkin [CACM 3 (1960), 490-500] because it is a part of the information retrieval. A trie—pronounced "try"—is essentially an M-ary true, whose nodes are M-place vectors with components corresponding to digits or characters.


One doesn't argue with Knuth.

A fun series of articles though, gentlemen. My compliments.

-Blake (wearing his offical 'old fart' hat)
GeneralGreat but ...
Sébastien Lorion
20:51 25 May '03  
Your code is very fast, that is for sure !

But I think there is room for improvement in the design of the classes, specially regarding the encapsulation and flexibility. But you got me hooked so I wrote a whole article to demonstrate what I mean Smile





Intelligence shared is intelligence squared.
GeneralRe: Great but ...
Jeffrey Sax
10:45 28 May '03  

Sébastien Lorion wrote:
But I think there is room for improvement in the design of the classes, specially regarding the encapsulation and flexibility.

You're right, there was room for improvement. See update above. Wink

Have a great day,

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/
GeneralRe: Great but ...
Sébastien Lorion
13:11 28 May '03  
Yes, by doing a bit of path compression for the first 8 bits, it clearly can improve performance. Did you made tests to see if 8 bits was the ideal tradeoff ?

I deleted my article since it's main point was to provide a "cleaner" implementation and a bit of added functionalities.

But I will post another one about the trie stucture as soon as I make it a full featured class. I will reuse code I made to lookup IP as a demo for its use.

On another note, I would personnaly use a "use-a" rather than "is-a" relationship between the IPCountryLookupTable and the trie. That way, if you ever find a better data structure to hold IPs, you won't be screwed.

Also, to hard code the way data files are processed might be fine now, but it is rather optimistic to think their format will never change Smile In fact, as you noted in your article, APNIC did change his format recently.

Keep up the good work !

Sébastien



Life is short : forgive quickly, kiss slowly.
GeneralRe: Great but ...
Jeffrey Sax
13:24 28 May '03  

Sébastien Lorion wrote:
Did you made tests to see if 8 bits was the ideal tradeoff ?

Yes, I did. The new code allows you to test anywhere from 1 to 16 bits. Memory usage actually decreases initially, because you eliminate some intermediate nodes. Speed keeps increasing all the way up to 16 bits.

Once over 12 bits, the array of root nodes crosses the garbage collector's 85KB 'large object' threshold. In some instances, this may not be desirable.

Sébastien Lorion wrote:
I would personnaly use a "use-a" rather than "is-a" relationship between the IPCountryLookupTable and the trie. That way, if you ever find a better data structure to hold IPs, you won't be screwed.

I'm not really concerned about that. Also, this way I can use members (like AddInternal) that I would rather not have public.

Sébastien Lorion wrote:
Keep up the good work !

I will! Smile There may still be room for further improvement, by eliminating nodes that don't affect the country code of the best match. But that's for later.

Have a great day,

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/
Generalbugs?
Frank Hileman
5:34 22 May '03  
Has this new code gone through any real testing? I ask, because when I ran ITCTest2(), the function, one address failed.
81.50.140.32 = FR (ICI)

Which leads me to wonder if the data structure is populated correctly. A good test would be to run through all of the addresses in the original data, and check each one.

Regards,
Frank Hileman
GeneralRe: bugs?
Jeffrey Sax
10:47 22 May '03  
Hi Frank,

Yes, it has been tested fairly extensively. The discrepancy you found is an error in the test data. I just copied the code from the original article.

If you look up the country for this IP at WhereIs[^] or GeoIP[^], you'll see that it comes up as FR (France) there, too.

The databases change constantly, with some providers publishing new updates on a daily basis. This also makes testing more troublesome.

Have a great day,

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/
GeneralNew apnic.latest has key length pre-calculated
eekthecat
14:35 20 May '03  
For those looking to use the stock code but retrieve updated files, you'll have to change the CalculateKeylength parameter to true for the LoadStatisticsFile method for "apnic.latest" since the updated files have the key lengths calculated.

If you don't you'll get the error "Index was outside the bounds of the array".

This is an awesome article and application!
GeneralRe: New apnic.latest has key length pre-calculated
JeffreySax
15:07 20 May '03  
eekthecat wrote:
For those looking to use the stock code but retrieve updated files, you'll have to change the CalculateKeylength parameter to true for the LoadStatisticsFile method for "apnic.latest" since the updated files have the key lengths calculated.

Thanks for the suggestion. I believe you meant the updated files need the key lengths calculated. If you read all the way to the end of the article, you'll read exactly that under the 'Update' heading. Smile

Have a wonderful day,

Jeffrey

Everything should be as simple as possible, but not simpler.
    -- Albert Einstein

http://www.extremeoptimization.com/


Last Updated 31 May 2003 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010