|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionExtreme Optimization can lead to remarkable improvements in both speed and size. In this article, we will see an example of how we can modify a standard algorithm using our knowledge of the problem at hand to achieve significant performance gains. BackgroundWe build on an article written by R. Reyes called Optimized IP to ISO3166 Country Code Mapping in C#. The objective is to perform fast lookup of the country code corresponding to an IP address in a table of over 52,000 entries while keeping memory usage to a minimum. We are only interested in the current IPv4 addresses, consisting of a sequence of 4 byte values in the range 0-255. The existing solutionMr. Reyes introduces the PATRICIA trie structure (pronounced ‘try’ according to Donald Knuth) to efficiently perform the lookups. We will not repeat the details of the implementation here. They can be found in the original article. The Extreme OptimizationExtreme Optimization™ begins with a thorough understanding of the problem we are solving. We combine this understanding with in-depth knowledge of existing algorithms Storing IP addresses more efficientlyThe original algorithm uses a BitVector class to store the keys. Keys are represented internally as a Byte array. Only the distinctive portion of the key is stored, so many vectors will actually be only one byte long. However, a lot of shifting of bits is involved in isolating the relevant portion of the key and to check for matches. A first observation is that we do not need a general BitVector class to store our data. Our keys are sequences of up to 32 bits. They fit perfectly into a 32 bit unsigned integer. This will eliminate the overhead of managing the array object. Furthermore, we can eliminate the excessive shifting by storing the entire key in every node and using the bitwise Xor operator. The boolean Xor (exclusive OR) operator returns true if its operands are different and false if they are the same. The bitwise version compares identically positioned bits in two numbers. It assigns 0 to the corresponding bit in the result when the bits are the same, and 1 when they are different. To find the shared portion of two keys, we Xor the two values. The key length is the number of leading zeros in the result. We can find this number by comparing the result to powers of two. For a key length of 32 bits, if the result is smaller than 2^n, we know that the first 32-n bits of the two keys are the same. Here is an example. We want to find the common key length of two IP addresses, IP1= 207.176.130.34 and IP2= 207.176.151.11.
The result, 5417, is smaller than 8192 = 2^13. Therefore, the two IP’s have the first 32 - 13 = 19 bits in common. When comparing a key value to the key values of the children of a node, we can start counting up from the key length of the parent node. Optimizing Patricia tries for binary keysSecondly, we are dealing with binary data. The acronym PATRICIA is short for Practical Algorithm To Retrieve Information Coded In Alphanumeric. This name suggests that Patricia tries are best suited for alphanumerical data, where every node can have many children. The keys in our problem are sequences of 0’s and 1’s. This means that any node can have at most two children. One child will have a prefix starting with a 0. The other will have a prefix starting with a 1. This means we can eliminate the ArrayList we have been using to store the references to the child nodes. We replace it with two references, one each for the child whose key prefix starts with a 0 or a 1, respectively. If either child is not present, the corresponding reference is An ArrayList is a relatively expensive structure to store very short lists. In addition to the usual overhead associated with reference types, it also allocates space for unused entries. Eliminating both the overhead and the unused space should significantly reduce our memory requirements. When looking up values in the trie, we can simply use the value of the first bit after the current key length to select the next child node. We can then quickly check that our value matches the full length of the child node. Further OptimizationsA trie can work with one root element. The first few steps of the lookup will be spent working through the first byte or 8 bits of the IP address. If we create a separate root for each possible value of the first bits in the key, we eliminate 6 or 7 unnecessary recursive steps. We can do even better by pre-selecting on more leading bits. We will call this number of bits the index length. We have to be somewhat careful, however. Every root node spans a certain range of IP addresses. Some entries in our input tables will span a range that covers more than one root node. We will have to add these entries to all the appropriate root nodes, adjusting the range to fit that root nodes' range. This will come at a small penalty in the memory footprint, but the speed gain is significant. If we use an index length of at least 1, we can use Looking at the codeThe new project contains three classes.
The
Most of the hard work of building the trie is done in the If the match is incomplete, the current node is replaced with a new node with the common portion of the key as its key. It has two children: (a copy of) the current node, and a node for the new entry. In both cases, the bit at position The lookup is passed on from Benchmark resultsUsing the same data set and test code as the original article, we found the following results. The Optimized version #1 refers to the original version of the code as posted here on May 13, 2003. The Optimized version #1.1 refers to the latest code, which includes the further optimizations.
Here is a graphical representation of the results:
Possible improvementsIs there room for even more optimization? Possibly. A single country code will have many nodes referencing it. We can eliminate a node if, for each of the IP addresses in its range, the best matching node in the resulting trie has the same country code as the node we eliminated. This will happen if the node's parent node has the same country code. After the trie has been built, we can go through all the nodes iteratively and eliminate unnecessary nodes. If we can eliminate a significant number of nodes, the memory usage will decrease and the lookup speed will increase accordingly. This possibility will be explored in a future update of this article. ConclusionThe two main objectives were minimizing memory usage and maximizing speed. We have further optimized an already optimized solution to produce dramatic improvements in both areas. We originally reduced memory usage by 3/4, and increased the lookup speed by a factor of more than 13. With the latest version, we give up a little bit of memory for an additional 30% increase in performance. We also untangled the binary trie code from the IP lookup code. It can now be used independently. References
UpdateAPNIC (the Asia Pacific Network Information Centre), one of four organizations that publishes the data used in this article, recently changed the format of its statistics file to conform to the format used by the other three organizations. The size field in the database now contains the total size of the segment rather than the number of bits. To use files in this new format, you will need to make a change in the source code. In the call to
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||