Introduction
If I had a nickel for every time I have written an application with a customer object mapped to a database ...
In a properly normalized database, the issue I am going to present does not occur. However, in every application I have worked on, the problem has existed, and while 64 bit keys will virtually eliminate it in the near future, it is still a cause for concern in systems using 32 bit keys and less (think micro databases on handheld computers, or legacy VB6 apps.)
The problem
Imagine the following database schema. For the sake of this article, the ID columns are identity columns of 8 bits (makes for easier math).
In this database, I will now insert five customers. I will only show customer 1 for brevity sake.
A quick inspection shows us that for the first customer, we have now incremented the order ID table by 3 and the line items table by 15. Now, I will insert 17 more customers and get an error. Where did the error come from? View this illustration of customer ID 17:
A quick analysis shows that the keyspace for line items has been exhausted!
Doing a formal analysis, a formula can be derived:
floor(2 <SUP>k</SUP> /(Π<SUB>1...n</SUB>X));
Plugging in the numbers, 28 /(5*3) = 17; which just so happens to be exactly the number of customers we ended up with!
For a brief explanation of the formula: k is the key size. In this equation, k is fixed; however, there is a more complicated and more general equation that can be derived for tables with mixed key sizes. Xi is the expected number of records in a given table for a particular parent (mean).
Conclusion
There is no substitute for formal analysis of design. This article is presented as an illustration of an all too common scenario in database design which I have seen over and over in my consultancy. While 64 bit keys effectively eliminate this issue, it is masking the problem rather than identifying it. After all, a well designed database occupies the smallest footprint possible. Using 64 bit keys on a domain of 5 items is ludicrous.
As a small addendum: this problem does exist for 32 bit key sizes, especially in Visual Basic. Imagine a medium sized company with a lot of repeat business. Oh, I don't know, how about a delivery grocery store in a city with a population of 800,000. The max draw is probably 3% of the target market, yielding a maximum customer base of 24,000 customers in the metro area. Each customer will need groceries weekly, and purchase, on an average, 150 items. (Don't believe, look at your last grocery receipt.)
232/(52*150) = 550,636 customers. Expand to 5 years of operations, and 232/((5*52)*150) = 110,127 customers.
Seemingly, there is not a problem. However, if such a business model is a success (a draw of 3% is a success), a prudent business person would expand to other cities. With this simple model, the expansion into 7 cities would cripple the database and the code it was written against.
Notes
I rewrote this article to remove the confusion about my point, on 2006/07/18.
Points of interest
A good place to find answers to hard problems: Google.