Click here to Skip to main content
15,893,668 members
Articles / Programming Languages / SQL
Article

The Keyspace Problem

Rate me:
Please Sign up or sign in to vote.
2.22/5 (21 votes)
20 Jul 20063 min read 52K   15   17
Just how many customers can you have in your database?

Introduction

If I had a nickel for every time I have written an application with a customer object mapped to a database ...

In a properly normalized database, the issue I am going to present does not occur. However, in every application I have worked on, the problem has existed, and while 64 bit keys will virtually eliminate it in the near future, it is still a cause for concern in systems using 32 bit keys and less (think micro databases on handheld computers, or legacy VB6 apps.)

The problem

Imagine the following database schema. For the sake of this article, the ID columns are identity columns of 8 bits (makes for easier math).

Diagram of a Database Schema

In this database, I will now insert five customers. I will only show customer 1 for brevity sake.

Diagram of a Customer 1

A quick inspection shows us that for the first customer, we have now incremented the order ID table by 3 and the line items table by 15. Now, I will insert 17 more customers and get an error. Where did the error come from? View this illustration of customer ID 17:

Diagram of a Customer 17

A quick analysis shows that the keyspace for line items has been exhausted!

Doing a formal analysis, a formula can be derived:

floor(2 <SUP>k</SUP> /(Π<SUB>1...n</SUB>X));

Plugging in the numbers, 28 /(5*3) = 17; which just so happens to be exactly the number of customers we ended up with!

For a brief explanation of the formula: k is the key size. In this equation, k is fixed; however, there is a more complicated and more general equation that can be derived for tables with mixed key sizes. Xi is the expected number of records in a given table for a particular parent (mean).

Conclusion

There is no substitute for formal analysis of design. This article is presented as an illustration of an all too common scenario in database design which I have seen over and over in my consultancy. While 64 bit keys effectively eliminate this issue, it is masking the problem rather than identifying it. After all, a well designed database occupies the smallest footprint possible. Using 64 bit keys on a domain of 5 items is ludicrous.

As a small addendum: this problem does exist for 32 bit key sizes, especially in Visual Basic. Imagine a medium sized company with a lot of repeat business. Oh, I don't know, how about a delivery grocery store in a city with a population of 800,000. The max draw is probably 3% of the target market, yielding a maximum customer base of 24,000 customers in the metro area. Each customer will need groceries weekly, and purchase, on an average, 150 items. (Don't believe, look at your last grocery receipt.)

232/(52*150) = 550,636 customers. Expand to 5 years of operations, and 232/((5*52)*150) = 110,127 customers.

Seemingly, there is not a problem. However, if such a business model is a success (a draw of 3% is a success), a prudent business person would expand to other cities. With this simple model, the expansion into 7 cities would cripple the database and the code it was written against.

Notes

I rewrote this article to remove the confusion about my point, on 2006/07/18.

Points of interest

A good place to find answers to hard problems: Google.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Architect ERL GLOBAL, INC
United States United States
My company is ERL GLOBAL, INC. I develop Custom Programming solutions for business of all sizes. I also do Android Programming as I find it a refreshing break from the MS.

Comments and Discussions

 
QuestionTempest in a Teapot? [modified] Pin
W Balboos, GHB31-Jul-06 10:19
W Balboos, GHB31-Jul-06 10:19 
AnswerRe: Tempest in a Teapot? Pin
Ennis Ray Lynch, Jr.31-Jul-06 10:25
Ennis Ray Lynch, Jr.31-Jul-06 10:25 
GeneralRe: Tempest in a Teapot? Pin
W Balboos, GHB2-Aug-06 2:39
W Balboos, GHB2-Aug-06 2:39 
GeneralRe: Tempest in a Teapot? Pin
Ennis Ray Lynch, Jr.2-Aug-06 4:30
Ennis Ray Lynch, Jr.2-Aug-06 4:30 
GeneralNot sure on your assumptions Pin
simon.proctor21-Jul-06 3:05
simon.proctor21-Jul-06 3:05 
Hi,

If you have, as per your example, a total of 255 values for the Orders pkey. Then naturally you can have a max of 255 orders. The same holds for the products (not shown but implied) and then the link between an order and the products, the Lineitem in this case.

So you create 255 customers (pkey values starting at 1), each makes one order so you now have exhausted your pkey space for the orders. Ok, so far we have 255 orders for 255 customers. They can order 1 or more of 255 products as much as they want.

So now I think, hmm - I'll use a 4byte unsigned int. So that is a 2^32 integer for the customers and the orders and the products. Using an identity column this gives me 2^32 customers, orders and products. The lineitem table doesn't matter as I'll be grabbing that info using the order pkey in a select condition.

So I have 4,294,967,296 possible rows for each. Ok, so now lets examine your grocery example :

Each customer will need groceries weekly, and purchase, on an average, 150 items. (Don't believe, look at your last grocery receipt.)

2^32/(52*150) = 550,636 customers.

Why do you divide your order space by the number of items that a customer may order? When a customer makes an order they have 1 order entry and 1*150 entries in the lineitem table. The lineitem table count doesn't restrict the total space allocatable in the orders table.

If I make three orders, these are new orders so I use three rows in the orders table. I'm ordering existing products (can't order what doesn't exist) so this doesn't affect the products table. The lineitem table doesn't need its own pkey so that isn't affected either. If I order 150 items each time, I use 450 rows in the lineitem table.

Based on that, my original 3 orders would have 1 row in the customers table, 3 row in the orders table and how ever many productlines I order in the lineitems table (450 in this case). I exhast one possible entry for customers and three for orders. None for products or line items.

Assuming that I go global 4,294,967,296 customers and/or orders is still quite small. I can either increase the byte count in some way (64bit platform?) or I can use GUIDs. If I use GUIDs I'm using string values so my joins/lookups *may* be slightly slower.

Still, that is a significantly better count than you surmise.

Or have I misunderstood your point?

Note, you don't need a pkey for the lineitem as its selectable and usable directly based on the order key. If you need to uniquely identify each lineitem row for some reason (say to update an order) you can use either use the row data as criteria or add a GUID (or similar).
GeneralRe: Not sure on your assumptions Pin
Ennis Ray Lynch, Jr.22-Jul-06 19:53
Ennis Ray Lynch, Jr.22-Jul-06 19:53 
GeneralRe: Not sure on your assumptions Pin
simon.proctor23-Jul-06 0:05
simon.proctor23-Jul-06 0:05 
GeneralThis is bad... Pin
The Dark One 66618-Jul-06 0:14
The Dark One 66618-Jul-06 0:14 
GeneralI suppose I messed up Pin
Ennis Ray Lynch, Jr.18-Jul-06 3:22
Ennis Ray Lynch, Jr.18-Jul-06 3:22 
QuestionHave you ever heard about GUIDs? Pin
..Hubert..17-Jul-06 23:27
..Hubert..17-Jul-06 23:27 
QuestionWhat's the problem? Pin
mav.northwind15-Jul-06 1:45
mav.northwind15-Jul-06 1:45 
AnswerYou would have to had worked on VB6 Pin
Ennis Ray Lynch, Jr.17-Jul-06 10:21
Ennis Ray Lynch, Jr.17-Jul-06 10:21 
GeneralRe: You would have to had worked on VB6 [modified] Pin
mav.northwind17-Jul-06 20:40
mav.northwind17-Jul-06 20:40 
GeneralMy article Pin
Ennis Ray Lynch, Jr.18-Jul-06 3:21
Ennis Ray Lynch, Jr.18-Jul-06 3:21 
GeneralRe: My article Pin
mav.northwind18-Jul-06 9:28
mav.northwind18-Jul-06 9:28 
GeneralCheck back in a few days Pin
Ennis Ray Lynch, Jr.18-Jul-06 10:43
Ennis Ray Lynch, Jr.18-Jul-06 10:43 
GeneralRe: Check back in a few days Pin
mav.northwind19-Jul-06 7:50
mav.northwind19-Jul-06 7:50 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.