Recently, we needed to change the way a key field was generated for one of our tables. Although we aren't actually using NHibernate to do the work, we wanted to use a HiLo-style strategy to generate it, so we investigated how it works to see if it was suitable.
NHibernate offers several built-in strategies for assigning generated primary keys to your entities (although of course none of them are exclusive to NHibernate). The default that I suspect many people go for is the identity, or autonumber, strategy, where the database assigns an auto-incrementing integer to each record. This is a convenient, safe and conceptually simple method, but in a lot of situations it's not a great choice. Performance-wise, it can be quite poor: consider the case where an application saves an order with 5 order lines to the database, both classes having identity primary keys. To commit the order, NHibernate must first save the order object and retrieve its primary key (which it can do in the same round trip). It can then use that key as the foreign key on the order-line objects, each of which must also be saved individually so that its own primary key can be retrieved. In total, that's 6 individual database accesses for what is conceptually one operation. If the primary keys are generated on the client rather than the server, the operation can be done in one batch: NHibernate already has all the information it needs to write everything from the get-go.
One alternative is to use GUIDs, which is a great solution if your keys don't need to be human-readable but not so good if they do. In our case, we are also using the key as a customer reference, so this was definitely not appropriate.
Which brings us to HiLo, which uses a kind of hybrid strategy to generate integer keys on the client while coordinating with the server to ensure that duplicate keys are never generated. The process works by allocating blocks of keys; if the block size is 100, say, then the first block corresponds to keys 0 - 99, the second to 100 - 199 and so on. When a client app needs a new block of keys, it asks the server for the next block number and increments the stored value by one, so that no other client is handed the same block. Internally, the client keeps track of how many keys it has handed out, and when it runs out it just requests a new block. The block number is referred to as the high value, and the client-tracked value is called the low (hence the HiLo name). The block size is referred to (at least in NHibernate) as max_lo, and this can be set as a parameter in your mapping file. The formula for working out a key on the client side is therefore something like:
hi * max_lo + current_lo++
and when current_lo reaches max_lo, a new block is requested and current_lo is set back to 0.
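To make the block allocation concrete, here is a minimal Python sketch of the scheme as described above. It is not NHibernate's actual implementation: the HiServer and HiLoGenerator classes are hypothetical names invented for illustration, and the "server" is just an in-memory counter standing in for the database.

```python
import threading


class HiServer:
    """Hypothetical stand-in for the database value that stores the next hi."""

    def __init__(self):
        self._next_hi = 0
        self._lock = threading.Lock()

    def next_hi(self):
        # Hand out the current hi value and increment the stored one atomically,
        # much as a real implementation would do inside a transaction.
        with self._lock:
            hi = self._next_hi
            self._next_hi += 1
            return hi


class HiLoGenerator:
    """Client-side generator using the formula hi * max_lo + current_lo."""

    def __init__(self, server, max_lo):
        self._server = server
        self._max_lo = max_lo
        self._hi = None
        self._current_lo = max_lo  # forces a block request on first use

    def next_id(self):
        if self._current_lo >= self._max_lo:
            # Block exhausted (or never fetched): request a new one.
            self._hi = self._server.next_hi()
            self._current_lo = 0
        key = self._hi * self._max_lo + self._current_lo
        self._current_lo += 1
        return key


server = HiServer()
client_a = HiLoGenerator(server, max_lo=100)
client_b = HiLoGenerator(server, max_lo=100)

ids_a = [client_a.next_id() for _ in range(3)]  # drawn from block 0
ids_b = [client_b.next_id() for _ in range(3)]  # drawn from block 1
print(ids_a, ids_b)
```

Each client only talks to the server once per block, which is where the saving over identity keys comes from.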
NHibernate can automatically generate a table to store the hi values in, or you can add one yourself. The schema is simple: essentially a column holding the next hi value, plus a column identifying the target table if hi values for multiple tables are stored in the same table.
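As a rough illustration of a shared hi table, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name hilo_keys, its columns and the fetch_next_hi helper are all assumptions made up for this example; they are not the schema NHibernate generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per target table, each with its own next hi value.
conn.execute(
    "CREATE TABLE hilo_keys ("
    " table_name TEXT PRIMARY KEY,"
    " next_hi INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO hilo_keys (table_name, next_hi) VALUES (?, 0)",
    [("orders",), ("order_lines",)],
)
conn.commit()


def fetch_next_hi(conn, table_name):
    """Return the current hi value for table_name and increment the stored one."""
    # On a multi-user database server the read and the update would also need
    # row locking (or a single atomic statement) so two clients can't get the same hi.
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT next_hi FROM hilo_keys WHERE table_name = ?", (table_name,)
        ).fetchone()
        conn.execute(
            "UPDATE hilo_keys SET next_hi = next_hi + 1 WHERE table_name = ?",
            (table_name,),
        )
    return row[0]


first_orders_hi = fetch_next_hi(conn, "orders")
second_orders_hi = fetch_next_hi(conn, "orders")
first_lines_hi = fetch_next_hi(conn, "order_lines")
```

Because each target table has its own row, requesting a block for orders does not advance the counter for order_lines.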
So, now we have an idea of how it works, is it a good idea? There are a number of things to take into account when using a HiLo strategy:
- All clients must follow the same rules: The rules for generating ids live in the client application, not the database server, so it's important to make sure anything that inserts into your tables follows the same rules and uses the same parameters. If one application thinks the batch size is 10 and another thinks it is 100, then when they get the next hi value from the database, they will interpret the range of values they have been assigned differently and you'll get conflicts.
- Gaps and non-sequential ordering: If your app gets restarted for some reason before it has used all of its allocated ids, then the unused numbers are effectively 'lost' and you'll get gaps in your sequences. Similarly, if you have multiple clients using the db, ids will only be temporally ordered within a batch, not across batches, because of the allocation method. In other words, if your batch size is 10, then 12 definitely comes after 11 but not necessarily before 21.
- Batch size tuning: If you have a lot of inserts going on, then a larger batch size is better, as it reduces the number of times the client has to talk to the db to get more ids. On the other hand, if inserts are relatively infrequent, then a smaller batch size may be better to avoid big gaps in your ids. Because of the first point, if you have multiple clients under different loads, you can't give them different batch sizes. Also, if you get the initial size wrong, you need to be very careful when changing it.
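The conflict described in the first point is easy to see with a little arithmetic. The following Python sketch uses a hypothetical block_range helper (not part of any library) to work out which keys a client believes it owns for a given hi value, and shows two clients that disagree on the batch size.

```python
def block_range(hi, block_size):
    """Keys a client believes it owns for a given hi value and block size."""
    return range(hi * block_size, (hi + 1) * block_size)


# Client A is configured with a batch size of 100 and receives hi = 0.
range_a = block_range(0, 100)
# Client B is configured with a batch size of 10 and receives hi = 1.
range_b = block_range(1, 10)

# Both clients now think they own keys 10 to 19.
overlap = sorted(set(range_a) & set(range_b))
print(overlap)

# When both agree on the batch size, consecutive hi values never overlap.
agreed_overlap = set(block_range(0, 10)) & set(block_range(1, 10))
```

The same arithmetic explains why changing the batch size later is risky: a hi value that was safe under the old size can map onto keys that have already been used under the new one.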
Because of these considerations, in the end we decided to go with a custom solution in which the database stores the highest id allocated rather than a block number. Clients can then request a different number of ids based on their needs, which allows a bit more flexibility. Although we weren't actually using NHibernate for the work we were doing, this would be fairly easy to implement there using a custom id generator.
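A rough Python sketch of that idea follows. It only illustrates the principle, not the code we actually wrote: the RangeAllocator class is a hypothetical in-memory stand-in for the database value, and starting the ids at 1 is an assumption.

```python
import threading


class RangeAllocator:
    """Hypothetical stand-in for the stored 'highest id allocated' value."""

    def __init__(self, highest_allocated=0):
        self._highest = highest_allocated
        self._lock = threading.Lock()

    def allocate(self, count):
        """Reserve `count` consecutive ids and return them as a range."""
        if count < 1:
            raise ValueError("count must be at least 1")
        # Read and advance the stored value atomically, as a real
        # implementation would do inside a database transaction.
        with self._lock:
            first = self._highest + 1
            self._highest += count
            return range(first, self._highest + 1)


allocator = RangeAllocator()
busy_client_ids = allocator.allocate(100)  # a heavily loaded client takes a big range
quiet_client_ids = allocator.allocate(5)   # an infrequent inserter takes a small one
```

Since the server records the highest id handed out rather than a block number, clients no longer need to agree on a batch size; each one only needs to stay within the range it was given.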
Primary key generation is important and you should give thought to how it happens. HiLo is one of several useful strategies available, which should be evaluated to see which one best suits your requirements.