Introduction
Databases have many isolation levels but, in general, they are not used properly, as the names by itself don't help their usage. This article will try to show how isolation levels can be helpful when used properly.
Background
I already developed many Object-Relational Model frameworks. In general, they were built to work with some CASE software, many which were built together with the framework. Actually, I am trying to avoid such CASE software. I want to have simple objects on which I will only use a software to create the database from them. So, I can concentrate my efforts in coding, and the database will be a reflection of it. In many cases, my framework can look similar to NHibernate, but without all those configuration files and attributes. Only strictly necessary attributes are used. I discovered my real problem when I was trying to simulate Triggers in classes. The really big issue was that I read other records to check for some condition, and then allowed or disallowed my modification. But, if I were inserting two mutually exclusive records at the same time, in different transactions, both checks were OK before commit, and I end-up in an inconsistent state. So, I checked the isolation levels, and I decided to create a new definition for them.
When to use?
- ReadCommitted - For me, this is the read-only isolation level. It can be used for inserts and deletes, but only if the data being validated does not requires multi-record validations. I strongly discourage using this isolation level for updates.
- RepeatableRead - This is the most confusing name in my opinion, and I think it is also the most problematic one to use. Locks are placed in data you read, so other transactions will not be able to modify the data while the actual transaction is not terminated. This is a good option if you want to read record data, verify if something needs modification, and then modify it. But, only for single record validations. The most problematic part about this isolation level is that you can read a record, read all other records that can cause problems to it, and they will not be modified by other transactions. But, nothing will forbid a new record that can cause problems to the actual record to be inserted in the meantime. So, you can be inserting a new record that is valid considering the actual state of this record, and at the same time, you will be modifying this record to a state that is invalid, as both transactions will not be seeing the new data of the other transaction until commit is done.
- Serializable - This is the perfect isolation level for "trigger" simulation. After you read one table, the read records will not be able to be modified by other transactions, and new records will also be avoided. So, if one transaction is changing a record and another one is inserting a record, both in this isolation level and first checking if they can do the modification, one transaction will acquire the lock, do all the work, while the other one waits the first to commit or rollback. The biggest problem with this isolation level is that you can cause a dead-lock if, for example, one transaction first locks table A and then tries to lock table B, while the other transaction already holds the lock for B and tries to lock table A.
- Chaos and ReadUncommitted, in my opinion, are completely useless for validations, and Snapshot, as the name says, is a snapshot, so it can be useful for backups, as all tables will be at the same version.
Simplifying:
- ReadCommitted - the best choice for read-only transactions, or for insert/delete transactions if all validations are done only in the same record that is being inserted or deleted. If other records are involved, don't use this isolation level for validations.
- RepeatableRead - is only useful if you are updating records and doing validations only in the record being updated. As happens in the first case, if other records are involved, don't use this isolation level for validations.
- Serializable - is the only choice for multi-record validations, but you must be careful not to cause dead-locks. Sometimes, I create a "single record lock table" for processes that involve many tables, so I first cause an update in such a lock-table and then continue with the process, avoiding dead-locks, and also avoiding the use of specific database lock commands.
Other considerations - Why use Isolation Level + Project validations instead of triggers?
Personally, I don't like triggers, I really have situations where triggers don't help. In my ORM framework, my business objects have many validations that must run over a "batch" of records. For example, I can only insert a record in table Master if there is at least one record in table Child. On the other hand, I can delete child records, but I can't leave a Master record without Child records. How do I solve that using triggers? Simple. I don't. Triggers and custom referential constraints run on a record-by-record basis. If I validate that a Master must have children during insert, it will always fail, as I must insert a Master before a Child. If I allow a Master to be inserted without a Child, nothing forbids me of committing the transaction without inserting a single child, right? That's why I use Committing validations. So, I need a business validation that is not in a real trigger, but it is a "trigger" that runs just before Commit. In this case, I allow a master record to be inserted and, before commit, I validate if it has children. Of course, this is not the only situation, and that's why I need to use Isolation Levels. For such triggers, the best Isolation Level is Serializable, as all my validations can fail if some data was inserted just after I verified that such data didn't exist (or, even worse, such data was deleted).
I hope this little article helps in building more robust applications that use Isolation Levels properly.