(untagged)

Record Versioning with SQL Server

Ben Allfree

0.00/5 (No votes)

15 Feb 2007

Learn how to store data with a version history

Download demo project and source - 17 Kb

Introduction

Maintaining a version history of SQL Server data has many benefits, but the top three are:

Undo history
Auditing
Soft delete

Record versioning imposes a layer on top of CRUD operations that make them more complex, especially on Update and Delete. No longer can you simply delete a record. Instead, you must flag it as deleted ("soft delete"). No longer can you simply update a record; instead, you must perform a soft delete followed by an insert.

Reporting is also a challenge. Most reporting frameworks do not understand the concept of versioned data. They expect each record to be a distinct data item, not a 'version' of a data item it has already seen.

No versioning solution will circumvent the fundamental challenges to versioned records, but we can greatly improve on the traditional approach to auditing data.

The typical solution

Record versioning is normally accomplished by creating separate audit tables that mirror the schema of the main tables.

Does that look right to you? It has a few bad smells to me.

I don't like the schema duplication. It adds to the clutter of tables, makes maintenance more difficult, and in general makes it harder for new developers to digest. I think we can do better.
It feels unsafe. The fundamental principal of moving data involves deleting from the old destination. I think we can do better.
It's not friendly with ORM. If you use ORM tools or handle your audit trail with business objects, you are forced to copy each field explicitly, from the old business object to the new 'audit' business object. Again, I think we can do better.
It doesn't scale. If I want to add a BlogComment table, I have to add another audit table. Now not only do we have schema duplication, but we have duplicate abstractions of auditing that can grow apart over time. Imagine years later when the BlogComment_Archive table behaves differently than the Blog_Archive table. That makes it even harder to comprehend the schema.
I don't want to write reports against this schema. I have to point to separate tables when I want historical drill-down and that seems unnecessary.

The better plan

As it turns out, we indeed can do much, much better! By using some clever entity inheritance, we can solve the audit problem for all tables in the database instead of just one at a time.

The solution involves a base audit table from which all auditable tables inherit. Study the following diagram and then look at the notes below.

The Audit table contains all the version information. Date stamp, active state, who updated it. Your audit requirements may include other fields here.
The Audit table contains a PermanentRecordId. Because the PK of Blog records will change as you insert new versions, we need a permanent ID to identify a single blog entry and group versions of the same blog entry.
The Blog table shares a 1:1 relationship on primary keys. Audit.Id is the PK and Blog.Id is the FK. Notice that Blog.Id is not an identity column.
The Blog table looks much cleaner without all the audit garbage distracting. When you look at the Blog table, you immediately understand its purpose. Likewise, the Audit table is immediately obvious.

CRUD Operations

Let's try out a few CRUD operations to see how this new approach feels.

Create (Insert) and Read (Select)

Entity inheritance generally requires two more more insertions because you must insert into multiple tables for one complete 'record'.

In this case, insertion now involves two operations. First, you must insert a record into the base Audit table. Then, you must get the PK ID of that inserted record for use with the second insertion into the Blog table.

Multiple insertions per operation is one drawback to the entity inheritance strategy, but it can be encapsulated. For example, the following insertion sample could be converted into a stored procedure that takes the Blog table values and the value for Audit.Updated_By.

/* Setup */
delete from blog
delete from comment
delete from audit

declare @id bigint
/* ------- */

/* Blog insert */
insert into Audit(Updated_By) values ('Ben')
select @id = @@identity
insert into Blog (Id, UserId, Title, Body) values ( @id, 1, 'My first entry', 'My blog body')
select * from Blog join Audit on Blog.Id = Audit.Id where IsActive=1

As you can see, the Audit table kicked right in and did its job. We have a PermanentRecordId for this blog entry, and all other information is intact. Notice also that the Id columns are synchronized, and the record is marked as active.

While it's true that you must perform an INNER JOIN with a WHERE to select anything using this strategy, the operation is not costly because it is performed on indexed fields.

Update

In versioned recording, an Update is really a soft delete followed by an insert.

Soft deletes are performed directly against the audit table. This is one nice feature: to perform a soft delete, you don't even need to know the record type. This means you could make a single stored procedure, spSoftDelete(id) that accepts the ID of the record to soft delete.

/* Blog update */
declare @BlogRecordId uniqueidentifier
select @BlogRecordId=PermanentRecordId from Audit where Id = @id
update Audit set IsActive=0 where Id=@id
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body) 
   values ( @@identity, 1, 'My updated first entry', 'My blog body is different now')
select * from Blog join Audit on Blog.Id = Audit.Id

Now we have two entries with the same PermanentRecordId. That means they are different versions of the same logical record. But notice that the old record is no longer active. We could select MAX(Created), but an active flag is faster, and as you will see for Undo operations, necessary.

Delete

Deletes retire all versions of a record. This can be performed on the Audit table alone, making it easy to encapsulate.

/* Blog delete */
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
select * from Blog join Audit on Blog.Id = Audit.Id

Alternately, if you have the exact Blog.Id or Audit.Id, you can set IsActive that way. As long as all your Update operations are done correctly, there should be only one record where IsActive=1. You could even set a constraint to check this.

CRUD Extensions

Versioning opens the possibility of extended operations, such as undo/redo. To use these CRUD extensions, add two more versions for a total of 4 blog entries.

/* Setup for CRUD extensions */
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body) 
   values ( @@identity, 1, 'My 3rd updated first entry', 'My blog body is different now')
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body) 
   values ( @@identity, 1, 'My 4th updated first entry', 'My blog body is different now')
/* ------------------------- */

Undo/Rollback/Revert

An undo operation involves a few steps:

Find the version directly preceding the active version, if there is one.
Soft-delete the active version
Activate the version you found in step 1

In the sample below, we find the version preceding the active version by relying on primary keys for chronological order. Version #3 will always have a PK ID smaller than version #4 of the same record. Some programmers do not like using indexed primary keys to determine chronological order. If you are among these, you may want to consider using an alternate indexed column to maintain chronological order. (Note: the SQL Server 'timestamp' data type will not work because records are updated when their Active status changes, and this changes the timestamp value).

/* Rollback/undo to previous (to 3rd) */
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
    and Id < (select Id from Audit where PermanentRecordId=@BlogRecordId and IsActive=1)  order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id

/* Rollback/undo to previous (to 2nd) */
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
    and Id < (select Id from Audit where PermanentRecordId=@BlogRecordId and IsActive=1)  order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id

/* Rollback/undo to previous (to 1st) */
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
    and Id < (select Id from Audit where PermanentRecordId=@BlogRecordId and IsActive=1)  order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id

/* Rollback/undo to previous (still on 1st) */
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
    and Id < (select Id from Audit where PermanentRecordId=@BlogRecordId and IsActive=1)  order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id

Redo

Redo is a simple reversal of the undo operation. Find the first version following the active version, and activate it.

/* Redo (to 2nd) */
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
     and Id > (select Id from Audit where PermanentRecordId=@BlogRecordId and IsActive=1)  order by Id asc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id

Adding new auditable tables

Adding new tables under our version control system is easy, and the CRUD code you already saw above is easily adapted, especially if you choose to encapsulate in stored procedures.

Let's add a table to support versionable, nested blog comments to demonstrate how similar the CRUD is for a new table. Also, notice that the CRUD demonstrated above requires zero changes when a new table is added. Outstanding.

At first, supporting multiple records from multiple tables sounds impossibly difficult, but it works with almost no added effort.

Comment.Id is an FK to Audit.Id just like Blog.Id
Comment.PermanentBlogId will store the PermanentId for the blog entry. You could use an FK directly to Blog.Id, but the downside is that you would have to update all FK references when the active version of the Blog record changes. This way, you give up a little referential integrity (that you could add back with constraints if you wanted to), but you gain simplicity through decoupled revision changes.
Comment.ParentId points to Comment.Id to allow for nested comments

So this is getting interesting! Now, comments can have versions just like blog entries, but nothing is ever lost. Let's check our CRUD again to be sure everything works.

Create (Insert)

The only difference here is that we need to reference the PermanentBlogId. That is a requirement only because Comments are owned by Blogs. Fortunately, we have that from earlier in the demo script. If you had the Blog.Id, you could use that to get the PermanentId of the Blog entry.

/* Comment insert */
insert into Audit(Updated_By) values ('Commenter')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment) 
   values ( @id, @BlogRecordId, 2, 'My first comment')
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1

If you study this, you will see that this comment version has a different PermanentRecordId. That is correct because this is a different set of versioned data. Each comment will get its own PermanentId.

Update

Update is identical excepting different values for the insertion.

/* Comment update */
declare @CommentRecordId uniqueidentifier
select @CommentRecordId=PermanentRecordId from Audit where Id = @id
update Audit set IsActive=0 where Id=@id
insert into Audit(Updated_By, PermanentRecordId) values ('Commenter', @CommentRecordId)
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment) 
   values ( @@identity, @BlogRecordId, 1, 'My updated first comment')
select * from Comment join Audit on Comment.Id = Audit.Id

Select all comments for a blog entry

Okay, so you're convinced now that the versioning works. You can see that the rollback and restore operations will be performed on the PermanentId values associated with each comment, and you can see each comment gets its own PermanentId. It's confusing to imagine that both Blog entries and Comments have versions!

So, how do you select all the comments for the current version of the blog? Easy.

/* Insert 2nd comment */
insert into Audit(Updated_By) values ('Commenter #2')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment) 
   values ( @id, @BlogRecordId, 2, 'I have a comment too')
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1

/* Insert 3rd nested comment */
declare @parentId bigint
select @parentId = @id
insert into Audit(Updated_By) values ('Commenter #3')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment, ParentId) 
   values ( @id, @BlogRecordId, 3, 'I have a comment about the comment', @parentId)
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1

/* Select all comments for this blog entry */
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1 and
   PermanentBlogId = @BlogRecordId order by Created asc

Further study

If you are interested in this approach, I recommend looking into the following advanced topics:

Using record versioning with your favorite ORM tool
Using record versioning with code generated DALs
Entity/table inheritance
Hierarchical versions (for example, if you wanted a Blog rollback to also roll Comments back)
Writing views for reports
Encapsulating and abstracting insert/update operations

About Benjamin Allfree

Benjamin Allfree runs Launchpoint Software Inc., a technology-agnostic company specializing in unparalleled custom software solutions. Benjamin spends most of his time researching architecture and new technology.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here