This article describes a simple and compact system of detecting changed records in a database table. The main focus of this solution is the detection of actually changed records. The article can serve as an inspiration for others who want to make more complex data version control systems such as auditing, history records accounting, etc.
Problem with actual changed records
Thinking about a general system of data version control (DVC) in a database system (in this case the MS SQL Server), several crucial questions must be asked:
- Do we want to monitor all records that are considered as changed, or only those, that have actually changed?
- How can we recognize an actual changed record?
- What is responsible for the recognition of an actual changed record (some table trigger, a special stored procedure, etc.)?
- How should we mark that a particular record has actually changed?
The MS SQL Server doesn't provide own instrument which helps to determine whether an affected record has actually changed or not. If an affected record gets included in the result set of an update statement, it is considered to be changed - despite the fact that all the data in the columns of the record are the same as before the update. Especially, if the ROWVERSION data type column is used, which might be useful in certain situations, a new value is stored there whenever an update statement gets executed, and there is no possibility for us to influence it.
There are several ways to tell whether a record has been changed or not. For example:
- By comparing the old and the new value of all changing columns in each SQL update statement. This is very uncomfortable and very difficult for code maintenance. But, if only one update statement per table is used, it could be considered an acceptable solution.
- By performing a similar comparison as in point 1, using the Table-Update Trigger.
- Computing a check sum value (using the
BINARY_CHECKSUM built-in functions which return a 32bit integer) for a row and comparing with it against the previous value. We can use an asterisk instead of a list of columns. But the check sum function is designed for the ' ... use in building hash indexes' (as you can see at msdn). A 32bit integer value is too small to avoid the collision. And there are some specifics of its implementation - e.g. null value usage, etc.
- A similar solution as in point nr. 3, this time using a built-in hash function:
HASHBYTES. The function returns
VARBINARY value and its length is determined by the selected hash algorithm. For SHA or SHA1 algorithm this is a 160 bit value. However, while not being entirely impossible, the collision is almost out of the question, though not impossible. The second disadvantage is that a list of columns (or an asterisk) cannot be used at the place of an input parameter. Instead a concatenation function, as e.g.
SELECT FOR XML statement, must be used.
A possible solution using a generic trigger
This solution of the issue is based on a common SQL CLR Trigger. Since there is only one code implementation for it (for all the tables), it can be called a generic trigger. The trigger identifies an actual changed row by comparing the old and the new values in each column.
Next, each DVC table has a special attribute that contains a row version number. The number is sequentially assigned an integer value (by an inbuilt database sequencer - on MS SQL Server 2012 or later), which is unique for each row in a given database, and the new value is generated when the row is actually changed (or inserted as new). The actually changed record is marked this way.
Our Data-Version Controlled Tables have these system attributes:
|System Column ||Data Type ||Description |
|Contains a unique value of the row version valid at the moment of row insertion. The column has a default value constraint - next value from the sequencer. |
|Contains the current value of row version - when the row is actually changed, a new unique value is stored here. |
|Indicates whether a row is valid or not (i.e. deleted). For example, the value of 0 means the row is valid whereas the value of 1 means the row has been deleted. |
|This is a primary key alias implemented as a computed column. It allows us to keep the original primary key column name in legacy tables and to use a common name in the generic trigger at the same time. If we don't have a one-column PK (or the PK is based on non integer values), we can create a new column as a candidate key (identity column) and declare it as a unique key. |
|Contains the table name as a computed column with a constant value for each row. The value is used in the generic trigger. Without it the trigger would not recognize which table is currently being handled and there is would be no other reliable way of finding out. A computed-column constant value is not stored in a table data page and shouldn't be too resource demanding. |
When a record gets inserted into the DVC table, the
SysInsertRowVersion column contains a new unique value provided by the sequencer. The value is enforced by the default constraint declared on the table. For example, if you store a current value from a previous processing task, we can find out whether the newly inserted record comes earlier than the time at which the task was finished.
If the record has been updated and some of its column values have actually changed, the
SysUpdateRowVersion column will show a new unique value given by the sequencer, which is different from the value in the
SysInsertRowVersion column. This value is assigned by the generic update trigger.
In this case an update statement is used instead of the delete statement for deleting a row. The update statement assigns the
SysRowState column a value which means deleted record. To make DVC tables query easier, you can create a simple view of each DVC table excluding the deleted rows can be created.
Using the code
The following part describes the solution at the example of simple case of a single table. The problem can be divided into a SQL Server part and a SQL CLR part (the generic trigger implementation).
SQL Server part
First, a sequencer providing a unique long-integer (
BIGINT) incremented values is needed. In this example, it is marked
seq_SysDVC and its DDL code is as follows:
CREATE SEQUENCE dbo.seq_SysDVC AS BIGINT
START WITH 1
INCREMENT BY 1
Second, the DVC table in this example is marked
TbCustomer with the following DDL code:
CREATE TABLE dbo.TbCustomer(
CustomerId INTEGER NOT NULL IDENTITY(1,1),
FirstName VARCHAR(100) NOT NULL,
LastName VARCHAR(100) NOT NULL,
Birthday DATETIME NOT NULL,
Rating DECIMAL NOT NULL,
SysInsertRowVersion BIGINT NOT NULL CONSTRAINT dfc_dbo_TbCustomer_SysInsertRowVersion DEFAULT NEXT VALUE FOR dbo.seq_SysDVC,
SysUpdateRowVersion BIGINT NOT NULL CONSTRAINT dfc_dbo_TbCustomer_SysUpdateRowVersion DEFAULT -1,
SysRowState TINYINT NOT NULL CONSTRAINT dfc_dbo_TbCustomer_SysRowState DEFAULT 0,
SysRowId AS (CONVERT(BIGINT, CustomerId)),
SysTableName AS ('dbo.TbCustomer'),
CONSTRAINT pkc_TbCustomer PRIMARY KEY CLUSTERED (CustomerId ASC)
On the SQL Server side a common table data type consisting of one attribute:
SysRowId is necessary to provide a batch updating statement of the generic trigger. Its DDL code is here:
CREATE TYPE dbo.udt_DVCChangeResultTable AS TABLE(
SysRowId BIGINT NOT NULL PRIMARY KEY CLUSTERED
SQL CLR part
On the SQL CLR side, the generic trigger has a simple assembly project with one public class (
CDVCTrigger) containing a single public static method (
OnUpdateBaseTrig) that implements its logic. The project contains some further helper classes and structures with evident purposes.
Should an old value be compared with a new one for a given column in a particular record, two trigger pseudo tables have to be used (Inserted and Deleted). However, two simultaneously opened
SqlDataReader objects cause problems with the Multiple Active Result Sets related to the context SQL Server connection used in the trigger, we cannot use them. To avoid the possible growth in resource allocation
DataTable objects should not be used. Finally the use of a separate command per each changed row slows down the trigger process. For this reason, I use my own common structures for in-memory data storage.
For further information, see the Visual Studio project, which provides detailed explanation in the source code.
A compiled binary file of the SQL CLR project: DVC.SQLCLR.dll is created. It can be deployed on the SQL Server by means of:
EXEC sp_configure 'clr enabled', 1
IF EXISTS (SELECT * FROM sys.assemblies asms WHERE asms.name = N'DVC.SQLCLR')
DROP ASSEMBLY [DVC.SQLCLR]
CREATE ASSEMBLY [DVC.SQLCLR] FROM 'C:\Temp\DVC.SQLCLR.dll' WITH PERMISSION_SET = SAFE
- Copy the binary file of the assembly into a server side file system, for example into its local path 'c:\Temp\'.
- Open the SQL Server Management Studio, switch the context database to the target one and open a new query window.
- In the query window run the 'installation' code:
The following code shows how to simply 'declare' a new update trigger (based on the generic trigger) on the table from the example above:
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = object_id(N'dbo.tri_TbCustomer_Upd') AND OBJECTPROPERTY(id, N'IsTrigger') = 1)
DROP TRIGGER dbo.tri_TbCustomer_Upd
CREATE TRIGGER dbo.tri_TbCustomer_Upd ON dbo.TbCustomer FOR UPDATE AS EXTERNAL NAME [DVC.SQLCLR].[DVC.SQLCLR.CDVCTrigger].OnUpdateBaseTrig
Now that the task is finished, it is time to test its results.
Testing of the DVC system
The example table
TbCustomer is currently empty and new data must be inserted. And the current value of the DVC sequencer can be obtained in the following way:
SELECT CONVERT(BIGINT, (SELECT TOP 1 T.Current_Value FROM SYS.Sequences AS T WHERE T.Name = 'seq_SysDVC'))
dbo.TbCustomer(FirstName, LastName, Birthday, Rating)
('Graham', 'Greene', '1904-10-02', 50),
('Karel', 'Capek', '1890-01-09', 150),
('Josef', 'Skvorecky', '1924-09-27', 250)
SELECT * FROM dbo.TbCustomer
As shown below, the new records have valid values in
SysInsertRowVersion column. The new value has been assigned here by the column default constraint:
The following update statement activates the generic trigger. The trigger recognizes an actually changed record and inserts new value in the
T.Rating = CASE WHEN T.CustomerId = 2 THEN (T.Rating * 2) ELSE T.Rating END
dbo.TbCustomer AS T
SELECT * FROM dbo.TbCustomer
Although the SQL Server returns a
'(3 row(s) affected)' message, there is only one row, which has actually changed:
As seen in the
SysUpdateRowVersion column, the actually changed record is marked by a change in the number, while the unchanged records remain unmarked, thus confirming the ability of this solution to differentiate between updated and actually changed data.
Points of Interest
Using a computed column alias instead of the original primary key column
In case of doubt whether a query using the
SysRowId column primary-key alias in its
WHERE clause is effective enough, the following two execution plans produced by two sample queries can be compared: The first query uses the native primary key column while the second query uses the alias column. The execution plans are not identical:
Detecting a parent table object of SQL CLR trigger current context
There are several 'obscure' ways how to obtain the name of a parent table for which the trigger has been fired. However, in fact, there is no attribute of the
SqlTriggerContext where this could be found. For more details, see the following discussion threads:
- 2016-10-04: First version.