Click here to Skip to main content
Click here to Skip to main content
Go to top

Basic SQL Auto-Purging for Document Management

, 25 Jul 2012
Rate this:
Please Sign up or sign in to vote.
This article will help manage automatic database purges according to basic Document Management principles.

Introduction

Document Management is something that all businesses, small to large, must deal with at some point. But Document Management covers more ground than just the documents that a business will generate. Whenever a business stores information within an SQL server, IT must provide a way for that business to maintain strict adherence to its Document Management procedures. Notice I said IT must provide the way, not create the rule. What is the difference?  IT usually does not own the data!

Background  

In the past dba's have used a variety of ways to purge data from their SQL servers but in many cases the dba's were simply trying to keep database size manageable. In my freelance work I often see data that is so old it no longer makes sense to keep it. In my daily job working for a fortune 500 company this is more than just undesirable, it is unacceptable. Eventually data no longer makes sense to the users and can actually hurt a business if it is kept. 

But how should we purge the data?  If you have a small SQL server with maybe just a dozen or so databases, creating SQL Jobs by database may make sense. But what if the data owners need different purges by table?   Now you are creating a lot of SQL jobs that just clutter your environment and take a lot of maintenance when the data owner wants to change the purge. And what about when a new database or table gets created?  Often it is done during coding or on the fly with all the best intentions to come back and set up the purge later. 

Using the code  

This is very simple code and can be expanded far past the very basic form in which I am presenting it. I wanted to keep this simple so that each unique environment can do what they want. Maybe you want to add a front end webpage so that data owners can change their own purges or you want to add some extra fields etc.

Step one is the need to create two new tables to store the management data: 

(Notice that I created a new database "DataPurge" First, I like separation in my servers)

USE [DataPurge]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[DataPurgeHistory](
	[DBName] [varchar](50) NOT NULL,
	[TableName] [varchar](50) NOT NULL,
	[DateStamp] [datetime] NULL,
	[PurgeNeeded] [bit] NOT NULL,
	[NewTable] [bit] NOT NULL,
	[YearsToKeep] [int] NULL,
	[PurgeField] [varchar](50) NULL,
 CONSTRAINT [PK_DataPurgeHistory] PRIMARY KEY CLUSTERED 
(
	[DBName] ASC,
	[TableName] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
       ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[DataPurgeHistory] ADD CONSTRAINT [DF_DataPurgeHistory_PurgeNeeded]  DEFAULT ((0)) FOR [PurgeNeeded]
GO

ALTER TABLE [dbo].[DataPurgeHistory] ADD CONSTRAINT [DF_DataPurgeHistory_NewTable]  DEFAULT ((1)) FOR [NewTable]
GO  

This first table is going to be used to actually store the purge data. The datestamp is just used to let us know the last time the table was purged. PurgeNeeded can be set to 0 so that the job will not purge the selected table. NewTable is used to let us know what tables need a purge selection set up. YearsToKeep is a very simple year offset to delete data based on the PurgeField. 

USE [DataPurge]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[DBTables](
	[DBName] [varchar](50) NULL,
	[TableName] [varchar](50) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO 

This table just keeps a running list of all the tables within this server.

Before we get to the actual jobs, we must also set up a view within the same database as the two tables we just created, it will be used to help us add new tables to the purge list:

USE [DataPurge]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE VIEW [dbo].[vwNewDBTable]
AS
SELECT     TOP (100) PERCENT dbo.DBTables.DBName, dbo.DBTables.TableName, dbo.DataPurgeHistory.DBName AS DBName2, 
                      dbo.DataPurgeHistory.TableName AS TableName2
FROM         dbo.DBTables LEFT OUTER JOIN
                      dbo.DataPurgeHistory ON dbo.DataPurgeHistory.DBName = dbo.DBTables.DBName 
                      AND dbo.DBTables.TableName = dbo.DataPurgeHistory.TableName
WHERE     (dbo.DataPurgeHistory.DBName IS NULL)
ORDER BY dbo.DBTables.DBName, dbo.DBTables.TableName

GO

Now that we have everything set up, we need to create two jobs. The First Job will check for any new databases and/or tables that have been created. I set this up to run nightly but you can create whatever schedule is best for your needs. It is a simple job with a single T-SQL step:

SET NOCOUNT ON 

Delete from DataPurge.dbo.DBTables

DECLARE @AllTables table (DbName sysname,SchemaName sysname, TableName sysname) 
DECLARE 
     @SearchDb nvarchar(200) 
    ,@SearchSchema nvarchar(200) 
    ,@SearchTable nvarchar(200) 
    ,@SQL nvarchar(4000) 
SET @SearchDb=''%'' 
SET @SearchSchema=''%'' 
SET @SearchTable=''%Account%'' 
SET @SQL=''select ''''?'''' as DbName, t.name as TableName from [?].sys.tables t inner 
    join sys.schemas s on t.schema_id=s.schema_id WHERE ''''?'''' not in 
    (''''master'''',''''tempdb'''',''''model'''',''''msdb'''') AND s.name LIKE ''''''+@SearchSchema+''''''''
 
INSERT INTO DataPurge.dbo.DBTables (DbName, TableName) 
    EXEC sp_msforeachdb @SQL 
    
Insert into DataPurge.dbo.DataPurgeHistory (DBName, TableName)
	Select DBNAme,TableName from DataPurge.dbo.vwNewDBTable

The next Job will actually perform the purging based on the settings in the DataPurgeHistory table we created earlier. Again, I set this to run nightly but depending on your environment you can run it on whatever schedule makes sense. This job also only has one step.:

SET NOCOUNT ON 

DECLARE 
     @DbName nvarchar(200) 
    ,@Schema nvarchar(200) 
    ,@TableName nvarchar(200) 
    ,@PurgeField nvarchar(200) 
    ,@YearsToKeep nvarchar(200) 
    ,@SQL nvarchar(4000) 

Declare c Cursor For Select DBName,TableName,PurgeField,YearsToKeep from 
   DataPurge.dbo.DataPurgeHistory where PurgeNeeded = 1 and YearsToKeep is not null and PurgeField is not null
Open c 
 
Fetch next From c into @DBName,@TableName,@PurgeField,@YearsToKeep;
 Set @Schema=''dbo'';
 
While @@Fetch_Status=0 Begin 
      
SET @SQL=''Delete From '' + @DbName + ''.'' + @Schema + ''.'' + @TableName + '' where '' + 
          @PurgeField + '' < dateadd(year, -'' + @YearsToKeep + '', getdate());
			Update DataPurge.dbo.DataPurgeHistory set DateStamp = getdate() where 
			DBName = '''''' + @DBNAme + '''''' and TableName ='''''' + @TableName +'''''';''
 
     EXEC (@SQL )
   
   Fetch next From c into @DBName,@TableName,@PurgeField,@YearsToKeep; 
End 
 
Close c 
Deallocate c

Yes, you could set these jobs up as a single job with two steps but I kept them separate so that eventually we could run them on separate schedules.

 So how does this work?  Well, the first job runs and gets all the new databases and tables. This allows you to assign a purgefield from that database and table. Next you put in a number of years you want to keep. Lastly you will want to change the PurgeNeeded to a 1 and the NewTable to a 0. The way this job works, it will only purge on rows that have all the needed information and PurgeNeeded set to 1.  

 This allows you to put holds on information if you need it to not purge due to legal actions etc. It also allows you to simply not purge tables that hold static data (such as information for a webpage dropdown). 

 When your job runs, it will go through the DataPurgeHistory table line by line and Purge the data as requested.  

Points of Interest 

This code is not crazy or difficult to reproduce/change. There are many things you or your company should change depending on your environment. A few examples:

  • Some of our tables are not defaulted to dbo, for servers containing those tables we added the Schema to part of the Table Checker.
  • We have added some description fields to allow Data Owners to comment
  • We have front end webpages that allow Data Owners to select their purges based on predetermined Document Management Guidelines.
  • The webpages have a dropdown that include all of the possible datetime fields that can be used as a purge field inside of the selected table.
  • Some of our Tables needed a purge not based on years so we added a field to tell us what the "ToKeep" columns was based on (ie days weeks months or years)

There are many ways to expand upon this very basic purge framework.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Dan Steuer
Systems Engineer
United States United States
I work mostly with c#, vb.net, asp.net, VB6, coldfusion, MSSQL, Oracle with a bit of Java thrown in when needed.
 
"Have you tried turning it off and on again?"
Follow on   Google+

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Mobile
Web03 | 2.8.140926.1 | Last Updated 25 Jul 2012
Article Copyright 2012 by Dan Steuer
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid