The Scenario
A Line-Of-Business app, previously using SQL Server 2005, that manages patients’ medical bills documents as well as supporting and output documents related to the processing of those bills. Once a document is generated, no modifications are allowed, and only in rare cases is a delete allowed. All access to these documents must be controlled, including reads. In ancient times (7 years ago), when requirements were less stringent, these files were on a company share that every user had full control of. The (easily imagined) ensuing disaster bred the interim solution which had the files stored in a table in the database. All access went through the LOB app. Reasonable care in the design meant that application performance was decent, and things have gone a long time without complaint. Of course, having more than a million files (records) occupying more than a terabyte of space in that table has a cost, the database verify operation takes 12+ hours to complete. I still want the behavior of the transaction when inserting a new file, but after that transaction completes, I want “filesystem like” behavior, with the extra bit that the reading of a file is logged.
The Solution
I looked at SQL Server 2008’s File Stream capabilities, and was enticed, but not enough to make that jump. When word of what SQL Server 2012 was bringing to the table came out, I was glad I didn’t go with the earlier version. SQL Server 2012’s File Table felt like a great fit. I could include the bytes of the file in an INSERT transaction, get rollback functionality without writing a compensating routine, and the file would end up on a filesystem. I would naturally use the company’s SAN for the filesystem. It has “snapshot” functionality which it integrates with SQL Server’s backup, block level de-duplication which saves space, and block level mirroring to the DR site, which saves bacon. To achieve logging of file reads, I would introduce a web service that would, upon receiving an id, translate the id to a path, perform access control checks, log the read, and return a stream of bytes to the caller. I could scale out this web service if the load on a single machine was too much to handle.
The “Gotcha”
On my development box, I begin to configure the database to handle a FileTable, which was pretty straightforward. Next, I learn about the hierarchyid
datatype, and how to manipulate it, because that’s what I need to do to modify the [path_locator] column of the table. I’d read about hierarchyid
before, but this was my first exposure to them. They’re not particularly difficult to understand or use, and I was soon modifying them with the desired result that the file whose record I was modifying was moving from directory to directory on the filesystem. Next step is to move the code I’d been working on into an AFTER INSERT
trigger on the FileTable. If you know more about FileTables than I did, you’ll already know that you cannot modify a FileTable record in a trigger fired on a FileTable. Gaah! I don’t want to push responsibility for path management towards the client, I want it to stay right were I’d designed it! What I need is a reliable mechanism that ends up with a stored procedure executing, which will move new FileTable entries to the correct path.
The Resolution
I had played with SQL Server’s Service Broker (SSB) in the past, as a learning exercise, and thought that now I had a good use for it. I decide to put an AFTER INSERT trigger on the FileTable, and have that trigger queue an SSB message. The convenient thing about placing a message in a queue in the trigger is that if, for any reason, the original transaction rolls back, the queued message will also be rolled back. The contents of the message will be the [stream_id] of the files inserted. If multiple files are inserted as part of a single transaction, I’ll try to put all their ids in a single message, up to what I arbitrarily decide is a reasonable limit for message size. I decide 28 ids will result in a not-too-large message, just under 4KB, and we’ve never inserted 28 files as part of a single production transaction. The limit on ids presented another opportunity to use a new feature, the OFFSET/FETCH
clauses in a SELECT statement. In the end, this is what my trigger looks like:
CREATE TRIGGER [trgInsMyFileTable]
ON [myFileTable]
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @dialog uniqueidentifier,
@msg xml,
@insertCount int,
@counter int = 0,
@pageSize int = 28;
SELECT @insertCount = COUNT(*)
FROM [inserted];
WHILE (@counter < @insertCount)
BEGIN
SET @msg = (SELECT stream_id
FROM [inserted]
ORDER BY stream_id ASC
OFFSET @counter ROWS
FETCH NEXT @pageSize ROWS ONLY
FOR XML RAW, ROOT(N'rows'), ELEMENTS);
SET @counter = @counter + @pageSize;
BEGIN DIALOG CONVERSATION @dialog
FROM SERVICE [NewStoredFileSendService]
TO SERVICE N'NewStoredFileReceiveService'
ON CONTRACT [NewStoredFileContract]
WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @dialog MESSAGE TYPE [NewStoredFileRequest](@msg);
END
END
If you’re unfamiliar with SSB, the BEGIN DIALOG
and SEND ON
statements will result in the message being reliably queued for processing. Processing, in this case, means that a stored procedure will be run. That procedure is supposed to dequeue the message and process it. It’s got a bunch of error handling code in it, but the meat of it is pretty simple. Here’s the full procedure:
CREATE PROCEDURE [dbo].[spProcessNewStoredFileRequest]
AS
BEGIN
DECLARE @messageBody varbinary(MAX),
@messageType sysname,
@dialog uniqueidentifier;
WHILE (1 = 1)
BEGIN
BEGIN TRAN;
BEGIN TRY;
WAITFOR
(
RECEIVE TOP (1)
@messageType=message_type_name,
@messageBody=message_body,
@dialog=conversation_handle
FROM [dbo].[NewStoredFileReceiveQueue]
), TIMEOUT 1000;
IF (@@ROWCOUNT=0)
BEGIN
IF (@@TRANCOUNT>0)
BEGIN
COMMIT TRAN;
END
BREAK;
END
IF (@messageType=N'NewStoredFileRequest')
BEGIN
DECLARE @messageXml xml = CONVERT(xml, @messageBody),
@streamIds [dbo].[ListOfStreamIds];
INSERT @streamIds (stream_id)
SELECT T.c.value(N'.[1]', N'uniqueidentifier')
FROM @messageXml.nodes(N'/rows/row/stream_id') T(c);
EXEC dbo.spStoredFileTableDataManageDirectories @streamIds=@streamIds;
SEND ON CONVERSATION @dialog MESSAGE TYPE [NewStoredFileResponse];
END CONVERSATION @dialog;
END
ELSE IF (@messageType=N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
BEGIN
END CONVERSATION @dialog;
END
ELSE IF (@messageType=N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
BEGIN
DECLARE @dialog_string nvarchar(100) = CONVERT(nvarchar(64), @dialog),
@error_message nvarchar(4000)= CONVERT(nvarchar(4000), @messageBody);
RAISERROR (N'Conversation %s was ended with error %s',
10, 1, @dialog_string, @error_message) WITH LOG;
END CONVERSATION @dialog;
END
COMMIT TRAN;
END TRY
BEGIN CATCH
SELECT ERROR_NUMBER() [ErrorNumber],
ERROR_SEVERITY() [ErrorSeverity],
ERROR_STATE() [ErrorState],
ERROR_PROCEDURE() [ErrorProcedure],
ERROR_LINE() [ErrorLine],
ERROR_MESSAGE() [ErrorMessage];
DECLARE @ErrNum int = ERROR_NUMBER(),
@ErrMsg nvarchar(4000) = ERROR_MESSAGE();
IF (@dialog IS NOT NULL)
BEGIN
END CONVERSATION @dialog WITH ERROR = @ErrNum DESCRIPTION = @ErrMsg;
END
IF (@@TRANCOUNT>0)
BEGIN
ROLLBACK TRAN;
END
END CATCH;
END
END
The part that begins “IF (@messageType=N'//companyname.com/projectname/NewStoredFileRequest')
” is where the real work is done. It takes the incoming message, an example of which is shown, and fills a table-valued parameter with the IDs, which is passed to the stored procedure. Table-valued parameters, there’s another new feature used! When the stored procedure returns, a response is sent that acknowledges the incoming, and this end of the conversation is torn down. The last stored procedure to document is the one that directly manages the directories. Here’s the code:
CREATE PROCEDURE [dbo].[spStoredFileTableDataManageDirectories]
@streamIds dbo.ListOfStreamIds READONLY
AS
BEGIN
SET NOCOUNT ON;
DECLARE @destsByYearMonth TABLE
([Year] int
,[Month] int
,YearName AS (Format([Year], N'0000'))
,[MonthName] AS (Format([Month], N'00'))
,YearFullName nvarchar(4000)
,MonthFullName nvarchar(4000)
,AddYear bit DEFAULT(0)
,AddMonth bit DEFAULT(0)
,YearPath hierarchyid
,MonthPath hierarchyid
,PRIMARY KEY([Year], [Month]));
INSERT @destsByYearMonth ([Year], [Month])
SELECT DISTINCT Year(creation_time), Month(creation_time)
FROM @streamIds tvp
JOIN [myFileTable] ft ON tvp.stream_id=ft.stream_id;
UPDATE @destsByYearMonth
SET YearFullName = FileTableRootPath(N'myFileTable') + N'\' + YearName;
UPDATE @destsByYearMonth
SET MonthFullName = YearFullName + N'\' + MonthName;
UPDATE @destsByYearMonth
SET YearPath = GetPathLocator(YearFullName)
FROM [myFileTable];
IF (EXISTS(SELECT TOP (1) 1 FROM @destsByYearMonth WHERE YearPath IS NULL))
BEGIN
UPDATE @destsByYearMonth
SET AddYear = 1,
YearPath = dbo.fnGetNewPathLocator(newid(), hierarchyid::GetRoot())
WHERE YearPath IS NULL;
INSERT [myFileTable](name, path_locator, is_directory)
SELECT YearName, YearPath, 1
FROM @destsByYearMonth
WHERE AddYear = 1;
END
UPDATE @destsByYearMonth
SET MonthPath = GetPathLocator(MonthFullName)
FROM [myFileTable];
IF (EXISTS(SELECT TOP (1) 1 FROM @destsByYearMonth WHERE MonthPath IS NULL))
BEGIN
UPDATE @destsByYearMonth
SET AddMonth = 1,
MonthPath = dbo.fnGetNewPathLocator(newid(), YearPath)
WHERE MonthPath IS NULL;
INSERT [myFileTable](name, path_locator, is_directory)
SELECT [MonthName], MonthPath, 1
FROM @destsByYearMonth
WHERE AddMonth = 1;
END
UPDATE d
SET path_locator=d.path_locator.GetReparentedValue(d.parent_path_locator, i.MonthPath)
FROM @streamIds s
JOIN [myFileTable] d ON s.stream_id=d.stream_id
JOIN @destsByYearMonth i ON Year(creation_time)=i.[Year] AND Month(creation_time)=i.[Month];
END
This procedure works its way from the Year and Month of the [creation_time] column of specified records, creating records that represent directories as necessary, to finally modifying the added files. It uses a function, dbo.fnGetNewPathLocator()
, that I found here. I’ve modified to be more what I need, here’s the code:
CREATE FUNCTION [dbo].[fnGetNewPathLocator]
(@child uniqueidentifier
,@parent hierarchyid = NULL)
RETURNS hierarchyid
AS
BEGIN
DECLARE @result hierarchyid,
@binId binary(16) = CONVERT(binary(16), @child);
SELECT @result = hierarchyid::Parse
(
COALESCE(@parent.ToString(), N'/') +
CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 1, 6))) + N'.' +
CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 7, 6))) + N'.' +
CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 13, 4))) + N'/'
);
RETURN @result;
END
The changes that I made are that, instead of using some trick to call NEWID()
in a function, I have the caller pass it in, and the datatype I return is a hierarchyid
. I like this method of generating hierarchyid
s because it uses unique identifiers, similar to those the internal method FileTable
s use.
Conclusion
This exercise resulted in the study and use of several features of SQL Server 2012, some new, and some that had been around but I’d never had a reason to use. I haven’t tested this for performance, but I have no doubt it will behave well. I intend to measure the performance as I move the existing files out of their current storage and into this FileTable
.