SQL Server 2012 FileTable: My first experience

Dan Kearney

4.73/5 (7 votes)

Apr 29, 2013

CPOL

5 min read

44452

One of the things that drew me to SQL Server 2012 was the FileTable feature. It seemed to be a great fit for my need, and to relatively easy to implement. Here’s my scenario, the solution I chose, the “gotchas” I ran into, and how I dealt with them.

The Scenario

A Line-Of-Business app, previously using SQL Server 2005, that manages patients’ medical bills documents as well as supporting and output documents related to the processing of those bills. Once a document is generated, no modifications are allowed, and only in rare cases is a delete allowed. All access to these documents must be controlled, including reads. In ancient times (7 years ago), when requirements were less stringent, these files were on a company share that every user had full control of. The (easily imagined) ensuing disaster bred the interim solution which had the files stored in a table in the database. All access went through the LOB app. Reasonable care in the design meant that application performance was decent, and things have gone a long time without complaint. Of course, having more than a million files (records) occupying more than a terabyte of space in that table has a cost, the database verify operation takes 12+ hours to complete. I still want the behavior of the transaction when inserting a new file, but after that transaction completes, I want “filesystem like” behavior, with the extra bit that the reading of a file is logged.

The Solution

I looked at SQL Server 2008’s File Stream capabilities, and was enticed, but not enough to make that jump. When word of what SQL Server 2012 was bringing to the table came out, I was glad I didn’t go with the earlier version. SQL Server 2012’s File Table felt like a great fit. I could include the bytes of the file in an INSERT transaction, get rollback functionality without writing a compensating routine, and the file would end up on a filesystem. I would naturally use the company’s SAN for the filesystem. It has “snapshot” functionality which it integrates with SQL Server’s backup, block level de-duplication which saves space, and block level mirroring to the DR site, which saves bacon. To achieve logging of file reads, I would introduce a web service that would, upon receiving an id, translate the id to a path, perform access control checks, log the read, and return a stream of bytes to the caller. I could scale out this web service if the load on a single machine was too much to handle.

The “Gotcha”

On my development box, I begin to configure the database to handle a FileTable, which was pretty straightforward. Next, I learn about the hierarchyid datatype, and how to manipulate it, because that’s what I need to do to modify the [path_locator] column of the table. I’d read about hierarchyid before, but this was my first exposure to them. They’re not particularly difficult to understand or use, and I was soon modifying them with the desired result that the file whose record I was modifying was moving from directory to directory on the filesystem. Next step is to move the code I’d been working on into an AFTER INSERT trigger on the FileTable. If you know more about FileTables than I did, you’ll already know that you cannot modify a FileTable record in a trigger fired on a FileTable. Gaah! I don’t want to push responsibility for path management towards the client, I want it to stay right were I’d designed it! What I need is a reliable mechanism that ends up with a stored procedure executing, which will move new FileTable entries to the correct path.

The Resolution

I had played with SQL Server’s Service Broker (SSB) in the past, as a learning exercise, and thought that now I had a good use for it. I decide to put an AFTER INSERT trigger on the FileTable, and have that trigger queue an SSB message. The convenient thing about placing a message in a queue in the trigger is that if, for any reason, the original transaction rolls back, the queued message will also be rolled back. The contents of the message will be the [stream_id] of the files inserted. If multiple files are inserted as part of a single transaction, I’ll try to put all their ids in a single message, up to what I arbitrarily decide is a reasonable limit for message size. I decide 28 ids will result in a not-too-large message, just under 4KB, and we’ve never inserted 28 files as part of a single production transaction. The limit on ids presented another opportunity to use a new feature, the OFFSET/FETCH clauses in a SELECT statement. In the end, this is what my trigger looks like:

CREATE TRIGGER [trgInsMyFileTable] 
ON    [myFileTable] 
AFTER    INSERT
AS 
BEGIN
    SET NOCOUNT ON;
    DECLARE    @dialog    uniqueidentifier,
        @msg xml,
        @insertCount int,
        @counter int = 0,
        @pageSize int = 28;
    SELECT    @insertCount = COUNT(*)
    FROM    [inserted];
    WHILE    (@counter < @insertCount)
    BEGIN
        SET    @msg =    (SELECT    stream_id
            FROM    [inserted]
            ORDER    BY stream_id ASC
                OFFSET @counter ROWS
                FETCH NEXT @pageSize ROWS ONLY
            FOR        XML RAW, ROOT(N'rows'), ELEMENTS);
            
        SET    @counter = @counter + @pageSize;
        BEGIN    DIALOG CONVERSATION @dialog
        FROM SERVICE [NewStoredFileSendService]
        TO SERVICE N'NewStoredFileReceiveService'
        ON CONTRACT [NewStoredFileContract]
        WITH ENCRYPTION = OFF;
        SEND ON CONVERSATION @dialog MESSAGE TYPE [NewStoredFileRequest](@msg);
    END
END

If you’re unfamiliar with SSB, the BEGIN DIALOG and SEND ON statements will result in the message being reliably queued for processing. Processing, in this case, means that a stored procedure will be run. That procedure is supposed to dequeue the message and process it. It’s got a bunch of error handling code in it, but the meat of it is pretty simple. Here’s the full procedure:

CREATE PROCEDURE [dbo].[spProcessNewStoredFileRequest]
AS
BEGIN
    DECLARE    @messageBody    varbinary(MAX),
        @messageType    sysname,
        @dialog        uniqueidentifier;
    WHILE    (1 = 1)
    BEGIN
        BEGIN TRAN;
        BEGIN TRY;
            WAITFOR
            (
                RECEIVE TOP (1)
                    @messageType=message_type_name,
                    @messageBody=message_body,
                    @dialog=conversation_handle
                FROM    [dbo].[NewStoredFileReceiveQueue]
                ),    TIMEOUT 1000;
                IF    (@@ROWCOUNT=0)
                BEGIN
                    IF    (@@TRANCOUNT>0)
                    BEGIN
                        COMMIT TRAN;
                    END
                    BREAK;
                END
                IF    (@messageType=N'NewStoredFileRequest')
                BEGIN
/*
<rows>
  <row>
    <stream_id>AF520FD5-E9AD-E211-BF90-18037345B7C3</stream_id>
  </row>
  <row>
    <stream_id>B4C457A4-622B-482D-AA24-B6C8A4C712F3</stream_id>
  </row>
</rows>
*/
                    DECLARE    @messageXml xml = CONVERT(xml, @messageBody),
                        @streamIds    [dbo].[ListOfStreamIds];
                    INSERT    @streamIds (stream_id)
                    SELECT    T.c.value(N'.[1]', N'uniqueidentifier')
                    FROM    @messageXml.nodes(N'/rows/row/stream_id') T(c);
                    EXEC    dbo.spStoredFileTableDataManageDirectories @streamIds=@streamIds;
                    SEND ON CONVERSATION @dialog MESSAGE TYPE [NewStoredFileResponse];
                    END CONVERSATION @dialog;
                END
                ELSE IF (@messageType=N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
                BEGIN
                    END CONVERSATION @dialog;
                END
                ELSE IF (@messageType=N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
                BEGIN
                    -- Log the received error into ERRORLOG and system Event Log (eventvwr.exe)
                    DECLARE    @dialog_string nvarchar(100) = CONVERT(nvarchar(64), @dialog),
                            @error_message nvarchar(4000)= CONVERT(nvarchar(4000), @messageBody);
                
                    RAISERROR (N'Conversation %s was ended with error %s', 
                                    10, 1, @dialog_string, @error_message) WITH LOG;
                    END CONVERSATION @dialog;
                END
                COMMIT    TRAN;
        END TRY
        BEGIN CATCH
            SELECT    ERROR_NUMBER()        [ErrorNumber],
                ERROR_SEVERITY()    [ErrorSeverity],
                ERROR_STATE()        [ErrorState],
                ERROR_PROCEDURE()    [ErrorProcedure],
                ERROR_LINE()        [ErrorLine],
                ERROR_MESSAGE()        [ErrorMessage];
            DECLARE    @ErrNum int = ERROR_NUMBER(),
                    @ErrMsg nvarchar(4000) = ERROR_MESSAGE();
            IF    (@dialog IS NOT NULL)
            BEGIN
                END CONVERSATION @dialog WITH ERROR = @ErrNum DESCRIPTION = @ErrMsg;
            END
            IF    (@@TRANCOUNT>0)
            BEGIN
                ROLLBACK TRAN;
            END
        END CATCH;
    END
END

The part that begins “IF (@messageType=N'//companyname.com/projectname/NewStoredFileRequest')” is where the real work is done. It takes the incoming message, an example of which is shown, and fills a table-valued parameter with the IDs, which is passed to the stored procedure. Table-valued parameters, there’s another new feature used! When the stored procedure returns, a response is sent that acknowledges the incoming, and this end of the conversation is torn down. The last stored procedure to document is the one that directly manages the directories. Here’s the code:

CREATE PROCEDURE [dbo].[spStoredFileTableDataManageDirectories] 
    @streamIds dbo.ListOfStreamIds READONLY
AS
BEGIN
    SET    NOCOUNT ON;
    DECLARE    @destsByYearMonth    TABLE
        ([Year]            int
        ,[Month]        int
        ,YearName        AS (Format([Year], N'0000'))
        ,[MonthName]        AS (Format([Month], N'00'))
        ,YearFullName        nvarchar(4000)--AS 
        ,MonthFullName        nvarchar(4000)--AS 
        ,AddYear        bit DEFAULT(0)
        ,AddMonth        bit DEFAULT(0)
        ,YearPath        hierarchyid
        ,MonthPath        hierarchyid
        ,PRIMARY KEY([Year], [Month]));
    -- Determine the Years and Months from the creation date of the specified records
    INSERT    @destsByYearMonth ([Year], [Month])
    SELECT    DISTINCT Year(creation_time), Month(creation_time)
    FROM    @streamIds tvp
    JOIN    [myFileTable] ft ON tvp.stream_id=ft.stream_id;
    UPDATE    @destsByYearMonth
    SET    YearFullName = FileTableRootPath(N'myFileTable') + N'\' + YearName;
    UPDATE    @destsByYearMonth
    SET    MonthFullName = YearFullName + N'\' + MonthName;
    -- Retrieving the path_locator for the "Year" directories
    UPDATE    @destsByYearMonth
    SET    YearPath = GetPathLocator(YearFullName)
    FROM    [myFileTable];
    IF    (EXISTS(SELECT TOP (1) 1 FROM @destsByYearMonth WHERE YearPath IS NULL))
    BEGIN
        -- Generating new path_locators for new "Year" directories
        UPDATE    @destsByYearMonth
        SET    AddYear = 1,
            YearPath = dbo.fnGetNewPathLocator(newid(), hierarchyid::GetRoot())
        WHERE    YearPath IS NULL;
        -- Inserting new "Year" directories
        INSERT    [myFileTable](name, path_locator, is_directory)
        SELECT    YearName, YearPath, 1
        FROM    @destsByYearMonth
        WHERE    AddYear = 1;
    END
    -- Retrieving the path_locator for the "Month" directories
    UPDATE    @destsByYearMonth
    SET    MonthPath = GetPathLocator(MonthFullName)
    FROM    [myFileTable];
    IF (EXISTS(SELECT TOP (1) 1 FROM @destsByYearMonth WHERE MonthPath IS NULL))
    BEGIN
        -- Generating new path_locators for new "Month" directories
        UPDATE    @destsByYearMonth
        SET    AddMonth = 1,
            MonthPath = dbo.fnGetNewPathLocator(newid(), YearPath)
        WHERE    MonthPath IS NULL;
        -- Inserting new "Month" directories
        INSERT    [myFileTable](name, path_locator, is_directory)
        SELECT    [MonthName], MonthPath, 1
        FROM    @destsByYearMonth
        WHERE    AddMonth = 1;
    END
    UPDATE    d
    SET    path_locator=d.path_locator.GetReparentedValue(d.parent_path_locator, i.MonthPath)
    FROM    @streamIds s
    JOIN    [myFileTable] d ON s.stream_id=d.stream_id
    JOIN    @destsByYearMonth i ON Year(creation_time)=i.[Year] AND Month(creation_time)=i.[Month];
END

This procedure works its way from the Year and Month of the [creation_time] column of specified records, creating records that represent directories as necessary, to finally modifying the added files. It uses a function, dbo.fnGetNewPathLocator(), that I found here. I’ve modified to be more what I need, here’s the code:

CREATE FUNCTION [dbo].[fnGetNewPathLocator] 
    (@child uniqueidentifier
    ,@parent hierarchyid = NULL)
RETURNS    hierarchyid
AS
BEGIN
    DECLARE    @result hierarchyid,
        @binId binary(16) = CONVERT(binary(16), @child);
    SELECT @result = hierarchyid::Parse
        (
            COALESCE(@parent.ToString(), N'/') +
            CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 1, 6))) + N'.' +
            CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 7, 6))) + N'.' +
            CONVERT(nvarchar, CONVERT(bigint, SUBSTRING(@binId, 13, 4))) + N'/'
        );
    RETURN @result;
END

The changes that I made are that, instead of using some trick to call NEWID() in a function, I have the caller pass it in, and the datatype I return is a hierarchyid. I like this method of generating hierarchyids because it uses unique identifiers, similar to those the internal method FileTables use.

Conclusion

This exercise resulted in the study and use of several features of SQL Server 2012, some new, and some that had been around but I’d never had a reason to use. I haven’t tested this for performance, but I have no doubt it will behave well. I intend to measure the performance as I move the existing files out of their current storage and into this FileTable.