- Paging got simpler and more efficient
- Sequences have been introduced: better performance for auto-generated IDs and easier unique IDs across tables
- FileTables have been introduced: building upon the FILESTREAM feature, we can now have non-transactional access to files stored in the DB via a Windows share, along with transactional access via T-SQL
Lengthier version:
SQL Server 2012, in my opinion, does not come with earth-shaking changes, but it does come with performance improvements, feature improvements and some new features.
First of all, Management Studio is now built on the Visual Studio shell, which means you get a nice WPF experience, better font rendering and CTRL+scroll quick zoom-in/zoom-out.
Let’s say you want to retrieve data from the database in a paged way (that is, chunks of data of a requested size or smaller). In SQL Server 2008 R2 or older you would typically write this:
DECLARE @Offset AS INT = 6
DECLARE @PageSize AS INT = 5
-- Users and Id are example names
SELECT Id
FROM (
    SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber
    FROM Users
) AS UsersSelection
WHERE UsersSelection.RowNumber > @Offset
  AND UsersSelection.RowNumber <= @Offset + @PageSize
In SQL Server 2012 the T-SQL syntax has been updated with keywords that make paging simpler and more efficient: OFFSET … ROWS and FETCH NEXT … ROWS ONLY. A script that retrieves the same data would be:
DECLARE @Offset AS INT = 6
DECLARE @PageSize AS INT = 5
-- Users and Id are example names
SELECT Id
FROM Users
ORDER BY Id
OFFSET @Offset ROWS
FETCH NEXT @PageSize ROWS ONLY
Observe the simpler, clearer syntax. Also, since the subselect has been eliminated (it was only required because the ROW_NUMBER column could not be referenced in the WHERE clause of the same SELECT), the query cost improved as well:
More details about the paging (some call it pagination) semantics can be found on MSDN.
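For reference, the new syntax drops straight into a stored procedure; a minimal sketch, assuming a hypothetical Users table with an Id column:

```sql
-- Hypothetical paging procedure; Users and Id are assumed names
CREATE PROCEDURE dbo.GetUsersPage
    @PageNumber INT,
    @PageSize INT
AS
BEGIN
    SELECT Id
    FROM Users
    ORDER BY Id
    -- OFFSET/FETCH accept expressions, so the page math can live here
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;
END
```

With this, EXEC dbo.GetUsersPage @PageNumber = 2, @PageSize = 5 would return rows 6 through 10.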
Next, let’s introduce sequences. MSDN states that :
A sequence is a user-defined schema bound object that generates a sequence of numeric values according to the specification with which the sequence was created. The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted. Sequences, unlike identity columns, are not associated with specific tables. Applications refer to a sequence object to retrieve its next value. The relationship between sequences and tables is controlled by the application. User applications can reference a sequence object and coordinate the values across multiple rows and tables.
Use sequences instead of identity columns in the following scenarios:
- The application requires a number before the insert into the table is made.
- The application requires sharing a single series of numbers between multiple tables or multiple columns within a table.
- The application must restart the number series when a specified number is reached. For example, after assigning values 1 through 10, the application starts assigning values 1 through 10 again.
- The application requires sequence values to be sorted by another field. The NEXT VALUE FOR function can apply the OVER clause to the function call. The OVER clause guarantees that the values returned are generated in the order of the OVER clause’s ORDER BY clause.
- An application requires multiple numbers to be assigned at the same time. For example, an application needs to reserve five sequential numbers. Requesting identity values could result in gaps in the series if other processes were simultaneously issued numbers. Calling sp_sequence_get_range can retrieve several numbers in the sequence at once.
- You need to change the specification of the sequence, such as the increment value.
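The last two bullet points can be sketched in T-SQL; seqDemo is a hypothetical sequence created just for the example, and sp_sequence_get_range is the system procedure MSDN refers to:

```sql
-- Hypothetical sequence for demonstration
CREATE SEQUENCE dbo.seqDemo AS INT START WITH 1 INCREMENT BY 1;

-- Reserve five sequential numbers in one call
DECLARE @FirstValue SQL_VARIANT;
EXEC sys.sp_sequence_get_range
    @sequence_name = N'dbo.seqDemo',
    @range_size = 5,
    @range_first_value = @FirstValue OUTPUT;
SELECT CAST(@FirstValue AS INT) AS FirstReservedValue;

-- Change the specification later, e.g. the increment
ALTER SEQUENCE dbo.seqDemo INCREMENT BY 10;
```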
Now, let’s take a concrete example of a scenario. Let’s say there is some kind of inventory in your company that needs to track certain company assets. Each asset must have its identifier which should be unique across the entire inventory. Let’s suppose we need to track chairs, monitors and other office items and their main characteristics.
One way to do this would be to create a wide table such as:
CREATE TABLE Inventory (
    Id INT IDENTITY(1,1) NOT NULL,
    ItemTypeId TINYINT NOT NULL,
    InchSize TINYINT NULL,      -- applies only to monitors
    HasArmRest BIT NULL,        -- applies only to chairs
    AdjustableHeight BIT NULL,  -- might apply to monitors and chairs
    -- ... and so on
)
That would be the wrong way to do it, for several reasons:
- It would be hard to read, maintain and query
- It would violate several database design rules
- Most of the columns would be nullable, increasing storage space and hurting query performance
Therefore we will design several tables, one for each inventory item type, also getting rid of the ItemTypeId column.
However, generating the ID is no longer so simple. If we put an IDENTITY column on each table, we won’t have unique IDs across the inventory. So the 2008-R2-and-older approach would be to create another table with a single IDENTITY column, whose whole purpose is to generate unique IDs. The script to insert an item in this scenario would be:
DECLARE @Id INT
BEGIN TRY
    BEGIN TRANSACTION
    INSERT INTO InventoryIds DEFAULT VALUES
    SET @Id = SCOPE_IDENTITY()
    INSERT INTO Monitors (Id, InchSize)  -- example columns
    VALUES (@Id, 24)
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;
    SELECT @ErrorMessage = ERROR_MESSAGE(),
           @ErrorSeverity = ERROR_SEVERITY(),
           @ErrorState = ERROR_STATE();
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION
    RAISERROR (@ErrorMessage, @ErrorSeverity, @ErrorState);
END CATCH
Notice the use of SCOPE_IDENTITY() instead of @@IDENTITY (a short and nice comparison can be found in this StackOverflow response) and of INSERT INTO [TableName] DEFAULT VALUES, which, at least for me, was a new thing.
It seems overly complicated, and to an extent it is. But at least it does a few things well:
- It is transactional, so if any part fails, everything fails and no orphan IDs are left behind in the InventoryIds table
- It employs simpler error handling through the use of TRY/CATCH. By the way, SQL Server 2012 introduces the THROW keyword, which can be used as is (no parameters) to rethrow the error to the caller
- On error it rolls back the transaction AND reports the error to the caller as faithfully as possible
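As a side note, the parameterless THROW mentioned above makes the CATCH block considerably shorter; a sketch (the division by zero just simulates a failure):

```sql
BEGIN TRY
    BEGIN TRANSACTION
    SELECT 1 / 0;  -- simulated failure
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- rethrows the original error, number and all, to the caller
END CATCH
```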
Let’s see how we can simplify this in SQL Server 2012 using sequences.
First, we don’t need the InventoryIds table any more. Then we’ll create a sequence:
Notice the new “Sequences” node under “Programmability”. Next we’ll set up the sequence attributes:
The cycle option is useless in this scenario and in most others. The cache option can be useful in high-load scenarios, since the database engine doesn’t need to hit the disk for each generated ID, as was the case with IDENTITY. The performance improvement is not huge, but it’s welcome nevertheless.
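The same attributes can be set in T-SQL instead of the dialog; a sketch matching the options discussed above (seqInventory is the sequence name used in the rest of the post, the cache size is an arbitrary example):

```sql
CREATE SEQUENCE dbo.seqInventory
    AS INT
    START WITH 1
    INCREMENT BY 1
    NO CYCLE    -- cycling is useless for inventory IDs
    CACHE 50;   -- pre-allocates 50 values to cut down disk writes
```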
Next we’ll write the SQL script to insert the same data:
INSERT INTO Monitors (Id, InchSize)  -- example columns
VALUES (NEXT VALUE FOR seqInventory, 27)
Notice the simpler syntax: a lot shorter and more elegant. (Also, the monitor got larger.)
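A sequence can even be wired in as a column default, so the insert doesn’t have to mention it at all; a sketch, assuming Monitors has Id and InchSize columns:

```sql
ALTER TABLE Monitors
ADD CONSTRAINT DF_Monitors_Id
    DEFAULT (NEXT VALUE FOR seqInventory) FOR Id;

-- Id is now filled in automatically
INSERT INTO Monitors (InchSize) VALUES (24);
```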
During the presentation at RONUA I was asked two questions to which I didn’t have a (definitive) answer back then:
- Can’t we use UNIQUEIDENTIFIERs and be done with these sequences or the sequence simulation via an extra table?
Yes, we can, but this has some disadvantages: a longer key means more I/O and more storage, which hampers JOIN performance, and a GUID is harder for a human being to handle (say, the poor guy who has to stamp all the inventory goods with their IDs).
- What data types are supported by sequences?
TINYINT, SMALLINT, INT, BIGINT, DECIMAL and NUMERIC (the latter two with a scale of 0)
Finally, we approach FileTables. This feature addresses storing (large) binary content in the database. Ever since this has been possible, database people have been torn into two camps: those who see it as the best solution and those who hate it to death. Personally, I’m in the first camp.
Typically a CMS or some other type of website will allow (some) users to upload pictures. This is a commonly encountered scenario but, of course, not the only one.
The other solution would be to store the files on disk (in the file system, that is) and have a table in the database containing all the related data (filename, extension, length, date-modified, date-uploaded, actual-filename, actual-path and so on).
The advantages of storing the files in the database would be, in my opinion:
- One data store for all the application’s data; better organization
- A single backup, not two or more
- Online backup; the app can add/delete/modify files while the backup is running
- Incremental backups; yes, rsync could do incremental backups for the file system too, but it’s harder
Now, assuming you opt for this solution (storing the files in the DB), SQL Server 2012 offers you the FileTable feature. It is built upon the FILESTREAM feature, which stores the binary data outside of the database’s MDF (main data file) in order to avoid degrading the performance of the typical structured data.
The FileTable feature also allows you to access the files stored in the database via a virtual Windows share, so any app can access the files from there.
Let’s walk this through and we’ll comment on the feature as we go.
First, let’s enable FILESTREAM support at the server level. Open SQL Server Configuration Manager (requires Administrator privileges), select the SQL Server instance, right-click and choose Properties:
We’ll check all the check-boxes. After this, a server restart might be required. Next we’ll create a new database (an existing database can be enabled too, but for clarity we’ll start with a fresh one).
In the ‘New Database’ dialog we’ll go through the three tabs from bottom to top. Let’s start with Filegroups and create a new FILESTREAM filegroup:
Next, on the Options tab, choose a directory name and select the desired level of non-transacted access via the Windows share. For this demo I chose ‘Full’:
Finally, we’ll add a new file on the General tab: choose a name for the file, select FILESTREAM Data as its type, select the FILESTREAM filegroup and choose a path. The path must exist; SQL Server will not create it, so create it before finalizing the database creation.
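For the script-inclined, the same three tabs boil down to one statement; a sketch with assumed names and paths (the C:\Data folder must already exist):

```sql
CREATE DATABASE FileTableDemo
ON PRIMARY
    (NAME = N'FileTableDemo', FILENAME = N'C:\Data\FileTableDemo.mdf'),
FILEGROUP FileStreamGroup CONTAINS FILESTREAM
    (NAME = N'FileTableDemoFS', FILENAME = N'C:\Data\FileTableDemoFS')
WITH FILESTREAM
    (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileTableDemo');
```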
Now click OK and the database is finally created. It’s a bit of a ‘pain in the side’, but it only hurts the first time (like other things in life..). Let’s create a FileTable in this new database:
Notice the new ‘FileTable’ node under ‘Tables’. While doing this we’ll not be greeted with a dialog box (as I consider we should be) but with a new script template like so:
-- Create FileTable template
USE <database, sysname, AdventureWorks>
GO

IF OBJECT_ID('<schema_name, sysname, dbo>.<table_name, sysname, sample_filetable>', 'U') IS NOT NULL
    DROP TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_filetable>
GO

CREATE TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_filetable> AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = '<file_table_directory_name, sysname, sample_filetable>',
    FILETABLE_COLLATE_FILENAME = <file_table_filename_collation, sysname, database_default>
)
GO
From all this mess we’ll strip out the unnecessary parts and be left with this:
CREATE TABLE MyImages AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = 'MyImages'
)
This little statement will create a table like so:
Right-click the FileTable in Object Explorer and select ‘Explore FileTable Directory’. This will bring up an empty shared folder. Create a new text file, open it, write some text, save it and close Notepad.
-- name and file_stream are columns every FileTable has
SELECT name,
       CAST(file_stream AS VARCHAR(MAX)) AS [TextContent]
FROM MyImages
You’ll notice that any change made in the Windows share (files moved, renamed, deleted, or their content changed) is reflected in the FileTable, and vice versa!
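If an application needs the share path of a file programmatically, FileTables come with helper functions; a sketch against the MyImages table created above:

```sql
SELECT FileTableRootPath() AS DatabaseRootPath,           -- UNC root for this database
       name,
       file_stream.GetFileNamespacePath() AS RelativePath -- each file's path below the root
FROM MyImages;
```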
This is a very powerful feature, and in my opinion, quite welcome.