Click here to Skip to main content
6,822,123 members and growing! (16,531 online)
Email Password   helpLost your password?
Database » Database » SQL Server     Intermediate License: The Code Project Open License (CPOL)

Top 10 steps to optimize data access in SQL Server. Part V (Optimize database files and apply partitioning)

By M.M.Al-Farooque Shubho

As part of a series of articles on several data access optimization steps, this article focuses on organizing files/filegroups and applying partitioning in SQL Server database.
SQL, Windows, .NET, SQL-Server, Architect, DBA, Dev
Revision:10 (See All)
Posted:7 Nov 2009
Updated:8 Nov 2009
Views:5,502
Bookmarked:35 times
Unedited contribution
printPrint   add Share
      Discuss Discuss   Broken Article?Report  
10 votes for this article.
Popularity: 4.36 Rating: 4.36 out of 5

1

2
1 vote, 10.0%
3
2 votes, 20.0%
4
7 votes, 70.0%
5

Downloads

Download Template_StoredProcedure.txt - 2.83 KB  

Download Template_Trigger.txt - 2.62 KB

Download Template_ScalarFunction.txt - 2.61 KB

Download Template_TableValuedFunction.txt - 2.6 KB

Download Template_View.txt - 2.54 KB

Introduction

Sorry for being so late in writing the last part of the article. I do have a full-time working job to earn my livings and for the last couple of months, I had been extremely busy with some of my client projects. (Sounds like an excuse?)

Any way, better late than never. It’s good to be back again!

Background

So, where were we?

We were actually executing a step-by-step optimization process in an SQL Server database, and so far, we’ve done quite a lot of things. Take a look at the following articles to get to know the steps that we’ve carried out so far. 

Top 10 steps to optimize data access in SQL Server. Part I (Use Indexing)  

Top 10 steps to optimize data access in SQL Server. Part II (Re-factor TSQLs and apply best practices)  

Top 10 steps to optimize data access in SQL Server. Part III (Apply advanced indexing and de-normalization) 

Top 10 steps to optimize data access in SQL Server. Part IV (Diagnose database performance problems)

This is the last part in this series of articles and here we are going to discuss some more topics to optimize a database performance. Please note that, the topics we are going to cover in this part of the articles are topics where the DBA’s mainly have expertise on. As a developer, I personally believe, we should at least have some working knowledge on this area so that, when we have no DBA around, we can at least try to do everything whatever a developer can do to optimize the database performance.

So, here we go. 

Step9 : Organize the file groups and files in the database

When an SQL Server database is created, the database server internally creates a number of files in the file system. Every database related object that gets created later in the database are actually being stored inside these files.

An SQL Server database has following three kinds of files

  • .mdf file : This is the primary data file. There could be only one primary data file for each database. All system objects resides in the primary data file and if a secondary data file is not created, all user objects (User created database objects) also takes place in the primary data file.
  • .ndf file : These are the secondary data files, which are optional. These files also contain user created objects.
  • .ldf file : These are the Transaction log files. These files could be one or many in number. Contains transaction logs.

When an SQL Server database is created, by default, the primary data file and the transaction log file is created. You can of course modify the default properties of these two files.

File group

Database files are logically grouped for better performance and improvement of administration on large databases. When a new SQL Server database is created, the primary file group is created and the primary data file is included in the primary file group. Also, the primary group is marked as the default group. As a result, every newly created user objects are automatically placed inside the primary file group (More specifically, inside the files in the primary file group).

If you want your user objects (Tables/Views/Stored Procedures/Functions and others) to be created in the secondary data file, then,

  • Create a new file group and mark that file group as Default.
  • Create a new data file (.ndf file) and set the file group of this data file to the new file group that you just created. 

After doing this, all subsequent objects you create in the database are going to be created inside the file(s) in the secondary file group.

Please note that, Transaction log files are not included in any file group.

FileGroup.JPG

File/ File group organization best practices

When you have a small or moderate sized database, then the default file/ file group organization that gets created while creating the database may be enough for you. But, when your database has a tendency to grow larger (Say, over 1000 MB) in size, you can (And should) do a little tweaking in the file/file group organizations in the database to enhance the database performance. Here are some of the best practices you can follow:

  • The primary file group must be totally separate and should be left to have only system objects and no user defined object should be created on this primary file group. Also, the primary file group should not be set as the default file group. Separating the system objects from other user objects will increase performance and enhance ability to access tables in the case of serious data failures.
  • If there are N physical disk drives available in the system, then try to create N files per file group and put each one in a separate disk. This will allow Distributing disk I/O loads over multiple disks and will increase performance.
  • For frequently accessed tables containing indexes put the tables and the indexes in separate file groups. This would enable to read the index and table data faster.
  • For frequently accessed table containing Text or Image columns, create a separate file group and put the text, next and image columns in that file group on different physical disks, and, put the tables in a different file group. This would enable faster data retrieval from the table with queries that doesn’t contain the text or image columns.
  • Put the transaction log file on a different physical disk that is not used by the data files. The logging operation (Transaction log writing operation) is more write-intensive, and hence, it is important to have the log on the disk that has good I/O performance.
  • Consider assigning the "Read only" tables into a file group that is marked as "Read only". This would enable faster data retrieval from these read only tables. Similarly, assign “Write only” tables in a different file group to allow for faster update.
  • Do not let the SQL Server to fire the “Auto grow” feature too often because, it is a costly operation. Set an "Auto grow" increment value so that, the database size is increased in less frequency (Say, once per week). Similarly, do not use the “Auto shrink” feature for the same reason. Disable it and either shrink the database size manually or use a scheduled operation that runs in a time interval (Say, once in a month).

Step10 : Apply partitioning in the big fat tables

What is table partitioning? 

Table partitioning means nothing but splitting a large table into multiple smaller tables so that, queries has to scan less amount data while retrieving. That is “Divide and conquer”.

When you have a large (In fact, very large, possibly having more than millions of rows) table in your database and when you see that, querying on this table is executing slowly, you should consider portioning this table (Of course, after making sure that all other optimization steps are done) to improve performance.

Following two following options are available to partition a table.

Horizontal partitioning

Suppose, we have a table containing 10 millions of rows. For easy understandability, let’s assume that, the table has an auto-increment primary key field (Say, ID). So, we can divide the table’s data into 10 separate portioning tables where each partition will contain 1 million rows and the partition will be based upon the value of the ID field. That is, First partition will contain those rows which have a primary key value in the range 1-1000000, and, Second partition will contain those rows which have a primary key value in the range 1000001-2000000 and so on.

As you can see, we are partitioning the table by grouping the rows based upon a criteria (ID range), which seems like we have a stack of books in a box from where we are horizontally splitting the stack by taking a group of books from the top and putting in smaller boxes. Hence, this is called horizontal partitioning.

Vertical partitioning

Suppose, we have a table having many columns and also millions of rows. Some of the columns in the table are very frequently accessed in some queries and most of the columns in the table are less frequently accessed in some other queries.

As the table size is huge (In terms of number of columns and rows), any data retrieval query from the table performs slowly. So, this table could be portioned based upon the frequency of access of the columns. That is, we can split the table into two or more tables (Partitions) where each table would contain a few columns from the original tables. In our case, a partition of this table should contain the columns that are frequently accessed by queries and another partition of this table should contain the columns that are less frequently accessed by other queries. Splitting the columns vertically and putting in different thinner partitions is called vertical partition

Another good criteria for applying vertical partitioning could be to partition the Indexed columns and non-indexed columns into separate tables. Also, vertical partitioning could be done by splitting the LOB or VARCHARMAX columns into separate tables.

Like the horizontal partitioning, vertical partitioning also allows to improve query performance (Because, queries now have to scan less data pages internally, as the other column values from the rows has been moved to another table), but, this type of partitioning is to be done carefully, because, if there is any query that involves columns from both partitions, then, the query processing engine would require joining two partitions of the tables to retrieve data, which in turn would degrade performance.

In this article, we would focus on horizontal partitioning only.

Partitioning best practice

  • Consider partitioning big fat tables into different file groups where each file inside the file group is spread into separate physical disks (So that the table spans across different files in different physical disk). This would enable database engine to read/write data operations faster.
  • For history data, consider partitioning based on “Age”. For example, suppose a table has order data. To partition this table, use the Order date column to split the table so that, a partition is created to contain each year’s sales data.

How to partition?

Suppose, we have an Order table in our database that contains Order data for 4 years (1999, 2000, 2001 and 2002) and this table contains millions of rows. We would like to apply partitioning on this table. To do that, following tasks are to be performed:

  • Add user defined file groups to the database
Use the following SQL command to create a file group 

ALTER DATABASE OrderDB ADD FILEGROUP [1999] 
ALTER DATABASE OrderDB ADD FILE (NAME = N'1999', FILENAME
= N'C:\OrderDB\1999.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 5MB) TO
FILEGROUP [1999]   

Here, we are adding a file group ‘1999’ and adding a secondary data file ‘C:\OrderDB\1999.ndf' to this file group. We did this, because, we would like to put our table partitions into separate files in separate file groups.

Using the SQL command above, create another 3 file groups ‘2000’,’2001’ and ‘2002’. As you perhaps could imagine already, each of these file group would store a year’s order data inside their corresponding data files.

  • Create a partition function  
A partition function is an object that defines the boundary points for partitioning data. Following command creates a partition function
CREATE PARTITION FUNCTION FNOrderDateRange (DateTime) AS
RANGE LEFT FOR VALUES ('19991231', '20001231', '20011231')
The above partition function specifies that, the Order date column having value between
DateTime <= 1999/12/31 would fall into 1st partition.
DateTime > 1999/12/31 and <= 2000/12/31 would fall info 2nd partition.
DateTime > 2000/12/31 and <= 2001/12/31 would fall info 3rd partition.
DateTime > 2001/12/31 would fall info 4th partition.
The RANGE LEFT is used to specify that, the boundary value should fall into left partition. For example, here the boundary value 1999/12/31 is falling into the 1st partition (With all other dates less than this value) and the next value is falling into the next partition. If we specify RANGE RIGHT, then, the boundary value would fall into the right partition. So, in this example, the boundary value 2000/12/31 would fall into the 2nd partition and any date less than this value would fall into the 1st
partition.
  • Create a partition schema 
The partition scheme maps the partitions of a partitioned table/index to the file groups that will be used to store the partitions.
Following command creates a partition schema  
CREATE PARTITION SCHEME OrderDatePScheme AS PARTITION FNOrderDateRange
TO ([1999], [2000], [2001], [2002])
Here, we are specifying that,
The 1st partition should go into the ‘1999’ file group
The 2nd partition should go into the ‘2000’ file group
The 3rd partition should go into the ‘2001’ file group
The 4th partition should go into the ‘2002’ file group
  • Apply partitioning on the table  
At this point, we have defined the necessary partitioning criteria. So all we need to do now is to partition the table.
In order to do this, follow these steps:
Drop the existing clustered index from the table, that is most likely created due to the primary key creation on the table. The clustered index can be dropped by using the DROP INDEX statement. The statement. Assuming that, the PK_Orders is the primary key of the table, use the following command to drop the primary key which will eventually drop the clustered index from the table.
ALTER TABLE Orders DROP CONSTRAINT
PK_Orders 
Recreate the clustered index on the partition scheme: The index can be created on a partitioned scheme as follows:
CREATE UNIQUE CLUSTERED INDEX PK_Orders ON Orders(OrderDate) ON
OrderDatePScheme (OrderDate) 
Assuming that, the OrderDate column values are unique in the table, the table will be partitioned based on the partition scheme specified (OrderDatePScheme) which internally uses the partition function to partition the table into 4 smaller parts in 4 different file groups.

There are quite a few very well-written articles on the web on Table partitioning. I can mention a few here:

Partitioned Tables and Indexes in SQL Server 2005 (A very detail level explanation on partitioning).

SQL SERVER – 2005 – Database Table Partitioning Tutorial – How to Horizontal Partition Database Table (A very simple and easily understandable tutorial on partitioning).

Step 11 (The bonus step): Better-manage the DBMS objects, Use TSQL Templates

We all know that, In order to better manage the DBMS objects (Stored procedures, Views, Triggers, Functions etc), it’s important to follow a consistent structure while creating these. But, for many reasons (Due to time constraints mainly), most of the times we fail to maintain a consistent structure while developing these DBMS objects. So, when the codes are debugged later for any performance related issue or for any reported bug, it becomes a nightmare for any person to understand the code and find the possible causes.

To help you In this regard, I have developed some TSQL templates that you can use to develop the DBMS objects using a consistent structure within a short amount of time.

I’ve also imagined that, there will be a person reviewing the DBMS objects and routines created by the team. The review process helps identifying the issues (Say, best practices) that generally are missed by the developers due to work pressure or other issues, and, the templates have a “REVIEW” section where the reviewer can put review information along with comments. 

I’ve attached some sample templates of different common DBMS objects in SQL Server. Here these are:  

  • Template_StoredProcedure.txt : Template for developing stored procedures
  • Template_View.txt : Template for developing Views
  • Template_Trigger.txt : Template for developing Triggers
  • Template_ScalarFunction.txt : Template for developing Scalar functions
  • Template_TableValuedFunction.txt : Template for developing Table valued functions  

How to create templates?

At first, you need to create the templates in your SQL Server Management Studio. To do this, you need to download the attached templates and follow the steps given below. I’ve used the Template_StoredProcedure.txt for creating the Stored Procedure template. You can follow the same procedure to create other templates.

  • Open SQL Server Management Studio and go to View->Template explorer

  • Go to the Template explorer, and the “Stored Procedure” node and expand it

  • Rename the newly created blank template as the following

  • Right click on the newly created template and open it in the edit mode as follows:

  • The SQL Server management studio will ask for credentials. Provide valid credential and press “Connect” to access the database instance and to edit the template.

  • Open the attached Template_StoredProcedure.txt in an editor, copy all contents and paste onto the template that is being edited in the Management Studio.

  • Save the template by pressing the Save button in the Management Studio

How to use the templates?

Well, after creating all the templates in the SQL Server Management Studio, it’s time to use them. I am showing how to use the Stored Procedure template here, but, the procedure is the same for all other templates

  • In the Template Explorer, double click on the newly created Stored Procedure template

  • The SQL Server Management Studio will ask for valid credentials to use the template. Provide valid credentials and press “Connect”.

  • After connecting successfully, the template will be opened in the editor for filling up the variables with appropriate values.

  • Specify values for the template by clicking the following icon. Alternatively, you can press Ctrl+Shift+M to do the same

  • Specify parameter values and press “OK”

  • The template will be filled up by the provided values.

  • Select the target database where you would like to execute the Stored Procedure creation script and press execute icon

  • If everything is OK, the Stored procedure should be created successfully

You can follow the above steps to create other DBMS objects (Functions, Views, Triggers etc).

I can promise, you can create the DBMS objects using the templates now in an easy manner, within a quick amount of time.

Summary

Optimization is a “Mindset”, rather than an automatic occurrence. In order to optimize your database performance, first you have to believe that, optimization is possible. Then you need to give your best effort and apply knowledge and best practices to optimize. The most important part is, you have to try to “Prevent” any possible performance issue that may take place later, by applying your knowledge before or along with your development activity, rather than trying to recover “After” the problem occurs. One little extra hour spent in the development time to ensure good practices may save your hundred long hours for you in the long run!

“Work smarter, not harder” :)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

M.M.Al-Farooque Shubho


Member
A passionate software developer who loves to think, learn and observe the world around. Working in the .NET based software application development for quite a few years.

My LinkedIn profile will tell you more about me.

Have a look at My other CodeProject articles.

Learn about my visions, thoughts and findings at My Blog.

Awards:

CodeProject MVP 2010

Prize winner in Competition "Best Asp.net article of May 2009"

Prize winner in Competition "Best overall article of May 2009"

Prize winner in Competition "Best overall article of April 2009"

Occupation: Software Developer
Location: Bangladesh Bangladesh

Other popular Database articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
  (Refresh) 
-- There are no messages in this forum --

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads.

PermaLink | Privacy | Terms of Use
Last Updated: 8 Nov 2009
Editor:
Copyright 2009 by M.M.Al-Farooque Shubho
Everything else Copyright © CodeProject, 1999-2010
Web22 | Advertise on the Code Project