This article describes ways to improve SQL query performance for very large data sets having million and billions of records in many tables.
I have been working on one of the projects wherein we work on devices which keep on sending huge amount of data. These devices keep on sending different kinds of data to backend server which we need to analyze, process and load to UI for their meaningful information.
So as the project evolved, the requirement to support number of devices (hence data) increased tremendously. And we as a team had to analyze our slow SPs to perform well as per growing demand for performance. AS we sailed through the performance challenges; the results were encouraging and motivated us to do even more.
Just to give an example of how much improvement was achieved due to hard work that went into it, earlier it used to take around 26 minutes to return 3 months of data by UI SP but after employing various techniques, it came down to less than a minute. Wow! Isn't it?
Similarly, for ETL processing, earlier for 5K records it used to take around 27 min. Now for 5K records, it takes around 6 minutes. Impressive! Isn't it?
So, let us begin the journey of identifying slow running SPs and finding issues inside them.
There are various tools available in SQL server for identifying performance issues. They are listed below.
1. Database Engine Tuning Advisor
This is an inbuilt tool in Microsoft SQL Server Management Studio which helps in analyzing required indexes, statistics, partitioning, strategy and physical design structure for performance improvement.
2. Query Plan
With thorough analysis of query plan, we can find out bottlenecks and efficiency of plans and change our queries accordingly.
3. SQL Profiler
It's a very good tool to identify slow running queries.
4. Query Store
Available from SQL server 2016 onwards, it stores a history of queries, plans, and most resource consuming SPs so that later, we can analyze them.
5. Activity Monitor
This tool displays information about the SQL Server processes and their effect on SQL Server performance.
So, after analysis of our SPs using above tools, we decided our target approach in two parts.
A. Improvement Specific for UI Facing SPs
1. Server-side Paging
Paging moved to server side, that improved performance a lot.
2. Moving Processor Intensive Calculated Columns to Separate Service
Some of the calculated columns were having more joins and complex logic, so, instead of returning them for all visible rows on UI, it was decided to calculate them only for selected row by separating them out in different service.
B. Improvements in General for All Types
ETL processing SPs and UI facing SPs.
1. Creating More Index After Analyzing Their Effect
By analyzing query plans, we came to know that where some more indexes are needed, however you have to verify that new index is really helping or not.
2. Correcting the Join Order
You should always be careful that tables with least number of records should come first in join order.
3. Avoiding Function Call
Try to do in batch to avoid function call for individual record, in our case, there were lots of UTC time related conversion functions slowing down SPs considerably. I avoided that by adding new columns to store UTC time in table itself.
4. Reducing the No of Joins by Incorporating Required Columns Into Temp Table
If we see the same repetitive joins at multiple places for different columns, strive to store all those columns into a single temp table to avoid repetitive joins.
5. Use of Temp Tables Instead of Table Variable
Table variables are best suited for smaller set, but if number of records are much more, go for temp tables as you can also create index on them.
6. Creating Index on Temp Tables Only After Inserting Data
Indexes are heavier operations for
insert, so it is best to create index on temp tables only after inserting data.
7. Use of Force Order Query Hint
Where we definitely know that order of tables are ascending order as per their no records.
8. Query Made to Run on Batch
At many places, there were loops which we converted to batch processing. Remember SQL is best suited for batch processing.
9. Table Partition Created
Since our tables hold millions of records, joining them with another equally loaded tables was a big performance hit. So, we went ahead with table partition to partition our tables based on range of primary keys (for master tables) and time based (for historical tables). This greatly improved the query performance.
10. Column-Store Index Created for Warehouse Tables
This we did on an experimental basis. Since we are using SQL server 2014 which has a limitation that after using column store index, we lose unique constraint on table. But it is a very good feature for warehouse tables for faster data retrieval).
11. Archival of Older Data
This is the activity wherein we want to archive data more than a year old to improve performance.
12. Removal of Redundant Logging
Sometimes, we find logging scattered around in our SPs, which can also be a performance hit. Try to reduce it or make them conditional.
13. Parallel Execution of Non-Depending SPs
In our case, there were multiple SPs running one after another in parent SP. We separated them out which were not dependent on each other and made them run in parallel.
14. Regular Index Defragmentation
As we know, indexes do fragment as new data keeps on coming. We should plan for regular index defragmentation job and updating tables statistics also.
15. Code Refactoring
A usual activity to improve code quality.
So the journey of performance tuning has been very exciting, and the results are very encouraging.
So that's all for now, guys.
Keep learning, keep exploring.
- 4th July, 2020: Initial version