Click here to Skip to main content
14,355,397 members

Optimizing LINQ Queries in C#.NET for MS SQL Server

Rate this:
5.00 (5 votes)
Please Sign up or sign in to vote.
5.00 (5 votes)
13 Sep 2019CPOL
Some aspects of LINQ query optimization for MS SQL Server

Introduction

LINQ was added to .NET as a new powerful data manipulation language. LINQ to SQL allows you to talk with DBMS quite conveniently by using Entity Framework, for example. But often when using it, developers forget to consider what kind of SQL query will be generated by the queryable provider (Entity Framework in our example case). In this article, we’ll look at how exactly you can optimize the performance of LINQ queries.

Implementation

Let’s examine the two most important points by using an example.

First, we’ll need to create the Test database in SQL Server. In this database, we’ll create two tables by running the following query:

USE [TEST]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Ref](
[ID] [int] NOT NULL,
[ID2] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
 CONSTRAINT [PK_Ref] PRIMARY KEY CLUSTERED 
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, _
 IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Ref] ADD  CONSTRAINT [DF_Ref_InsertUTCDate]  _
                        DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO

USE [TEST]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Customer](
[ID] [int] NOT NULL,
[Name] [nvarchar](255) NOT NULL,
[Ref_ID] [int] NOT NULL,
[InsertUTCDate] [datetime] NOT NULL,
[Ref_ID2] [int] NOT NULL,
 CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, _
 IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Customer] ADD  CONSTRAINT [DF_Customer_Ref_ID] DEFAULT ((0)) FOR [Ref_ID]
GO

ALTER TABLE [dbo].[Customer] ADD  CONSTRAINT [DF_Customer_InsertUTCDate]  _
                             DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO

Now, let’s fill the Ref table with the help of the following script. Wait... I just ran a script, but didn’t save it. In such cases, SQL Complete by Devart can be very convenient. It integrates with SSMS and Visual Studio and has the Execution History feature:

Image 1

Figure 1 – Execution History feature

This functionality displays the history of queries that were executed in SSMS:

Image 2

Figure 2 – History of queries

Take note of how the window consists of the following elements:

  1. Search box for filtering the results
  2. Date range box for filtering the results
  3. Results presented in a table. You can sort data by this table’s columns (by using the SHIFT key, you can choose a set of columns for sorting)
  4. Code of the selected row

The result table contains a history of scripts executed in SSMS and comprises the following tables:

  1. Status – shows whether the script was executed successfully
  2. Query Text – code of the script
  3. Size (Bytes) – size of the script in bytes
  4. Executed On – date and time at which the script was executed
  5. Duration – how long it took for the script to be executed
  6. File – name of a file or a tab in SSMS followed by the name of the SQL Server instance on which the script was executed
  7. Server - name of the SQL Server instance on which the script was executed
  8. User – the login under which the script was executed
  9. Database – the database context in which the script was executed

We can find the required query in this history table:

USE [TEST]
GO

DECLARE @ind INT=1;

WHILE(@ind<1200000)
BEGIN
INSERT INTO [dbo].[Ref]
           ([ID]
           ,[ID2]
           ,[Name])
    SELECT
           @ind
           ,@ind
           ,CAST(@ind AS NVARCHAR(255));

SET @ind=@ind+1;
END 
GO

In a similar way, we can populate the Customer table by using the following script:

USE [TEST]
GO

DECLARE @ind INT=1;
DECLARE @ind_ref INT=1;

WHILE(@ind<=12000000)
BEGIN
IF(@ind%3=0) SET @ind_ref=1;
ELSE IF (@ind%5=0) SET @ind_ref=2;
ELSE IF (@ind%7=0) SET @ind_ref=3;
ELSE IF (@ind%11=0) SET @ind_ref=4;
ELSE IF (@ind%13=0) SET @ind_ref=5;
ELSE IF (@ind%17=0) SET @ind_ref=6;
ELSE IF (@ind%19=0) SET @ind_ref=7;
ELSE IF (@ind%23=0) SET @ind_ref=8;
ELSE IF (@ind%29=0) SET @ind_ref=9;
ELSE IF (@ind%31=0) SET @ind_ref=10;
ELSE IF (@ind%37=0) SET @ind_ref=11;
ELSE SET @ind_ref=@ind%1190000;
INSERT INTO [dbo].[Customer]
          ([ID]
          ,[Name]
          ,[Ref_ID]
          ,[Ref_ID2])
    SELECT
          @ind,
          CAST(@ind AS NVARCHAR(255)),
          @ind_ref,
          @ind_ref;

SET @ind=@ind+1;
END
GO

The SQL Complete tool can help with keeping the code of your scripts neatly formatted.

In this way, we created two tables – one of them has over 1 million rows, and the other has more than 10 million.

Now we need to create a test project in Visual Studio. This one will be a Visual C# Console App (.NET Framework).

Next, we’ll need to add a library for the Entity Framework so that we can interact with the database.

To add this library, right-click the project and choose ‘Manage NuGet Packages...’ in the context menu:

Image 3

Figure 3 - ‘Manage NuGet Packages...’ in the context menu

In the window that will be opened, let’s enter ‘Entity Framework’ in the search box, choose the Entity Framework package, and install it:

Image 4

Figure 4 - Entity Framework package installation

Next, in the App.config file, we’ll add the following block after the configSections element:

<connectionstrings>
<add connectionstring="data source=MSSQL_INSTANCE_NAME;
 Initial Catalog=TEST;Integrated Security=True;" name="DBConnection"
 providername="System.Data.SqlClient">
 </add>
 </connectionstrings>

Make sure the connection string is entered in connectionString.

Now, let’s create 3 interfaces in separate files:

  1. IBaseEntityID:
    namespace TestLINQ
    {
        public interface IBaseEntityID
        {
            int ID { get; set; }
        }
    }
  2. IBaseEntityName:
    namespace TestLINQ
    {
        public interface IBaseEntityName
        {
            string Name { get; set; }
        }
    }
  3. IBaseNameInsertUTCDate:
    namespace TestLINQ
    {
        public interface IBaseNameInsertUTCDate
        {
            DateTime InsertUTCDate { get; set; }
        }
    }    

In a separate file, create a basic class BaseEntity for our two entities, which will contain their common fields:

    namespace TestLINQ
{
    public class BaseEntity : IBaseEntityID, IBaseEntityName, IBaseNameInsertUTCDate
    {
        public int ID { get; set; }
        public string Name { get; set; }
        public DateTime InsertUTCDate { get; set; }
    }
}    

Next, we’ll create our two entities, each one in a separate file:

  1. Ref:
    using System.ComponentModel.DataAnnotations.Schema;
    
    namespace TestLINQ
    {
        [Table("Ref")]
        public class Ref : BaseEntity
        {
            public int ID2 { get; set; }
        }
    }
  2. Customer:
    using System.ComponentModel.DataAnnotations.Schema;
    
    namespace TestLINQ
    {
        [Table("Customer")]
        public class Customer: BaseEntity
        {
            public int Ref_ID { get; set; }
            public int Ref_ID2 { get; set; }
        }
    }     

Finally, let’s create a context UserContext in a separate file:

    using System.Data.Entity;

namespace TestLINQ
{
    public class UserContext : DbContext
    {
        public UserContext()
            : base("DbConnection")
        {
            Database.SetInitializer<usercontext>(null);
        }

        public DbSet<customer> Customer { get; set; }
        public DbSet<ref> Ref { get; set; }
    }
}

In this way, we get a solution for performing optimization tests with the help of LINQ to SQL through Entity Framework for MS SQL Server:

Image 5

Figure 5 - Solution for optimization tests

Now, let’s enter the following code in the Program.cs file:

using System;
using System.Collections.Generic;
using System.Linq;

namespace TestLINQ
{
    class Program
    {
        static void Main(string[] args)
        {
            using (UserContext db = new UserContext())
            {
                var dblog = new List<string>();
                db.Database.Log = dblog.Add;

                var query = from e1 in db.Customer
                            from e2 in db.Ref
                            where (e1.Ref_ID == e2.ID)
                                 && (e1.Ref_ID2 == e2.ID2)
                            select new { Data1 = e1.Name, Data2 = e2.Name };

                var result = query.Take(1000).ToList();

                Console.WriteLine(dblog[1]);

                Console.ReadKey();
            }
        }
    }
}

When we run the project, this is the output we’ll see in the console as a result:

SELECT TOP (1000)
[Extent1].[Ref_ID] AS [Ref_ID],
[Extent1].[Name] AS [Name],
[Extent2].[Name] AS [Name1]
FROM  [dbo].[Customer] AS [Extent1]
INNER JOIN [dbo].[Ref] AS [Extent2] ON ([Extent1].[Ref_ID] = [Extent2].[ID]) _
                                    AND ([Extent1].[Ref_ID2] = [Extent2].[ID2])

As you can see, a LINQ query has efficiently generated a SQL query to the MS SQL Server DBMS.

Now, let’s change the AND condition to OR in the LINQ query:

var query = from e1 in db.Customer
                        from e2 in db.Ref
                        where (e1.Ref_ID == e2.ID)
                             || (e1.Ref_ID2 == e2.ID2)
                        select new { Data1 = e1.Name, Data2 = e2.Name };

Execute the application again.

An exception will be thrown. From the description of the error, we’ll see that the operation has timed out after 30 seconds:

Image 6

Figure 6 – Error description

This is the query that LINQ has created:

Image 7

Figure 7 – LINQ query

We can see that the selection is performed through a cartesian product of two sets(tables):

SELECT TOP (1000)
  [Extent1].[Ref_ID] AS [Ref_ID],
  [Extent1].[Name] AS [Name],
  [Extent2].[Name] AS [Name1]
  FROM  [dbo].[Customer] AS [Extent1]
  CROSS JOIN [dbo].[Ref] AS [Extent2]
  WHERE [Extent1].[Ref_ID] = [Extent2].[ID] OR [Extent1].[Ref_ID2] = [Extent2].[ID2]

Let’s rewrite the LINQ query like this:

var query = (from e1 in db.Customer
                        join e2 in db.Ref
                        on e1.Ref_ID equals e2.ID
                        select new { Data1 = e1.Name, Data2 = e2.Name }).Union_
                                     (from e1 in db.Customer
                                        join e2 in db.Ref
                                        on e1.Ref_ID2 equals e2.ID2
                                        select new { Data1 = e1.Name, Data2 = e2.Name });

This is the SQL query we’ll get as a result:

SELECT 
    [Limit1].[C1] AS [C1], 
    [Limit1].[C2] AS [C2], 
    [Limit1].[C3] AS [C3]
    FROM ( SELECT DISTINCT TOP (1000) 
        [UnionAll1].[C1] AS [C1], 
        [UnionAll1].[Name] AS [C2], 
        [UnionAll1].[Name1] AS [C3]
        FROM  (SELECT 
            1 AS [C1], 
            [Extent1].[Name] AS [Name], 
            [Extent2].[Name] AS [Name1]
            FROM  [dbo].[Customer] AS [Extent1]
            INNER JOIN [dbo].[Ref] AS [Extent2] ON [Extent1].[Ref_ID] = [Extent2].[ID]
        UNION ALL
            SELECT 
            1 AS [C1], 
            [Extent3].[Name] AS [Name], 
            [Extent4].[Name] AS [Name1]
            FROM  [dbo].[Customer] AS [Extent3]
            INNER JOIN [dbo].[Ref] AS [Extent4] _
            ON [Extent3].[Ref_ID2] = [Extent4].[ID2]) AS [UnionAll1]
    )  AS [Limit1]    

Unfortunately, there can only be one connection condition in one LINQ query, so we can achieve the results we need by creating one query for each of the two conditions and then combining them by using Union to delete repeating lines.

Yes, queries will be non-equivalent in most cases, considering that full row duplicates can be returned. However, in real life, duplicate rows are not needed and usually they’re something you’ll want to get rid of.

Now let’s compare the execution plans of these two queries:

  1. Average execution time for CROSS JOIN is 195s:

    Image 8

    Figure 8 – Execution time for CROSS JOIN
  2. Average execution time for INNER JOIN-UNION is less than 24s:

    Image 9

    Figure 9 – Execution time for INNER JOIN-UNION

As we can see from the results, the optimized LINQ query works several times faster than the unoptimized one in these two tables with millions of records.

For the version with the AND condition, a LINQ query will look like this:

var query = from e1 in db.Customer
                        from e2 in db.Ref
                        where (e1.Ref_ID == e2.ID)
                             && (e1.Ref_ID2 == e2.ID2)
                        select new { Data1 = e1.Name, Data2 = e2.Name };

Almost always in this case, a correct SQL query will be generated, with the execution time of approximately 24 seconds:

Image 10

Figure 10 - Correct SQL query

Also, for LINQ to Objects operations, instead of a query that looks like this:

var query = from e1 in seq1
                            from e2 in seq2
                            where (e1.Key1==e2.Key1)
                               && (e1.Key2==e2.Key2)
                            select new { Data1 = e1.Data, Data2 = e2.Data };    

we can use a query similar to the following:

var query = from e1 in seq1
                        join e2 in seq2
                        on new { e1.Key1, e1.Key2 } equals new { e2.Key1, e2.Key2 }
                        select new { Data1 = e1.Data, Data2 = e2.Data };

where:

Para[] seq1 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, 
new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 4, Data = "999" } };
Para[] seq2 = new[] { new Para { Key1 = 1, Key2 = 2, Data = "777" }, 
new Para { Key1 = 2, Key2 = 3, Data = "888" }, new Para { Key1 = 3, Key2 = 5, Data = "999" } };

The Para type is defined in the following way:

    class Para
{
        public int Key1, Key2;
        public string Data;
}

Conclusion

We looked at some aspects of LINQ query optimization for MS SQL Server. Also, SQL Complete helped us a lot with searching through the query history and with formatting the scripts we used in the course of this article.

Unfortunately, even experienced .NET developers often forget that it’s necessary to understand what the instructions they use do in the background. Otherwise, they can become configurators and set a figurative time bomb in the future – both when the solution is scaled and when the environment’s external conditions are slightly changed.

Source files for the test – the project itself, table creation in the TEST database, and population of these tables with data can be found here.

Also, the Plans folder from this repository contains execution plans of queries with OR conditions.

Additionally, there is an excellent solution called dotConnect - it’s a line of data access components from Devart for various DBMS. Among others, dotConnect components support such ORM tools as Entity Framework Core and LinqConnect which allows you to work with LINQ to SQL classes.

References

  1. https://habr.com/ru/post/459716/
  2. https://stackoverflow.com/questions/56940996/how-to-optimally-combine-two-collections-in-linq-with-several-conditions-for-ms
  3. https://github.com/jobgemws/TestLINQ

History

  • 13th September, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Evgeniy Gribkov
Database Developer
Russian Federation Russian Federation
I am interested in everything related to the database and data.
Professionally engaged in MS SQL Server as a developer and administrator.
Soul lies on MS SQL Server

Comments and Discussions

 
-- There are no messages in this forum --
Article
Posted 13 Sep 2019

Tagged as

Stats

4.9K views
7 bookmarked