Introduction
This article will discuss performance differences between a LINQ loop and a regular For loop, as a way to practice using Visual Studio 2008, Linq and unit tests for the first time.
Background
LINQ (Language INtegrated Query) is Microsoft's new .NET addition to the language and allows formulating queries in an SQL-like syntax. It's particularly useful when traversing data sets, XML DOM trees and collections.
I first came across Visual Studio 2008, and LINQ in particular, during this year's Tech Ed at Orlando.
A nice developer from Microsoft described and demoed it for me (at that time, it was only available for VB.NET, but a C# version came out with beta 2). That developer insisted that LINQ is not just for data sets or complex objects, but can also be used for simple loops.
I decided to put his theory to the test, as a way to learn the new technology: I wrote a simple program that searches for odd numbers in an array, and compared the time it took a regular loop to the time it took a Linq loop, to come up with the right answers.
While this article was written several months ago, I waited for the RTM version of VS 2008 and .NET 3.5 to arrive, before publishing it.
Along the way, I got to learn LINQ more deeply and using VS 2008 unit testing capabilities.
As the project grew, I've added a third loop (ForEach) to the mix. I then decided to output it all to a CSV file and analyze the results in Excel.
The Logic
- The program allocates an array, with n elements and fills it with numbers
- It then calls each of a
GetAverage on each of the 3 functions
GetAverage calls each function a 1000 times (configurable)
- It measures the time it takes a function to go through n elements and calculates an average.
Measurements are derived using Daniel Strigl's High Performance System Timer
- Averages to n elements are displayed (or outputted to a file) for each function
- the entire code runs 5 times, to ensure average consistency
Using the Code
This is a bare-bones application. It runs as a console application and has no UI.
The only configurable parts in it are:
numElements - how many elements in the array
numIterations - how many times is each algorithm called, to calculate an average
- Application output can go to a file or to the standard output (screen) - just comment the right lines
The Harness
Here's the Main function:
static void Main(string[] args)
{
StreamWriter file = new StreamWriter("results.txt");
file.WriteLine("Elements\tFor loop\tForEach Loop\tLinq Loop");
for (int i = 0; i < 5; i++)
{
int numElements = 1000 * (int)Math.Pow(10, i);
FillArray(numElements);
file.WriteLine("{0:#,#}\t{1:0.0000000000}\t{2:0.0000000000}\t{3:0.0000000000}",
numElements, GetAverage(GetOdd), GetAverage(GetOddForEach), GetAverage(GetOddLinq));
numElements, GetAverage(GetOdd), GetAverage(GetOddForEach), GetAverage(GetOddLinq));
}
file.Close();
}
As you can see, all it does is call the GetAverage function, passing the algorithm function as a parameter.
The GetAverage looks like this:
private static double GetAverage(func f)
{
double averageDuration = 0.0;
for (int i = 0; i < numIterations; i++)
{
pt.Start();
int odd = f();
pt.Stop();
averageDuration += pt.Duration;
}
averageDuration /= numIterations;
return averageDuration;
}
As you can see, not too complicated: it starts a timer, calls function f() stops the timer and accumulates the time. It does so numIterations times and returns the average. I really liked the idea of submitting a function name as a parameter, as it abstracted the design and will let me build on it in the future.
The Algorithms
Essentially, all 3 functions use simple O(n) search algorithms: The GetOdd is the most straightforward:
private static int GetOdd()
{
int counter = 0;
for(int n = 0; n < theArray.Length; n++)
{
if (theArray[n] % 2 == 1)
{
counter++;
}
}
return counter;
}
The GetOddForEach:
private static int GetOddForEach()
{
int counter = 0;
foreach (int n in theArray)
{
if (n % 2 == 1)
{
counter++;
}
}
return counter;
}
and finally, the GetOddLinq using the new Linq syntax:
private static int GetOddLinq()
{
var odd = from n in theArray
where n % 2 == 1
select n;
return odd.Count();
}
You first notice the new keyword var (was someone thinking of JavaScript while designing this?). It defines a new IEnumerable collection that will contain the results of the LINQ query. The query itself looks a bit like a reversed SQL query (select is in the end), but it's still readable.
For more on Linq's syntax and samples, try the official LINQ project page.
The Results
I've run this program on several computers and VMs. I've tried it on Windows XP, Vista and 2008 RC1. I tried running it on a busy machine, or on a completely vacant machine. Finally, I've tested debug and release versions. The numbers may change, but the trend remains the same:
Measurements are in seconds. Column E shows the percentage of time added by Linq compared to For: Fi = (Di - Bi)/Di.
Of course, once you have the raw data, you can analyze it however you want, such as generate a graph:
Note: as mentioned results have been pretty consistent, and Linq had 75-85% overhead, in almost every test. But in debug version, LINQ took even longer to complete the task, while For and ForEach remained essentially the same.
My only guess is that LINQ has some instrumentation built into it, to allow for easier debugging � thus it's slower in debug builds.
Unit Testing
A huge chunk of the Tech Ed sessions was dedicated to testing and in particular, how easy it is to add unit tests in VS 2008. And indeed, it didn't take long. Right click anywhere in the source and select "Create Unit Tests...". A wizard will take you through selecting the functions you want to test in your project and would eventually create a test project and add it to the solution.
The test projects comes ready with the right references and a set of accessors � allowing the unit test functions access to all members of the original class � even the private ones.
So how do you test this code? Here's the unit test for the function that creates the array:
[TestMethod()]
[DeploymentItem("LinqTest.exe")]
public void FillArrayTest()
{
int n = 10;
Program_Accessor.FillArray(n);
Assert.AreEqual(n, Program_Accessor.theArray.Length);
}
Pretty simple, isn't it? Essentially, you are using the Program_Accessor to gain access to the LinqTest.exe assembly. Upon calling the FillArray function for n elements, you assert that the size of the array should now be n.
Now, let's test one of the search functions (the tests for all are the same � a unit test does not care about the internal logic of the function, just about the results).
[TestMethod()]
[DeploymentItem("LinqTest.exe")]
public void GetOddLinqTest()
{
int expected = 1;
int actual;
Program_Accessor.FillArray(2);
actual = Program_Accessor.GetOddLinq();
Assert.AreEqual(expected, actual);
}
Here I cheated. Knowing that my array will be filled with consecutive numbers, I know that any 2 adjacent cells I pick will contain 1 odd number. So, we build a 2 cell array, fill it and compare the number of odd numbers returned from GetOddLinqTest with the expected result. In a variant of the program, where the array is filled with random numbers, you'd have to change this function, to get the right expected.
Note: random numbers will not change the measurement results, as we always have to scan the entire array.
Now, run all the unit tests prior to building the solution (or click CTRL+R,A) and, hopefully, you'll see all green:
History
Version 1.00 released on 12/5/2007
Version 1.01 released on 12/14/2007
Update
Following the suggestions received in the comments, 2 corrections were implemented, to improve measurment accuracy:
- Per Dennis Dollfus's suggestion, the LINQ function now looks like this:
private static int GetOddLinq()
{
int oddNumbers = theArray.Count(n => n % 2 == 1);
return oddNumbers;
}
- Per kckn4fun's suggestion, results are accumulated into a
StringBuilder and only written to file/console in the end. The new Main function looks like this:
static void Main(string[] args)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine("Elements\tFor loop\tForEach Loop\tLinq Loop");
for (int i = 0; i < 5; i++)
{
int numElements = 1000 * (int)Math.Pow(10, i);
FillArray(numElements);
sb.AppendLine(string.Format(
"{0:#,#}\t{1:0.0000000000}\t{2:0.0000000000}\t{3:0.0000000000}",
numElements, GetAverage(GetOdd), GetAverage(GetOddForEach),
GetAverage(GetOddLinq)));
}
Console.ReadLine();
WriteFile(sb.ToString());
}
The new results look like this:
As you can see, performance is slightly better � but the tren remains.
Final note
This program is, by no means, a thorough analysis of LINQ's general performance. I'm sure its behavior in traversing complex data sets and XML DOM trees is much better. I never set out to prove anything, just play a little bit with the new environment.
Feel free to use the program and its results however you choose. The way I designed it, it's easier to plug in more complex logic and still get measurements.
| You must Sign In to use this message board. |
|
|
 |
|
 |
The Mod operator is expensive and unnecessary for detecting odd numbers. Although this won't necessarily make Linq faster, it should improve performance overall:
IsOdd(n) = (n & 1) == 1;
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Thanks for the tuning . It throws me back to my C days when we were racing each other to write statements that translate into the fewest number of Assembler lines. I would like to believe that the CLR optimizer will work its way around this statement.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Guy Vider wrote: I would like to believe that the CLR optimizer will work its way around this statement.
Wishful thinking! I just whipped together an example to find all odds in the range 1..10000000000.
Results (VS 2008 SP 1, Release mode, .NET 3.5 SP1):
f(n) = (n % 2) == 1 221.783 seconds
f(n) = (n & 1) == 1 29.519 seconds
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Out of curiosity, why not use the Stopwatch class available since .NET 2.0?
System.Diagnostics.Stopwatch[^]
"The Stopwatch measures elapsed time by counting timer ticks in the underlying timer mechanism. If the installed hardware and operating system support a high-resolution performance counter, then the Stopwatch class uses that counter to measure elapsed time. Otherwise, the Stopwatch class uses the system timer to measure elapsed time. Use the Frequency and IsHighResolution fields to determine the precision and resolution of the Stopwatch timing implementation.
The Stopwatch class assists the manipulation of timing-related performance counters within managed code. Specifically, the Frequency field and GetTimestamp method can be used in place of the unmanaged Win32 APIs QueryPerformanceFrequency and QueryPerformanceCounter."
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi James, Probably because I never heard of it until now Thanks for alerting me to this class. I'll start using it from now on. Regards, Guy
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi Guy,
After testing your code a bit, I thought to myself that there is a quicker way to test if a number is odd or even by doing a bitwise AND against 1... and I was right, except when it comes to LINQ! Have a look at the following results:
The Mod 2 results:
Elements For loop ForEach Loop Linq Loop 1,000 0.0000084933 0.0000091430 0.0000199097 10,000 0.0000718602 0.0000772695 0.0001637119 100,000 0.0007081864 0.0007624654 0.0015866832 1,000,000 0.0125317390 0.0077455719 0.0157685820 10,000,000 0.0723991492 0.0779774987 0.1674746024
The Bitwise AND results:
Elements For loop ForEach Loop Linq Loop 1,000 0.0000081085 0.0000085360 0.0000206574 10,000 0.0000586321 0.0000694746 0.0001666217 100,000 0.0005786034 0.0006927755 0.0017191306 1,000,000 0.0058726080 0.0069936906 0.0170594472 10,000,000 0.0590749629 0.0705102174 0.1717112192
As we can see, both the For and ForEach loops are quicker with the bitwise comparison, but the LINQ Loop is actually slower! I'd be very interested to know why... so if any LINQ pros are reading this, what's the answer?
Nice article, btw, and I'd be interested in seeing an update when Parallel LINQ hits the mainstream.
Cheers,
Martin.
"On one of my cards it said I had to find temperatures lower than -8. The numbers I uncovered were -6 and -7 so I thought I had won, and so did the woman in the shop. But when she scanned the card the machine said I hadn't.
"I phoned Camelot and they fobbed me off with some story that -6 is higher - not lower - than -8 but I'm not having it." -Tina Farrell, a 23 year old thicky from Levenshulme, Manchester.
modified on Sunday, December 23, 2007 5:11:53 AM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi Martin, Just out of curiosity, did you use the older or newer LINQ implementation to arrive at these results?
I find it interesting that there is a difference between bitwise AND and a MOD 2 (which essentially amounts to a bitwise shift right, if I remember my binary).
I Would probably run this several more times and if a trend emerges, would dig deeper into it (imagine: "get better performance for long calculations by breaking them down to their binary components!") - oh wait, isn't that what the compiler's optimizer supposed to do for us automatically?
Anyway, good job. I wouldn't have thought going there 
Guy
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Curiouser and curiouser!
I tested with the original LINQ implementation, and that proved to be much slower than the modified version!
I'm going to have a look at the IL the compiler generates to see if that holds any clues...
"On one of my cards it said I had to find temperatures lower than -8. The numbers I uncovered were -6 and -7 so I thought I had won, and so did the woman in the shop. But when she scanned the card the machine said I hadn't.
"I phoned Camelot and they fobbed me off with some story that -6 is higher - not lower - than -8 but I'm not having it." -Tina Farrell, a 23 year old thicky from Levenshulme, Manchester.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hi people, After reviewing all the comments, I've revised 2 of the functions and ran the measurements again. The revised code and results can be seen above.
I'd like to thank kckn4fun for reminding me to use StringBuilder rather than keep a file open, and Dennis Dollfus, for suggesting a more efficient LINQ syntax. Dennis, if you could elaborate in a comment about the syntax differences, and delegates usage in LINQ, I'm sure we would all be thankful.
Cheers, Guy
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
It'll probably catch up in performance in about two more versions.
Linq to Sql though is a real beast. You can get decent performance out of it, but you'll have to play a game of Design Twister before it yields. Even then, it will be slower.
The real bottom line is determined by the problem you're trying to solve. Keep using what you know works, and do what you've done here... TEST, TEST, TEST.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Thanks for the spot of optimism Dewey! I will definitely continue to test. My new motto is "Red, Green, Refactor"!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
 |
|
|
 |
|
 |
In your loop, you are running a loop that is performing I/O at every loop iteration. This is putting an uncontrollable variable into your test times, as well as lengthening the time that it takes these algorithms to run. Furthermore, since the O/S controls I/O to disk, you may be getting inconsistent test times.
I recommend you rewrite the test to write to a StringBuilder instead, then dump the buffer to disk. That'll probably yield higher accuracy and better performance overall.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
I see your point, but slightly disagree: The information is written to the disk (or console) AFTER the averages are calculated. The measurements inside GetAverage are not suffering from any "noise" - as far as I can see.
Thanks for the comment!
Guy
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
 |
|
|
 |
|
 |
Let's take a look at this!
int oddNumbers = numbers.Count(n => n % 2 == 1);
vs
int oddNumbers = (from n in numbers where n % 2 == 1 select n).Count();
- 1 -
int oddNumbers = numbers.Count(n => n % 2 == 1);
==
int oddNumbers = LINQCore.Count(numbers, n => n % 2 == 1);
// where:
public static class LINQCore { public static int Count< T >(this IEnumerable< T > items, Func< T , bool > predicate) { int c = 0; foreach (T item in items) { if (predicate(item)) c++; } return c; } }
Result: one method call, (n) delegate call.
- 2 -
int oddNumbers = (from n in numbers where n % 2 == 1 select n).Count();
==
int oddNumbers = numbers.Where(n => n % 2 == 1).Count();
==
int oddNumbers = LINQCore.Count(LINQCore.Where(numbers, n => n % 2 == 1));
// where
public static class LINQCore { public static int Count< T >(this IEnumerable< T > items) { int c = 0; foreach (T item in items) c++; return c; }
public static IEnumerable< T > Where< T >(this IEnumerable< T > items, Func< T, bool > predicate) { foreach (T item in items) { if (predicate(item)) yield return item; } } }
Result: two method call, (n) yield construct, (n) return value, (n) delegate call.
It's all about delegates and yields.
modified on Saturday, December 15, 2007 4:23:48 AM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Very well - I'll post a version with the suggested construct. But do we really expect LINQ to beat For with this?
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
If you will use same construct (yields, delegates per query) it might be equals. LINQ to Objects and LINQ to XML is just about (extension) method calls. Some very simple compiler tricks only.
LINQ to SQL and other IQueryable implementations is mutch complex, because those relies on expression trees. IEnumerable queries are just method calls.
modified on Saturday, January 19, 2008 2:26:10 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hello, Please note that functions GetOddForEach and GetOddLINQ are doing slightly different things. Function
private static int GetOddForEach() { int counter = 0; foreach (int n in theArray) { if (n % 2 == 1) { counter++; } } return counter; } only enumerate array elements and increase counter, but
private static int GetOddLINQ() { var odd = from n in theArray where n % 2 == 1 select n; return odd.Count(); } also may allocate memory for odd elements (but I don't know how it was implemented in LINQ). Memory allocation process may require additional time.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
The logic is different-- it can be fixed though. Still, that difference may not have a significant impact on performance difference.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Thanks for the reply. I admit to not studying LINQ in depth - I just looked for the first interesting implementation I could test with VS 2008. The math may be off - but I believe the trend remains.
Again - this is by no means a conclusive LINQ effectiveness test!
Thanks for the comments!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
I Agree, The LINQ Select is much faster that For loop, but the Code odd.Count(); is the culprit in degrading LINQ performance function, it is time consuming as it needs to find the records, in the for loop the counter gets incremented in each iteration. Remove that odd.Count(); and have a try again.
LINQ Is much faster for selecting comparing to for/foreach.
Cheers !!! I love LINQ...
Aj
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
LINQ to Objects API uses yield return construct on select. You should try something like this to get fair results:
private static IEnumerable GetOddForEach() { foreach (int n in theArray) { if (n % 2 == 1) { yield return n; } } }
private static int GetOddForEachCount() { int counter = 0; foreach (int item in GetOddForEach()) { counter++; } return counter; }
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|