Introduction
This project was started by Rama, who did almost all of the coding. Personal affairs halted his progress, and he handed it over to Nish, who took it up from where Rama had left off. Nish finished the work and did some statistical analysis on the results. We wanted to get an idea of how different languages and tools compare with each other in terms of performance. There are many categories in which speed and performance can be measured, but the first one that came to mind was computation, so prime number generation was chosen as the criterion.
The next job was to decide how to implement something that could be performance-compared across languages. First, the common options had to be chosen. We picked the following ten language options, all of which are available to the typical Microsoft programmer.
The participants
- Visual C++ 7 (ATLPrimes)
- Visual Basic 6 (VBPrime)
- C# (CSharpPrimes)
- VB.NET (VBNetPrimes)
- Managed C++ compiled entirely to IL (MCPPPrimes1)
- Managed C++ with the arithmetic-intensive code compiled as unmanaged code (MCPPPrimes2)
- C# ngen'd
- VB.NET ngen'd
- Managed C++ compiled entirely to IL and ngen'd
- Managed C++ with the arithmetic-intensive code compiled as unmanaged code and ngen'd
The objective was to use a single test application to run the tests and measure the timings, so a component DLL was developed for each of the 10 language options. We did not account for the COM interop overhead in the .NET calls, because we did not expect it to be significant.
The Code
We used a simple COM interface with a single method that, given the number of primes to compute, computes them. The IComputePrimes interface looks like this:
interface IComputePrimes : IDispatch
{
    HRESULT CalculatePrimes([in] int numPrimes);
};
This interface was generated using the default options of the ATL Object Wizard. Any object implementing it is expected to calculate and store as many prime numbers as specified by numPrimes.
Now let's see what the code looks like in the various cases.
The C++ code
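STDMETHODIMP CComputePrimes::CalculatePrimes(int numPrimes)
{
    // The first two primes are seeded directly, so at least two are required.
    if (numPrimes < 2)
        return E_INVALIDARG;

    // Release the array from any previous call.
    if (m_rgPrimes != NULL)
        delete [] m_rgPrimes;
    m_rgPrimes = new int[numPrimes];
    m_rgPrimes[0] = 2;
    m_rgPrimes[1] = 3;

    int i = 2;                    // number of primes found so far
    int nextPrimeCandidate = 5;   // only odd numbers are tested
    while (i < numPrimes)
    {
        // sqrt comes from <math.h>; the cast avoids an ambiguous sqrt(int) overload.
        int maxNumToDivideWith = (int)sqrt((double)nextPrimeCandidate);
        bool isPrime = true;
        // Trial division by the primes found so far, up to the square root.
        for (int j = 0;
             (j < i) && (maxNumToDivideWith >= m_rgPrimes[j]);
             j++)
        {
            if ((nextPrimeCandidate % m_rgPrimes[j]) == 0)
            {
                isPrime = false;
                break;
            }
        }
        if (isPrime)
            m_rgPrimes[i++] = nextPrimeCandidate;
        nextPrimeCandidate += 2;
    }
    return S_OK;
}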
The computed primes are stored in the integer array m_rgPrimes. For each odd candidate, the code tries dividing it by every prime found so far that is less than or equal to its square root. If none of them divides it evenly, the candidate is prime and is stored in the array.
C# and MC++
The C# and Managed C++ code is similar. The exception is the two Managed C++ variants that mix native code into managed code (MCPPPrimes2, plain and ngen'd), where the work is split into two functions, as shown below.
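void CalculatePrimes(int numPrimes)
{
    // Allocate the result array on the managed (garbage-collected) heap.
    primes = new int __gc[numPrimes];
    // Pin the array so the GC cannot move it while native code writes to it;
    // the pin is released when rgPrimes goes out of scope.
    int __pin* rgPrimes = &primes[0];
    // Unmanaged function that calculates the primes and fills the array.
    UnmanagedComputePrimes (rgPrimes, numPrimes);
}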
The array is a managed array, so we pin it to stop the garbage collector from moving it. We then pass a pointer to an unmanaged function that calculates the primes and fills the array.
VB/VB.NET Code
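' Primes is declared at module level (e.g. Private Primes() As Long).
Private Sub IComputePrimes_CalculatePrimes(ByVal numPrimes As Long)
    ReDim Primes(numPrimes)          ' slots 1 to numPrimes are used
    Primes(1) = 2
    Primes(2) = 3
    Dim NextPrimeCandidate As Long
    NextPrimeCandidate = 5
    Dim i As Long                    ' next free slot
    Dim j As Long
    Dim MaxNumToDivideWith As Long
    Dim IsPrime As Boolean
    i = 3
    Do While i <= numPrimes
        ' Assigning to a Long rounds rather than truncates; at worst one
        ' extra divisor is tried, which cannot change the result.
        MaxNumToDivideWith = Sqr(NextPrimeCandidate)
        IsPrime = True
        j = 1
        ' Only slots 1 to i - 1 hold primes, matching j < i in the C++ code.
        Do While (j < i) And (MaxNumToDivideWith >= Primes(j))
            If NextPrimeCandidate Mod Primes(j) = 0 Then
                IsPrime = False
                Exit Do
            End If
            j = j + 1
        Loop
        If IsPrime Then
            Primes(i) = NextPrimeCandidate
            i = i + 1
        End If
        NextPrimeCandidate = NextPrimeCandidate + 2
    Loop
End Sub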
The VB.NET code is similar, with Sqr replaced by the System.Math.Sqrt function. The VB6 code is compiled with optimizations, such as removing all integer overflow checks, so that the generated code closely resembles the C++ code.
The test clients
Every case is compiled into a DLL, and all assemblies are registered for COM interop. There are two test clients, a managed client and a native client. The native client is written in VC++ and uses the #import directive.
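#include <windows.h>
#include <tchar.h>
#include <iostream>
// ATLPrimesLib::IComputePrimesPtr comes from #import of the ATLPrimes type library.

// Returns the QueryPerformanceCounter ticks taken by one CalculatePrimes call.
__int64 ComputeAndGetResults(
    ATLPrimesLib::IComputePrimesPtr spComputePrimes,
    int numPrimes)
{
    LARGE_INTEGER li1, li2;
    li1.QuadPart = 0;
    li2.QuadPart = 0;
    QueryPerformanceCounter(&li1);
    spComputePrimes->CalculatePrimes(numPrimes);
    QueryPerformanceCounter(&li2);
    return li2.QuadPart - li1.QuadPart;
}

// Usage: client <ProgID> <numPrimes>; prints the elapsed ticks.
int _tmain(int argc, _TCHAR* argv[])
{
    if (argc < 3)
        return 1;
    CoInitialize(NULL);   // COM must be initialized before the object is created
    try
    {
        // Creates the component from the ProgID given on the command line.
        ATLPrimesLib::IComputePrimesPtr spComputePrimes(argv[1]);
        int numPrimes = _ttoi(argv[2]);   // TCHAR-aware, unlike atol
        // The frequency is queried but not used: raw ticks are reported.
        LARGE_INTEGER f;
        QueryPerformanceFrequency(&f);
        std::cout << ComputeAndGetResults(spComputePrimes, numPrimes);
    }   // the smart pointer is released here, before CoUninitialize
    catch(_com_error& e)
    {
        std::cerr << "COM error 0x" << std::hex << e.Error() << std::endl;
    }
    CoUninitialize();
    return 0;
}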
The managed client is written in C#.
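// Requires: using System; using System.Reflection;
// using System.Runtime.InteropServices; plus these class members:
// [DllImport("kernel32.dll")] static extern bool QueryPerformanceCounter(ref long count);
// [DllImport("kernel32.dll")] static extern bool QueryPerformanceFrequency(ref long frequency);
// Usage: client <assembly> <type name> <numPrimes>; prints the elapsed ticks.
try
{
    Assembly assem = Assembly.Load(args[0]);
    IComputePrimes primes =
        (IComputePrimes)assem.CreateInstance(args[1]);
    int numPrimes = Int32.Parse(args[2]);
    long t1 = 0, t2 = 0;
    QueryPerformanceCounter(ref t1);
    primes.CalculatePrimes(numPrimes);
    QueryPerformanceCounter(ref t2);
    // As in the native client, the frequency is read but not used.
    long freq = 0;
    QueryPerformanceFrequency(ref freq);
    Console.Write(t2 - t1);
}
catch(Exception e)
{
    Console.Error.WriteLine(e.ToString());
}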
Both clients use the QueryPerformanceCounter API to time the call and print the elapsed counter ticks; the lower the number, the better. A C# program called RunMultipleTests calls both clients for each of the 10 DLL types (see the Main.cs file for how this is implemented). We called all 10 implementations once each to generate 10 primes, then 100, 1,000, 10,000, 100,000 and finally 1,000,000 (one million).
The results
I have selected a few of the generated results for discussion here. All figures are elapsed QueryPerformanceCounter ticks, so smaller numbers indicate higher performance.
| Language | Primes | Native client | Managed client |
|---|---|---|---|
| ATLPrimes | 10 | 18,241 | 192,538 |
| VBPrime | 10 | 21,057 | 191,597 |
| CSharpPrimes | 10 | 1,201,258 | 1,003,710 |
| CSharpPrimes (ngen'd) | 10 | 99,017 | 20,357 |
| VBNetPrimes | 10 | 1,680,241 | 1,440,198 |
| VBNetPrimes (ngen'd) | 10 | 101,201 | 21,644 |
| MCPPPrimes1 | 10 | 1,443,943 | 1,117,279 |
| MCPPPrimes1 (ngen'd) | 10 | 107,362 | 29,574 |
| MCPPPrimes2 | 10 | 977,667 | 699,355 |
| MCPPPrimes2 (ngen'd) | 10 | 127,969 | 53,861 |
The above table shows the results for generating 10 primes. As you can see, the fastest combination was the ATL DLL invoked from a native C++ client. It may surprise you, though, that when the same DLL was called from a managed client through .NET COM interop, the call took more than ten times as long (192,538 ticks versus 18,241). So much for COM interop and its supposed efficiency. It hurt my ego a good deal to see that the VB DLL invoked from a native client performed far better than the Managed C++ DLLs. Oddly, the managed DLLs don't show a drastic difference in performance between native and managed invocation. The exception is MC++ DLL version 2, the mixed unmanaged-managed version. All the managed DLLs show an amazing performance increase when ngen'd; perhaps it's time we all started taking ngen more seriously. Very surprisingly, the ngen'd C# DLL was the second fastest of all combinations. Curiously, the VB.NET DLL was the slowest of them all. Here is a graph of the above table.

But 10 primes is too small a number to base such observations on, so let's move on to the results for 1,000 primes. The Excel sheets in the download list the full tables for those who are interested, and you can always tweak the sample projects to try other combinations and permutations.
| Language | Primes | Native client | Managed client |
|---|---|---|---|
| ATLPrimes | 1000 | 1,674,822 | 1,843,077 |
| VBPrime | 1000 | 1,659,063 | 1,830,014 |
| CSharpPrimes | 1000 | 2,951,717 | 2,665,328 |
| CSharpPrimes (ngen'd) | 1000 | 1,755,078 | 1,655,643 |
| VBNetPrimes | 1000 | 3,606,253 | 3,400,125 |
| VBNetPrimes (ngen'd) | 1000 | 2,108,643 | 1,954,464 |
| MCPPPrimes1 | 1000 | 3,110,415 | 2,742,913 |
| MCPPPrimes1 (ngen'd) | 1000 | 1,719,734 | 1,642,938 |
| MCPPPrimes2 | 1000 | 2,678,031 | 2,359,011 |
| MCPPPrimes2 (ngen'd) | 1000 | 1,748,994 | 1,742,121 |
Well, well, well! Suddenly the performance differences don't look as stark as they did for 10 primes. Now the best-performing combination is the fully managed MC++ DLL after ngen'ing. What is painful is to see that the VB6 DLL outperformed the ATL DLL with both managed and native invocation. Again VB.NET shows pathetic performance, and again ngen'ing gives the managed assemblies an amazing performance boost. Now let's skip a few tables and go straight to the one million mark.
| Language | Primes | Native client | Managed client |
|---|---|---|---|
| ATLPrimes | 1000000 | 19,389,792,910 | 19,400,345,304 |
| VBPrime | 1000000 | 19,334,822,911 | 19,340,626,315 |
| CSharpPrimes | 1000000 | 19,371,408,155 | 19,426,052,083 |
| CSharpPrimes (ngen'd) | 1000000 | 19,386,294,992 | 19,325,672,507 |
| VBNetPrimes | 1000000 | 19,870,238,968 | 19,980,902,937 |
| VBNetPrimes (ngen'd) | 1000000 | 20,007,201,165 | 19,900,407,405 |
| MCPPPrimes1 | 1000000 | 19,363,699,234 | 19,346,647,324 |
| MCPPPrimes1 (ngen'd) | 1000000 | 19,339,817,493 | 19,317,645,432 |
| MCPPPrimes2 | 1000000 | 19,450,368,014 | 19,325,875,844 |
| MCPPPrimes2 (ngen'd) | 1000000 | 19,345,122,911 | 19,429,232,591 |
Both Rama and Nish were pleasantly surprised to find that as the number of primes grew, the stark contrasts in performance faded noticeably. At the one million mark, all the combinations showed very similar performance. Again the ngen'd fully managed MC++ DLL was the best and the VB.NET DLL was the worst. Most curious was that ngen'ing actually had a negative impact on the VB.NET DLL when it was called from the native client. And here is a graphical representation.

Here is another graph that shows the impact ngen has on managed assemblies.

You'll notice that ngen has the greatest impact on VB.NET programs and, as you'd guess, the least impact on MC++ code that has native code blocks. You'll also notice that the impact of ngen seems to decrease as we generate more primes. This is made very clear in the following graph.

So far we have only seen cases where each method was called once, so the managed versions suffered from JIT compilation overhead. We therefore made multiple calls to see whether the managed versions got any faster after the first call, looping the calls three times. Here are some sample results. Don't be surprised that the numbers differ from the tables above. The first set of tests was run on a dual P-III 550 MHz with 384 MB RAM, and its numbers are higher because the performance counter frequency is quite high on that dual-processor machine. The multiple-call tests were all run on a single P-III 800 MHz with 384 MB RAM, where the counter frequency is lower, so the numbers are smaller. You'll notice, though, that the ratios remain more or less the same.
| Language | Primes | Native client #1 | #2 | #3 | Managed client #1 | #2 | #3 |
|---|---|---|---|---|---|---|---|
| CSharpPrimes | 10 | 5973 | 35 | 25 | 4848 | 56 | 46 |
| CSharpPrimes (ngen'd) | 10 | 476 | 32 | 276 | 95 | 60 | 45 |
| VBNetPrimes | 10 | 7663 | 38 | 29 | 8144 | 59 | 50 |
| VBNetPrimes (ngen'd) | 10 | 489 | 35 | 29 | 101 | 63 | 51 |
| MCPPPrimes1 | 10 | 6270 | 34 | 26 | 5383 | 57 | 46 |
| MCPPPrimes1 (ngen'd) | 10 | 499 | 31 | 24 | 127 | 56 | 46 |
| MCPPPrimes2 | 10 | 4466 | 38 | 25 | 3646 | 61 | 47 |
| MCPPPrimes2 (ngen'd) | 10 | 624 | 31 | 25 | 247 | 65 | 47 |
You'll notice an amazing increase in performance for the 2nd and later calls, most noticeably for the non-ngen'd DLLs. The ngen'd C# DLL shows an anomaly on its 3rd call from the native client, but this was probably due to some OS activity coinciding with that exact moment, so you may safely ignore it. Thus, whether or not you ngen your code, from the 2nd call onwards your methods will be nearly as fast as native calls, because there is no more JIT overhead. They will still not be quite as fast, because of other overheads such as garbage collection. You'll also notice that the 3rd call improved over the 2nd, but this improvement across calls drops off sharply as the number of calls increases. Now let's look at the results for a larger number of primes.
| Language | Primes | Native client #1 | #2 | #3 | Managed client #1 | #2 | #3 |
|---|---|---|---|---|---|---|---|
| CSharpPrimes | 10000 | 165346 | 162135 | 158838 | 159857 | 157004 | 156279 |
| CSharpPrimes (ngen'd) | 10000 | 155593 | 154611 | 156586 | 157266 | 156629 | 154440 |
| VBNetPrimes | 10000 | 180720 | 172494 | 173198 | 175535 | 171634 | 170705 |
| VBNetPrimes (ngen'd) | 10000 | 172432 | 173577 | 172076 | 173416 | 175305 | 173921 |
| MCPPPrimes1 | 10000 | 165775 | 159783 | 160712 | 161040 | 158640 | 157350 |
| MCPPPrimes1 (ngen'd) | 10000 | 155954 | 164162 | 159695 | 155283 | 159554 | 155928 |
| MCPPPrimes2 | 10000 | 160007 | 154570 | 154990 | 171823 | 158746 | 156686 |
| MCPPPrimes2 (ngen'd) | 10000 | 156243 | 153972 | 154144 | 154966 | 157720 | 167443 |
Ah, now the performance improvements from ngen are not as obvious. This again confirms that over the long run the JIT bottleneck fades and finally all but disappears.
Some conclusions
- Using ngen gives your managed code a tremendous performance improvement. The improvement is especially large when the code is called from a managed client rather than from a native C++ client.
- Managed/unmanaged transitions are inefficient, and unmanaged-to-managed transitions are much slower than managed-to-unmanaged ones. Wherever possible, it's best to avoid managed/unmanaged transitions.
- Managed code performs markedly better when it is invoked repeatedly, because JIT compilation happens only the first time.
- As the number of primes increases, the performance differences between the languages shrink. This again underlines that, without the JIT overhead, managed code is just as good as native code.
- Of all the .NET compilers, the VB.NET compiler seems to produce the slowest code. We think this is because VB.NET checks for overflow in all arithmetic operations (verified using ILDasm).
- The C# compiler seems to be markedly better than the MC++ compiler (pure
managed code).
- Using ngen has the most impact on VB.NET assemblies and the least impact on MC++ assemblies.
- Mixing unmanaged and managed code in C++ is far more efficient than pure MC++. In fact, pure MC++ is much slower than C# for fully managed projects, so unless you plan to integrate MFC or ATL, C# is the better choice over MC++.
Updates and fixes
- Aug 10 2002: A major goof-up was fixed. In the looped method tests we had looped in the wrong place: instead of looping the method call, we looped the execution of the client process. This has been fixed, and the tables and the Excel sheets have been updated.
Message board (messages 1 to 25 of 93)
Hello there,
I am currently working in .NET. I need to know which is more suitable for building a client/server architecture: VC++ or .NET. There is a server in our company that was built with VC++, and we are looking into the possibility of moving it to .NET. Message delivery speed and other networking efficiency issues are very important. Could you give me any advice on this, or tell me how VC++ and .NET compare in performance?
Thanks in advance. Cheers,
Bye
Nothing Is Impossible In Life
In Longhorn the API is not in C but in a .NET-aware language.
Does anyone know what the results would be in Longhorn?
Thanks in advance
Oren.
Dear Anonymous:
ngen.exe only creates a precompiled image in the cache, not an .exe that you can run on a computer without the framework installed. You still need the framework to run it; ngen just saves you the JIT compilation time.
Also, don't forget to read my first CP article: http://www.codeproject.com/dotnet/dnlp.asp
Keep Smiling.
Rai Umair
What is now proved was once only imagined...
I was rather disappointed.
So we're looking at performance of languages. The signal is the execution time of the test code. The noise is the overhead from COM interop, etc.
For small test sizes, i.e. a small number of primes, the noise outweighs the signal. This makes the tests completely useless at this level.
For the large test, the difference between tests is no more than 1%. We don't know how the overhead affects the results on this scale. Once again, the tests are pretty useless.
There's one small exception: the VB.Net code is 3-4% slower. But this may be due to language-specific constructs. The use of the ReDim statement comes to mind. ILDASM is the tool of choice here: see what code the compiler generates.
A more valid test would be to have the entire test code, including the timing code, written in the language. That will minimize overhead.
Jeffrey
Everything should be as simple as possible, but not simpler. -- Albert Einstein http://www.extremeoptimization.com/
I upgraded this to VS.NET 2003, and the ngen numbers were almost identical in the extended tests. VB6 was the fastest in the extended tests.
| Language | Primes | Native client | Managed client |
|---|---|---|---|
| ATLPrimes | 10 | 33 | 738 |
| VBPrime | 10 | 111 | 2644 |
| CSharpPrimes | 10 | 4028 | 3162 |
| CSharpPrimes (ngen'd) | 10 | 10492 | 3225 |
| VBNetPrimes | 10 | 4495 | 3804 |
| VBNetPrimes (ngen'd) | 10 | 4633 | 5208 |
| MCPPPrimes1 | 10 | 7049 | 4456 |
| MCPPPrimes1 (ngen'd) | 10 | 5303 | 4498 |
| MCPPPrimes2 | 10 | 8389 | 9353 |
| MCPPPrimes2 (ngen'd) | 10 | 8520 | 7751 |
| ATLPrimes | 1000 | 7321 | 7995 |
| VBPrime | 1000 | 5067 | 6051 |
| CSharpPrimes | 1000 | 10346 | 9541 |
| CSharpPrimes (ngen'd) | 1000 | 10185 | 9504 |
| VBNetPrimes | 1000 | 79723 | 25387 |
| VBNetPrimes (ngen'd) | 1000 | 25531 | 25144 |
| MCPPPrimes1 | 1000 | 13642 | 11275 |
| MCPPPrimes1 (ngen'd) | 1000 | 12043 | 12935 |
| MCPPPrimes2 | 1000 | 31226 | 29267 |
| MCPPPrimes2 (ngen'd) | 1000 | 32183 | 30623 |
| ATLPrimes | 100000 | 3664678 | 3751235 |
| VBPrime | 100000 | 2441960 | 3006081 |
| CSharpPrimes | 100000 | 3320130 | 2919183 |
| CSharpPrimes (ngen'd) | 100000 | 3364131 | 3105973 |
| VBNetPrimes | 100000 | 5443152 | 5907290 |
| VBNetPrimes (ngen'd) | 100000 | 5262887 | 5847501 |
| MCPPPrimes1 | 100000 | 2802521 | 2794613 |
| MCPPPrimes1 (ngen'd) | 100000 | 2952759 | 2825932 |
| MCPPPrimes2 | 100000 | 5890069 | 5385754 |
| MCPPPrimes2 (ngen'd) | 100000 | 5928540 | 5482403 |
I think the test was too simplistic to draw any conclusions for 'real world' applications.
However, it does show that if you're doing serious number crunching, there is very little difference with large working sets, and an insignificant difference with small working sets.
One big advantage of managed code is that memory allocation is MUCH faster. In C++, new/delete are very expensive operations. With .NET code, new is almost as efficient as allocating stack space, and garbage collecting lots of objects at once takes less time than deleting lots of objects manually one by one.
In most applications, I think you'll find the speed of dynamic memory allocation/deallocation a more important benchmark.
Has anyone bothered to look at the source code? Not only does the author have the VB.NET project set to incremental build, but it also has Option Strict off, optimizations off, and uses non-short-circuiting operators (And instead of AndAlso) in the loop test. The C# project is set up the complete opposite way: incremental build off, optimizations on, and the short-circuiting operator (&&).
I fixed the code, changed the order of the tests, and ran multiple tests in the same run, that is 10, 100, 1000, 10000, 10, 100, 1000, 10000, 10, 100, 1000, 10000, etc., all in one run. Why, you may ask? Well, the calling code may get optimized slightly differently or behave differently as new libraries are loaded and as the JIT occurs. Most importantly, running repeated tests in one run better simulates what we are likely to see when running web services or ASP.NET applications.
The results, interestingly enough, were vastly different from the author's claims. In fact, the VB.NET code outperformed the C# code on many occasions, but the C# code likewise outperformed the VB.NET code on others. These variations are most likely due to garbage collection.
Furthermore, to put this code into perspective, it's important to note that there is significant COM overhead. The difference can be seen if the managed code is called directly rather than through an interface.
Finally, one important aspect seems to be overlooked. The author is testing only specific parts of the language functionality; in particular, he tests a couple of math functions. Obviously, if code does not perform safety checks, lower-level code will perform best on these kinds of tests, as they target sequences of processor instructions that are usually built in these days. A more realistic test would be something that most businesses do every day, such as string processing or rendering a web page.
Look at my comments about the VB.NET project in one of the messages below. The projects available on the web site have default settings. Also make sure that you are building the release builds and not the debug builds.
The mistakes made in this API make me realise that Microsoft has become big enough that it can shelter morons. If anyone working for me wrote anything this bad and tried to release it, I would kill them and display the body as a warning to the rest of the team. - Christian Graus about C# - GDI+
I was indeed testing release builds, and I was also testing outside the VS.NET IDE, as the IDE hooks into the process regardless of whether it is a debug or release build.
But as to your claim of default settings, that, sir, is totally incorrect! I am beginning to think this is more a case of deceit than just bad coding.
An interesting article, to be sure, but it's only a first step in doing performance comparisons. A few things leap to my attention, and are not particularly surprising:
- For short-running programs, the startup time of the runtime environment dominates the efficiency of the code optimizer in determining total execution time.
- The efficiency of long-running programs is very similar across all environments. This shouldn't be surprising, since ultimately there are only 3 code generators being considered: VC native, .NET JIT, and ngen. All were written by the same organization (and presumably use very similar technologies and algorithms).
- There's an extra startup cost in loading a managed assembly into an unmanaged client.
- The cost of loading an ngen'd assembly into a managed client is substantially less than that of loading an IL assembly.
One thing that would be interesting to add to the tables is a computation of "time per additional iteration". So, in the table of 1000-prime results, compute the value (time(1000)-time(10))/990 for each client/server combination. Another helpful change would be to translate the times into clock time instead of reporting QueryPerformanceCounter() ticks. I see in the sample code that QueryPerformanceFrequency was called, but the result was never used.
I see that some tests were run on a single-CPU machine and others on a dual. I'd like to see all testing done on a dual. It would also be interesting to modify the clients to set their thread affinity to force all execution onto a single CPU. Even better would be to run on Windows 2000 Datacenter Server on a dedicated processor, but not many people have access to that environment... Either of these changes should reduce the noise in the measurements. Many of these results are within milliseconds of each other, and little things like bumping the mouse while the test runs can easily cause timing variations of tens of milliseconds.
In the multiple-call tables, I think the authors are missing an opportunity: we know a priori that the all-native combination _should_ be unaffected by multiple runs, yet in the tables it appears to be affected. I'd like to see a higher number of repetitions (at least 10), and a calculation of the mean, variance, and a linear regression of the points for each client/server combination. Again, everything should be reported in real-world times. Any differences of less than a few milliseconds should be considered noise, unless a very high number of repetitions has been run under very controlled conditions (no network, no mouse movement, no other processes running, etc.).
While this article and the work it represents are a good first step, I do have concerns with the results (and would be very hesitant to use them to justify any kind of business decision):
- The algorithm is too simple.
- It makes no use of floating point.
- It does virtually no memory allocation.
- There's no use of value types.
- The algorithm involves only built-in types that are highly optimized by the hardware (e.g. int).
- Too little attention was paid to identifying and removing noise from the measurements.
I think the second conclusion of the article is overstated: managed-unmanaged transitions have a cost, which can be significant, especially in short-lived programs. In the long-running samples, there was less than 3% difference in execution speed between a managed and an unmanaged client.
Keep up the profiling!
-cd
Thanks, Carl, for taking the time to give some very useful feedback. In fact, your post had nearly enough content to be an article of its own.
Carl Daniel wrote: I see in the sample code that QueryPerformanceFrequency was called, but the result was never used.
Er, the intention was there to convert to clock time, but finally we forgot to do it. Or I think it was because the figures were not as impressive: for example, between 0.00056 seconds and 0.00451 seconds there is about an eightfold difference, but people won't see that easily. They'll just see both as very small numbers, as people are not normally used to comparing small fractions.
Carl Daniel wrote: I think the second conclusion of the article is overstated: managed-unmanaged transitions have a cost, which can be significant, especially in short-lived programs. In the long-running samples, there was less than 3% difference in execution speed between a managed and an unmanaged client.
Perhaps true, but one solid conclusion we reached was that transitions almost always brought down speed, as with marshaling! In the long run this is not very visible, because the transitions are required only at the beginning. So for an ASP.NET app that is called many times this may be okay, but a regularly used desktop app might have an issue, as it is freshly started each time.
Anyway, thanks a lot once again. Regards, Nish
Author of the romantic comedy
Summer Love and Some more Cricket
Review by Shog9
What flags were used for the C++ tests? Exact command line, please.
Which test is Managed C++ using /CLR but *no* code changes (i.e., using all unmanaged types as-is)?
Why isn't a System::GC::Collect() performed at the end of these tests? We all know GC costs are heavily back-loaded into cleanup (including thread suspensions and finalizer callbacks on a separate thread), while the front end is cheap. I know this particular test isn't really germane in that sense, but I have seen plenty that are. Omitting the GC collection phase is pretty biased.
And when I see GC collection omitted in various tests, I also invariably see nothing but the default allocator used in the C++ examples, as if any decent C++ programmer wouldn't use a special arena or fixed allocator for perf-sensitive tests. Go figure. Again, this test doesn't need it, but many do in order to be fair.
Since all the languages above are ultimately written *in* C++, technically they are all examples *of* C++. So you can't really state that any of them is faster than C++, since they *are* C++.
For some reason I did not see the source link earlier; now I have the sources.
Why use /Od when running a *perf test* in C++? I run benchmarks with /O2 /GL, not /Od. I am trying to understand why someone would do this and call it a test of performance. You favor /ZI for Edit and Continue over optimizing the code, and throw in /EHsc and /RTC1 too?? Please use /O2 /GL, get rid of the other flags, and then run the C++ test again.
Before someone points out that C# also has optimization turned off in its build: I think not optimizing C++ costs much more than not optimizing C# here. Besides, just turn them all on and let's see what happens. It can't hurt to see, right?
If you are going to make statements about comparing performance of a language, please don't tie one (or both) of its hands behind its back and call it a fair fight.
Anonymous wrote: Why use /Od running a *perf test* in C++?
Please double-check. We are testing the release builds, not the debug builds. In debug builds optimizations are turned off; in release builds the C++ project is optimized for maximum speed.
Anonymous wrote: You favor /ZI for Edit and Continue over optimizing the code,
You are using the Debug build. There is another configuration, Release; use that. We used Release mode for all our performance studies.
Did having OptionStrict = "Off" handicap VB .NET's performance?
Scott Hutchinson s.c.o.t.t.h.u.t.c.h.i.n.s.o.n@usa.net (to contact me, remove all dots left of @)
A great article! Thanks very much.
Have you thought about extending it to non-MS languages like Delphi, C++ Builder, etc.? I've been looking for performance comparisons between Delphi and VC for some time, but can't find anything out there...
Dylan Kenneally London, UK
Since I'm working on a CPU-intensive simulation and considering writing the GUI in C#, I was especially interested in the time it takes to go from managed into unmanaged code and back. I was interested not only in the COM way of doing it, but also in importing from a standard DLL using the [DllImport] attribute, and in using the MC++ wrappers as described here.
Besides, it seems that the differences in net calculation times between C# and native C++ (excluding initialization, JIT compilation, etc.) are about 5% (for this kind of task).
So, first I added a number-of-iterations parameter, thus:
QueryPerformanceCounter(ref t1);
for (int i = 0; i < iter; ++i)
    primes.CalculatePrimes(numPrimes);
QueryPerformanceCounter(ref t2);
Now I could perform tests to assess the cost of a managed-to-unmanaged call.
Baseline (the usual tests + second parameter is iterations): ATLPrimes, NativeClient Primes(10,1)=