Inner Product Experiment: C# vs. C/C++






This article demonstrates the speed of the inner product operation performed with shorts, ints, longs, floats, doubles and decimals in C#, compared to C/C++.
Introduction
The inner product (also called the dot product or scalar product) is one of the most important operations in digital signal processing. It is used everywhere: Fourier transforms (FFT, DCT), wavelet analysis, filtering and so on. After writing a similar article on the inner product in C/C++, Inner Product Experiment: CPU, FPU vs. SSE*, I wondered how the same code written in C# would perform. I repeated the inner product operations using the C# types short, int, long, float, double and decimal.
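For reference, the inner product of two N-dimensional real vectors a and b is the sum of their element-wise products, and one output sample of an FIR filter with M taps h is exactly such an inner product over a window of the input x. The symbols N, M, a, b, h and x are only notation chosen for this sketch:

```latex
\langle \mathbf{a}, \mathbf{b} \rangle = \sum_{i=0}^{N-1} a_i \, b_i ,
\qquad
y[n] = \sum_{k=0}^{M-1} h[k] \, x[n-k]
```

Each term costs one multiplication and one addition, so vectors of size N need N multiply-adds, which is the work the timing loops in this article measure.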
Background
Inner Product Experiment: CPU, FPU vs. SSE*
Using the code
Just run inner.exe, passing the vector size as its argument. Make sure timer.dll is in the same directory as the executable. It provides the tic() and toc() functions, which implement a precision timer reporting milliseconds. I call the DLL from the static PerformanceCounter class, through its PerformanceCounter.Tic() and PerformanceCounter.Toc() methods.
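using System;
using System.Runtime.InteropServices;

// Static wrapper around the native timer.dll, which exports tic() and toc().
public static class PerformanceCounter
{
    // Starts the native precision timer.
    [DllImport("timer")]
    static extern void tic();

    // Returns the milliseconds elapsed since the last tic().
    [DllImport("timer")]
    static extern long toc();

    // Returns 0 on success, or -1 if timer.dll could not be called
    // (for example, a DllNotFoundException when the DLL is missing).
    public static long Tic()
    {
        try
        {
            tic();
            return 0;
        }
        catch (Exception e)
        {
            Console.WriteLine(String.Format("PerformanceCounter.Tic() {0}", e.Message));
            return -1;
        }
    }

    // Returns the elapsed milliseconds, or -1 if timer.dll could not be called.
    public static long Toc()
    {
        try
        {
            return toc();
        }
        catch (Exception e)
        {
            Console.WriteLine(String.Format("PerformanceCounter.Toc() {0}", e.Message));
            return -1;
        }
    }
}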
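If you would rather not ship a native DLL, a managed timer can play the same role. The sketch below is an assumption on my part, not what inner.exe uses: the class name ManagedPerformanceCounter is hypothetical, and it relies on System.Diagnostics.Stopwatch, which uses the high-resolution performance counter when the hardware and OS support one (see Stopwatch.IsHighResolution).

```csharp
using System.Diagnostics;

// Hypothetical managed stand-in for the native timer.dll tic()/toc() pair.
public static class ManagedPerformanceCounter
{
    // One shared stopwatch, like the single global timer behind tic()/toc().
    private static readonly Stopwatch watch = new Stopwatch();

    // Resets and starts timing; returns 0 to mirror PerformanceCounter.Tic().
    public static long Tic()
    {
        watch.Reset();
        watch.Start();
        return 0;
    }

    // Milliseconds elapsed since the last Tic(), mirroring PerformanceCounter.Toc().
    public static long Toc()
    {
        return watch.ElapsedMilliseconds;
    }
}
```

Because the method names match, the benchmark functions only need the class name swapped to try it.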
The main console program contains the following code; to save space, only the doubles function is included here:
class Program
{
    // Default vector size if no valid argument is given.
    static int size = 1000000;

    static void Main(string[] args)
    {
        // Read the vector size from the command line; keep the default for
        // non-numeric, zero, negative or out-of-range values.
        if (args.Length >= 1)
        {
            int n;
            if (int.TryParse(args[0], out n) && n > 0)
                size = n;
            else
                Console.WriteLine(String.Format("Can not convert {0} to a positive size, using {1}", args[0], size));
        }
        shorts();
        ints();
        longs();
        floats();
        doubles();
        decimals();
    }
    //...
    static void doubles()
    {
        // Fill both vectors with random values in [-0.5, 0.5).
        double[] a = new double[size];
        double[] b = new double[size];
        Random rnd = new Random();
        for (int i = 0; i < size; i++)
        {
            a[i] = rnd.NextDouble() - 0.5;
            b[i] = rnd.NextDouble() - 0.5;
        }
        // Time only the inner product loop.
        PerformanceCounter.Tic();
        double c = 0.0;
        for (int i = 0; i < size; i++)
            c += a[i] * b[i];
        // Note: c is never used after this point.
        Console.WriteLine(String.Format(" doubles: {0} ms", PerformanceCounter.Toc()));
        a = null;
        b = null;
    }
    //...
Below is an example of the console output for 5000000-dimensional vectors:
>inner.exe 5000000
shorts: 16 ms
ints: 7 ms
longs: 69 ms
floats: 9 ms
doubles: 9 ms
decimals: 2569 ms
I was honestly stunned to see floats and doubles in C# run 1.3 to 3.3 times faster than in C/C++, even SSE-optimized C/C++. It should not be so: the code is managed and compiled at run time, and it runs on the same CPU/FPU, so how can it be faster? If you know the answer, post it here. See the Inner Product Experiment: CPU, FPU vs. SSE* article for the timings of the corresponding numeric types in C/C++. Ints perform a little faster than floats, but there may be little profit in quantizing floats to fixed-point arithmetic; here C# again outperforms C/C++, running 2.28 times faster. However, shorts and longs run quite slowly. Shorts in C# perform as fast as in C/C++, but SSE2 intrinsics still outperform C#. Avoid decimals unless you really need high precision in the fractional part; otherwise the computation will take forever.
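To illustrate what decimals buy you for that cost: decimal is a 128-bit base-10 type computed in software, so it represents values such as 0.1 exactly, while double is a base-2 hardware type and cannot. The class and method names in this small sketch are made up for the example:

```csharp
// Hypothetical helper contrasting binary (double) and base-10 (decimal) rounding.
public static class PrecisionDemo
{
    // 0.1 and 0.2 have no exact base-2 representation, so their double sum
    // lands slightly above the double nearest to 0.3 and the test fails.
    public static bool DoubleSumIsExact()
    {
        double x = 0.1, y = 0.2;
        return x + y == 0.3;
    }

    // decimal stores a base-10 mantissa and a scale, so 0.1m + 0.2m is exactly 0.3m.
    public static bool DecimalSumIsExact()
    {
        decimal x = 0.1m, y = 0.2m;
        return x + y == 0.3m;
    }
}
```

DoubleSumIsExact() returns false and DecimalSumIsExact() returns true; that exactness is what the much slower software arithmetic pays for.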
With all these amenities of C# programming, shouldn't we migrate DSP applications from C++?
Update (7 Apr 2008)
Sadly for C# adherents, and to the great delight of C++ gurus, the labours we spent on C/C++ were not in vain after all. The C# compiler does indeed somehow optimize away computations whose results are never used (in the code above, c is never read after the loop), and that led me astray. To restore C++'s tarnished glory, here is an example of the C# output for 5000000-element vectors, with each inner product now printed after its timing:
>inner.exe 5000000
shorts: 16 ms
27006
ints: 18 ms
1240761
longs: 72 ms
-5610477
floats: 30 ms
33,548
doubles: 35 ms
198,949191315363
decimals: 2936 ms
138,23876271661179995948054686
This still leaves room for debate as to why the unused for() loops were not removed for shorts and longs as well. Also, doubles now run slower than floats, the opposite of C++, where doubles outperform floats.
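A minimal sketch of the idea behind the updated measurement: return the inner product and print it after the timing, so the loop's result is observable and cannot be discarded as dead code. This is not the author's code; the class and method names are hypothetical, and Stopwatch stands in for timer.dll.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark helper that always consumes the inner product.
public static class InnerProductBench
{
    // Plain scalar inner product; the caller receives the sum.
    public static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double c = 0.0;
        for (int i = 0; i < a.Length; i++)
            c += a[i] * b[i];
        return c;
    }

    // Times Dot on random vectors, then prints the time followed by the
    // result, matching the "timing line, then value" layout above.
    public static double Run(int size, int seed)
    {
        Random rnd = new Random(seed);
        double[] a = new double[size];
        double[] b = new double[size];
        for (int i = 0; i < size; i++)
        {
            a[i] = rnd.NextDouble() - 0.5;
            b[i] = rnd.NextDouble() - 0.5;
        }
        Stopwatch sw = Stopwatch.StartNew();
        double c = Dot(a, b);
        sw.Stop();
        Console.WriteLine(" doubles: {0} ms", sw.ElapsedMilliseconds);
        // Printing c makes the result observable, so the loop is not dead code.
        Console.WriteLine(c);
        return c;
    }
}
```

Calling InnerProductBench.Run(5000000, 1) prints a timing line and then the sum, as in the output above; the exact values depend on the seed and the machine.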
Update (6 May 2008)
Unrolling the for() loops did provide a speed-up, but only when unrolling by a factor of 4. The same trick gave no performance increase in the C++ code. This is how I unrolled the loop (shown for floats):
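...
float c = 0.0f;
int ii = 0;
// Unrolled 4 times: four multiply-adds per iteration
for (int i = 0; i < size / 4; i++)
{
    c += a[ii] * b[ii];
    ii++;
    c += a[ii] * b[ii];
    ii++;
    c += a[ii] * b[ii];
    ii++;
    c += a[ii] * b[ii];
    ii++;
}
// Add the remaining size % 4 products (none when size is a multiple of 4,
// as in the 5000000-element run below)
for (; ii < size; ii++)
    c += a[ii] * b[ii];
...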
And the results are shown below:
>inner.exe 5000000
shorts: 16 ms
-24687
shorts 4loop: 14 ms
7038
ints: 18 ms
19686
ints 4loop: 16 ms
9090795
longs: 71 ms
-870676
longs 4loop: 75 ms
-8263341
floats: 32 ms
43,41741
floats 4loop: 15 ms
11,02298
doubles: 34 ms
194,810329249757
doubles 4loop: 24 ms
-495,312642682424
doubles unsafe: 32 ms
-283,031436372233
decimals: 2550 ms
368,82465505657333076693624932
decimals 4loop: 2611 ms
-50,405825071718589646106671809
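The unrolled loop above still adds every product into the single accumulator c, so each addition has to wait for the previous one. A common variant, which is not what was measured above, keeps four independent partial sums and combines them at the end, and also handles sizes that are not a multiple of 4. Treat it as a sketch that has not been timed against the numbers above; the names are hypothetical, and because it changes the order of the float additions, its result can differ from the plain loop in the last digits.

```csharp
using System;

// Hypothetical four-way unrolled inner product with independent accumulators.
public static class UnrolledDot
{
    public static float Dot4(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        int n = a.Length;
        int limit = n - n % 4;   // number of elements handled by the unrolled loop
        float c0 = 0f, c1 = 0f, c2 = 0f, c3 = 0f;
        for (int i = 0; i < limit; i += 4)
        {
            // Four separate sums, so consecutive additions do not depend on each other.
            c0 += a[i] * b[i];
            c1 += a[i + 1] * b[i + 1];
            c2 += a[i + 2] * b[i + 2];
            c3 += a[i + 3] * b[i + 3];
        }
        float c = (c0 + c1) + (c2 + c3);
        // Remainder: the last n % 4 elements.
        for (int i = limit; i < n; i++)
            c += a[i] * b[i];
        return c;
    }
}
```

For example, Dot4(new float[] { 1, 2, 3, 4, 5 }, new float[] { 1, 1, 1, 1, 2 }) sums 1 + 2 + 3 + 4 in the unrolled loop and adds 5 * 2 in the remainder loop, returning 20.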