Inner Product Experiment: C# vs. C/C++

Chesnokov Yuriy

Apr 1, 2008

GPL3

This article demonstrates the speed of the inner product operation performed with shorts, ints, longs, floats, doubles, and decimals in C#, compared to C/C++.

Introduction

The inner product (also called the dot product or scalar product) is one of the fundamental operations in digital signal processing. It is used everywhere: Fourier transforms (FFT, DCT), wavelet analysis, filtering operations, and so on. After writing a similar article on the inner product in C/C++, Inner Product Experiment: CPU, FPU vs. SSE*, I wondered how the same code would perform when written in C#. I repeated the inner product operations using the C# types short, int, long, float, double, and decimal.

Background

See the earlier article: Inner Product Experiment: CPU, FPU vs. SSE*

Using the code

Run inner.exe, passing the size of the vectors as an argument. Make sure timer.dll is in the same directory as the executable. It provides the tic() and toc() functions, which implement a precision time counter that reports milliseconds. I call the DLL from the static PerformanceCounter class through the PerformanceCounter.Tic() and PerformanceCounter.Toc() functions.

using System;
using System.Runtime.InteropServices;

public static class PerformanceCounter
{
        // Native timer functions exported by timer.dll.
        [DllImport("timer")]
        static extern void tic();
        [DllImport("timer")]
        static extern long toc();

        // Starts the timer. Returns 0 on success, -1 if the native call fails.
        public static long Tic()
        {
                try
                {
                        tic();
                        return 0;
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("PerformanceCounter.Tic() {0}", e.Message));
                        return -1;
                }
        }

        // Returns the value reported by toc() in milliseconds, or -1 if the native call fails.
        public static long Toc()
        {
                try
                {
                        return toc();
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("PerformanceCounter.Toc() {0}", e.Message));
                        return -1;
                }
        }
}
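
If timer.dll is not at hand, a similar Tic()/Toc() pair can probably be sketched with the framework's own System.Diagnostics.Stopwatch class. This is only an illustration under that assumption, not the timer used for the measurements in this article; the class names StopwatchCounter and StopwatchDemo are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical managed stand-in for the timer.dll wrapper (not the article's timer).
public static class StopwatchCounter
{
        static readonly Stopwatch watch = new Stopwatch();

        // Restart the timer from zero.
        public static void Tic()
        {
                watch.Reset();
                watch.Start();
        }

        // Milliseconds elapsed since the last Tic().
        public static long Toc()
        {
                return watch.ElapsedMilliseconds;
        }
}

public static class StopwatchDemo
{
        public static void Main()
        {
                StopwatchCounter.Tic();

                double c = 0.0;
                for (int i = 0; i < 1000000; i++)
                        c += i * 0.5;

                // The sum is printed too, so the loop result is actually used.
                Console.WriteLine(String.Format(" loop: {0} ms, sum {1}", StopwatchCounter.Toc(), c));
        }
}
```

The wrapper keeps the same Tic()/Toc() shape as PerformanceCounter, so it could be swapped into the test functions without touching the timed loops.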

The main console program contains the code below. Only the doubles function is shown here to save space:

class Program
{
        static int size = 1000000;

        static void Main(string[] args)
        {
                try
                {
                        if (args.Length >= 1)
                                size = (int)Convert.ToUInt32(args[0]);
                }
                catch (Exception e)
                {
                        Console.WriteLine(String.Format("Can not convert {0} to uint32: {1}", args[0], e.Message));
                        size = 1000000;
                }

                shorts();
                ints();
                longs();
                floats();
                doubles();
                decimals();
        }

        //...
        
        static void doubles()
        {
                double[] a = new double[size];
                double[] b = new double[size];

                Random rnd = new Random();
                for (int i = 0; i < size; i++)
                {
                        a[i] = rnd.NextDouble() - 0.5;
                        b[i] = rnd.NextDouble() - 0.5;
                }

                PerformanceCounter.Tic();

                double c = 0.0;
                for (int i = 0; i < size; i++)
                        c += a[i] * b[i];

                Console.WriteLine(String.Format(" doubles: {0} ms", PerformanceCounter.Toc()));

                a = null;
                b = null;
        }

        //...

Below is an example of the console output for 5000000-dimensional vectors.

>inner.exe 5000000
 shorts: 16 ms
 ints: 7 ms
 longs: 69 ms
 floats: 9 ms
 doubles: 9 ms
 decimals: 2569 ms

I was stunned to see floats and doubles in C# performing 1.3 to 3.3 times faster than in C/C++, even with SSE optimization. It should not be so: the code is managed and compiled at run time, and it runs on the same CPU/FPU, so how can it run faster? If you know the answer, post it here. See the Inner Product Experiment: CPU, FPU vs. SSE* article for the timings of the corresponding numeric types in C/C++. Ints perform a little faster than floats, but probably not by enough to justify quantizing floats to fixed-point arithmetic; here C# again outperforms C/C++, running 2.28 times faster. Shorts and longs, however, run quite slowly. Shorts in C# perform as fast as in C/C++, but the SSE2 intrinsics outperform C#. Avoid decimals unless you need high precision after the decimal point; otherwise the computation will take forever.

With all these amenities in C# programming, should we not migrate DSP applications from C++?

Update (7 Apr 2008)

Sad news for C# adherents, and a great delight for C++ gurus: the labours we spent on C/C++ were not in vain. The C# compiler apparently optimizes the code so that computations feeding unused variables are skipped, and that led me astray. To restore the tarnished glory of C++, here is an example of the C# output for vectors of size 5000000, with each computed inner product printed below its timing:

>inner.exe 5000000
 shorts: 16 ms 
  27006 
 ints: 18 ms 
  1240761 
 longs: 72 ms 
  -5610477 
 floats: 30 ms 
  33,548 
 doubles: 35 ms 
  198,949191315363 
 decimals: 2936 ms 
  138,23876271661179995948054686

This still leaves room for dispute as to why the unused for() loops were not removed for shorts and longs. Doubles also run slower than floats, contrary to C++, where doubles outperform floats.
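
One way to guard against the effect described in this update is to make sure the accumulated value is actually used, for example by returning it from the timed function and printing it. The sketch below illustrates the idea; the names UsedResultDemo and Dot are hypothetical, and whether an unused loop is really removed depends on the runtime version and build settings.

```csharp
using System;

public static class UsedResultDemo
{
        // Hypothetical helper: returns the inner product so that the caller uses it.
        public static double Dot(double[] a, double[] b)
        {
                double c = 0.0;
                for (int i = 0; i < a.Length; i++)
                        c += a[i] * b[i];
                return c;
        }

        public static void Main()
        {
                double[] a = { 1.0, 2.0, 3.0 };
                double[] b = { 4.0, 5.0, 6.0 };

                // 1*4 + 2*5 + 3*6 = 32; printing the value keeps the loop observable.
                Console.WriteLine(Dot(a, b));
        }
}
```

Printing the value also serves as a sanity check that each numeric type computes what is expected.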

Update (6 May 2008)

Unfolding (unrolling) the for() loops did provide a speed-up, but only when unfolding 4 times. The same trick did not improve performance in the C++ code. This is how I did the unfolding:

...
float c = 0.0f;
int ii = 0;
// Four multiply-adds per iteration; elements beyond size / 4 * 4 are not processed.
for (int i = 0; i < size / 4; i++)
{
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
        c += a[ii] * b[ii];
        ii++;
}
...

And the results are shown below:

>inner.exe 5000000
 shorts: 16 ms
  -24687
 shorts 4loop: 14 ms
  7038
 ints: 18 ms
  19686
 ints 4loop: 16 ms
  9090795
 longs: 71 ms
  -870676
 longs 4loop: 75 ms
  -8263341
 floats: 32 ms
  43,41741
 floats 4loop: 15 ms
  11,02298
 doubles: 34 ms
  194,810329249757
 doubles 4loop: 24 ms
  -495,312642682424
 doubles unsafe: 32 ms
  -283,031436372233
 decimals: 2550 ms
  368,82465505657333076693624932
 decimals 4loop: 2611 ms
  -50,405825071718589646106671809
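
The unfolded loop shown in this update covers only size / 4 * 4 elements. That is fine for 5000000, but it would skip up to three elements for other sizes. Below is a sketch of the same 4-times unfolding with a tail loop for the remainder; the names UnfoldDemo and Dot4 are hypothetical, and this variant was not part of the measurements above.

```csharp
using System;

public static class UnfoldDemo
{
        // Inner product unfolded 4 times, with a tail loop for leftover elements.
        public static float Dot4(float[] a, float[] b)
        {
                int size = a.Length;
                int blockEnd = size - size % 4; // largest multiple of 4 not above size
                float c = 0.0f;
                int i = 0;

                for (; i < blockEnd; i += 4)
                {
                        c += a[i] * b[i];
                        c += a[i + 1] * b[i + 1];
                        c += a[i + 2] * b[i + 2];
                        c += a[i + 3] * b[i + 3];
                }

                // Remaining 0 to 3 elements.
                for (; i < size; i++)
                        c += a[i] * b[i];

                return c;
        }

        public static void Main()
        {
                float[] a = { 1f, 2f, 3f, 4f, 5f, 6f };
                float[] b = { 1f, 1f, 1f, 1f, 1f, 1f };

                // 1 + 2 + 3 + 4 + 5 + 6 = 21: four elements in the unfolded loop, two in the tail.
                Console.WriteLine(Dot4(a, b));
        }
}
```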