Click here to Skip to main content
Click here to Skip to main content

Operation Performance Evaluation

, 18 Jan 2012
Rate this:
Please Sign up or sign in to vote.
Investigating the cost of an operation in cycles within a real world, i.e., no peak, performance measurement of C#, C++, Java, Fortran and JavaScript
Performance overview

Introduction

Ever wondered if some programming languages stereotypes like "Fortran is the fastest language" or "C++ cannot be matched by Java or C#" are still valid? This article tries to answer a part of this question. Evaluations have been done in the following languages:

  • C#
  • C++
  • Java
  • Fortran
  • JavaScript

The last one was not meant as a serious competitor for the ones above rather than out of interest. The results of JavaScript will not be mixed up with the other four languages. Every language had several tests in order to find out other interesting details. Some of those details concerned:

  • How much slower is Debug than Release for a C# application?
  • How much acceleration do I get for (32-bit) integer operations due to 64-bit (instead of 32-bit)?
  • Does it matter if a C# application is compiled for AnyCPU or x64?
  • How much slower does a program get by compiling it to MSIL instead of direct bytecode?
  • Which browser has the fastest JavaScript engine measured in cycles per operation?

In order to do those evaluations as realistic as possible, a code has been set up that will be explained later. It should be noted that the code was rewritten as exact as possible. Nevertheless, some languages did not provide equal features as the original program code, which was written in C#. Therefore other time measurement routines have been used for JavaScript, Java and Fortran since those languages do not offer a good connection to the Windows system and therefore cannot access the Windows system team measured in Ticks, where 1 Tick occurs every 1000 cycles.

What This Article is Good For

This article aims to give some qualitative comparison of the languages performance on the core operations such as the addition (+), subtraction (-), multiplication (*) and division (/) as well as modulo (%). Also combined operations like division, addition and modulo (/+%) and division, addition, subtraction and multiplication (/+-*) were tested. This will give us some insight about the optimizations of the used compiler. Every language depends hugely on the used compiler since a language is nothing more than a set of grammar rules and meaningful words. The compiler is responsible to use the given language in order to translate the code from a high-level to a lower level (like Assembler code or MSIL).

What This Article Is Not About

This article does not aim to give insight about peak performances of each language or compiler. Optimizations were used in a restrictive manner, such that the optimizations would never break any big code. Therefore I used O2 optimizations as a maximum level for Fortran or C++. Also I did not set up a complete new computer, i.e. some programs and services were running and active during the benchmarks. However, the CPU was at a low / medium load while doing the benchmarks, which is something that will also occur in real world applications. Therefore, I consider this a practical test.

This article will also not say that any language is faster than another language. In my opinion, the speed of language is determined by a lot of factors. Even if a language executes faster operations, it is not said that memory operations are also scaled like that. Also many programs access different resources (HDD, Network, ...) which may also favor different languages than the ones that are listed here. In total, the speed of a language for a certain kind of operation does not say too much about the performance of the language - every language targets a different class of problems.

Background

In order to calculate the optimal width of an item in a collection, I divide the width of the available space through the number of items. This is an integer division, i.e., we will get back an integer. In general, the result is not exact, i.e. if I multiply the result with the number of items, then I won't obtain the width of the available space again. In order to avoid such a mistake, I basically do the following:

int widthPerItem = total_width / items.Length;
int remaining = total_width % items.Length;

From this point on, we have two possible solutions. The first solution is not very elegant:

items[0].Width = widthPerItem + remaining;
for(int i = 1; i < items.Length; i++)
{
    items[i].Width = widthPerItem;
}

Here we give the additional (remaining) pixels to the first item and leave the other items with the calculated width. The second solution is more elegant:

for(int i = 0; i < items.Length; i++)
{
    items[i].Width = widthPerItem + Math.Sign(remaining);
    if(remaining > 0)
        remaining--;
}

Here, we add one pixel to the calculated width of each item until we have no more pixels. This means some items will get the calculated width plus one pixels, while the other items get just the calculated width. While those two solutions seem to fit in various situations, my article will deal with the initial problem of first doing a division followed by a modulo operation. Those two operations are about equally expensive. Both operations are really expensive as well! I was once told that a division is about eight times as expensive as an addition. This article will try to find out:

  1. Are the rules of thumb still valid?
    • A subtraction is equal to an addition
    • A multiplication is more expensive than an addition
    • A division is about eight times as expensive as an addition
    • A modulo operation is equal to a division
  2. Is the following code a better approach for retrieving the remaining part?
    int widthPerItem = total_width / items.Length;
    int remaining = total_width - widthPerItem * items.Length;

The Basic Code

The original code was written in C# for the .NET-Framework. In order to measure the performance, each operation had to be repeated multiple times. For the evaluation, we set the program to 10,000 repetitions. Since just repeating one operation in a for-loop can sometimes be solved trivially by the compiler, it was necessary to alter the operands. This was solved by using an array of data-points, with a quite long array length. We used 100,000 data-points. This results in an array size of 100,000 times 4 bytes (an integer), i.e. 400,000 bytes or about 391 kB. This is certainly bigger than the Level 1 cache of the processor but certainly smaller than the L2 cache of the processor (nowadays).

All in all, each operation (or combined operation) is executed 109 times. Since the usual frequencies are in the GHz or 109 Hz range, we can estimate that each loop will take longer than one second since every operation will require at least one cycle. The Stopwatch instance will give us the time in ticks, i.e. in 103 cycles. This will still result in quite large numbers, so we divide the result through 1000 in order to obtain the result in 106 cycles.

//The constants being used for the evaluations
const int SIZE = 100000;
const int REPEAT = 10000;

static void Main(string[] args)
{
    Random ran = new Random(0);

    //Initialize the required arrays
    int[] A = new int[SIZE];
    int[] B = new int[SIZE];
    int[] C = new int[SIZE];

    //Fill the arrays with random numbers:
    for (int i = 0; i < SIZE; i++)
    {
        A[i] = ran.Next();
        B[i] = ran.Next();
    }

    //Always use the .NET-Stopwatch for timing
    Stopwatch sw = new Stopwatch();
    sw.Start();

    for (int j = 0; j < REPEAT; j++)
        for (int i = 0; i < SIZE; i++)
            C[i] = A[i] + B[i];

    sw.Stop();
    Console.WriteLine("+:       " + sw.ElapsedTicks / 1000);

    /* The same for -, *, /, % */

    sw.Reset();
    sw.Start();

    //First combined operation
    for (int j = 0; j < REPEAT; j++)
        for (int i = 0; i < SIZE; i++)
            C[i] = A[i] / B[i] + A[i] % B[i];

    sw.Stop();
    Console.WriteLine("/ + %:  " + sw.ElapsedTicks / 1000);

    sw.Reset();
    sw.Start();

    //Second one - to verify if this one really executes faster
    for (int j = 0; j < REPEAT; j++)
        for (int i = 0; i < SIZE; i++)
        {
            C[i] = A[i] / B[i];
            C[i] += A[i] - C[i] * B[i];
        }

    sw.Stop();
    Console.WriteLine("/ + -*: " + sw.ElapsedTicks / 1000);
}

One thing to remark here is that this is a real world evaluation. If we wanted to perform on peak performances, we would implement matrix vector products. No one could argue that in real world applications, one would first have to gather the operands from RAM or even hard-disk. However, this would make the evaluation even harder and less accurate, since RAM and hard-disk have a much higher access time than the operation would take. In such a case, it would be required to measure RAM or hard-disk access time independently and add it to the measurements of this evaluation.

The Benchmarks

Every benchmark included the 5 elementary operations and 2 combined operations. The following benchmarks were performed:

  • C# with AnyCPU in Debug
  • C# with AnyCPU in Release
  • C# with x64 in Release
  • C# with x86 in Release
  • C++.NET AnyCPU in Debug (/CLI)
  • C++.NET AnyCPU in Debug (/MSIL)
  • C++.NET x64 in Release
  • C++.NET x86 in Release
  • C++ (MinGW) for x64
  • C++ (MS) for x64
  • Fortran (MinGW) for x64
  • Java SDK7-x64 Release

All measurements have been performed in units of 106 cycles or have been performed in units of 10-3 seconds (milliseconds or short ms). Measurements using the later one have been recalculated to 106 ticks using the equation (where f is the clock-frequency of the CPU, i.e. cycles per second),

TTicks=f * TSeconds / 1000.

By using Ticks / s (which is cycles / ms) we can also use TMilliseconds in order to calculate the time in cycles. Since we want to compare 106 cycles, we have to divide the final result by 106. Those 106 cycles are then equivalent to 103 Ticks or Ticks in thousands. The final result of the first evaluation round is the following:

+ - * / % / + % / + - *
C# Any CPU, Debug 19114 18686 19599 18916 18833 29893 37223
Any CPU, Release 2546 2638 2627 12341 12260 24269 12054
x86, Release 7743 7846 7630 12355 12072 26073 18906
x64, Release 2574 2555 2592 12486 12286 24324 12084
C++ Any CPU, /CLI, Debug 11406 11795 11629 14681 14779 24347 31903
Any CPU, /MSIL, Debug 11563 11480 11514 14896 14698 24643 26446
x86, Release 3156 3337 3286 12575 12102 24939 12628
x64, Release 3850 3883 3969 13003 12846 24716 12855
x64, mingw 12522 12768 12773 15598 14572 24468 34050
x64, ms 3195 3196 3462 12828 12257 12132 12600
Fortran x64, mingw 9324 9141 9361 12980 13857 26253 23328
Java jsdk7-x64 4929 4964 2800 12466 12565 12483 13057
Best 2546 2555 2592 12341 12072 12132 12054

Using the data, I could draw the following conclusions:

  • The debug mode is really slow - this is due to the Visual Studio Hostprocess being attached to every operation.
  • C# is a language that really benefits from 64-bit - the compiler seems to be able to increase speed compared from x86 to x64.
  • There is no difference between x64 and AnyCPU (on my x64 system). Therefore compiling to AnyCPU seems to be the best option unless you want to restrict your program to a certain architecture or Framework. One example where this makes sense is if you want to use the x86 Version of the .NET-Framework - even on x64 Systems.
  • The GNU compiler did not perform well at all. This seems to confirm the common stereotype that the GNU compilers are not suitable for HPC usage.
  • Some intelligent compilers are also present - that can be seen by the fact that the naive (/ + %) operation requires the same cycles as the optimized version (/ + - *). This is only possible if the former one is translated to the later one automatically. The Java compiler and the Microsoft (C++) compiler did a very good job here.
  • In debug mode, the only number that counts is the number of operations since the attachment of the Visual Studio Hostprocess costs more than the operation itself. This means that each operation costs about the same (and more than the operation itself) and scales with the number of operations.
  • C# proves to be really solid winning nearly all.

The data resulted in the following chart:

The first evaluation containing C#, C++, Fortran and Java

It is also worth noting that I performed several JavaScript benchmarks on all major browsers:

  • JavaScript in V8 (Chrome 15)
  • JavaScript in Chakra (IE 9)
  • JavaScript in Carakan (Opera 11.52)
  • JavaScript in SpiderMonkey (Firefox 8)
  • JavaScript in Nitro (Safari 5.1.1)
+ - * / % / + % / + - *
JavaScript V8 (Chrome) 37274 21633 20543 32485 41556 58794 45791
Chakra (IE) 38424 38103 122998 143617 53781 231479 314839
Carakan (Opera) 66330 66426 68589 79509 68092 137450 165267
SpiderMonkey (Firefox) 49405 51231 51868 80634 195912 222444 105849
Nitro (Safari) 52014 51508 86761 69356 52750 99713 148745
Best 37274 21633 20543 32485 41556 58794 45791

Using the data, I could draw the following conclusions:

  • All JavaScript engines perform worse than C++.
  • Compiling to bytecode gives Google Chrome's V8 engine a huge boost compared to other engines.
  • The other engines perform as other languages perform in debug mode - this is due the interpretation process. The browser is attached to the JavaScript code and therefore performs an interpretation step before and after each operation. This means that the operation itself does not matter - only the number of operations matter.
  • A general statement might be: If you have the choice between doing something optimized - but with more operations - and naive (or less optimized) you should stick with the later one.
  • The IE engine does have some problems with multiplication and division. The problem does not occur with modulo.
  • Modulo seems to be implemented in a special way since all interpretation engines excluding Firefox show a better result here than with a division.

The data resulted in the following chart:

The second evaluation containing JavaScript on all major browsers

Additionally, I had the chance to do another interesting set of tests with high-performance compilers provided by Intel. These tests ran on another (special) machine, so it would have been unfair to compare those results with the ones from the benchmark above. These benchmarks included:

  • Fortran (Intel) for x64
  • C++ (Intel) for x86
  • C++ (Intel) for x64
  • C++ (MS) for x86
  • C++ (MS) for x64

All those additional benchmarks were executed with the optimization level set to O2. This was also very effective for Fortran, since O3 crashed always and parallel mode or SSE3 optimized modes proofed to be even slower than a plain O2 optimization.

+ - * / % / + % / + - *
Fortran x64 - Intel 1976 2025 1975 2024 2026 1973 2987
C++ x86 - Microsoft 3045 3107 2974 8450 8449 8427 8594
x64 - Microsoft 2469 2422 2441 8359 8357 8395 8390
x86 - Intel 1852 1739 1876 9529 10156 18861 11644
x64 - Intel 1724 1723 1586 8196 8597 17427 10235
Best 1724 1723 1586 2024 2026 1973 2987

Using the data, I could draw the following conclusions:

  • Fortran does obviously do a lot of cool optimizations.
  • While Fortran seems to be slower on simple operations like addition, subtraction and multiplication it certainly is superior in more advanced or combined operations.
  • Intel seems to know (their) hardware better (Ah, really?!) while Microsoft seems to know (their) software, i.e. a compiler or programming language, better. This can be seen by looking at the optimizations done by the Microsoft compiler in order to reach the same level with the naive operation (/ + %) as with the optimized operation (/ + - *).

The data resulted in the following chart:

The third evaluation containing a short evaluation of Intel and Microsoft compilers

Evaluations

The computer that was used for the primary benchmarks has the following specifications:

  • Intel Core2Duo CPU
  • Clock frequency 2343808 kHz
  • 6 GB of RAM
  • Windows 7 x64

The other computer which had the Intel compiler installed has the following specifications:

  • Intel Core2Duo CPU
  • Clock frequency 3247109 kHz
  • 8 GB of RAM
  • Windows 7 x64

I used the following environments in order to evaluate the different code files:

  • All C++.NET, C++ with the Microsoft compiler, C++ / Fortran with the Intel compiler and C# evaluations have been done using Microsoft Visual Studio 2010 with Service Pack 1.
  • C++ / Fortran with the MinGW compiler have been compiled using the command line.
  • Java has been compiled and evaluated using IntelliJ IDEA.

For evaluation purposes, I rewrote the original C# code to other platforms. The major differences for the different languages and platforms (.NET, Windows, Browser, ...) are outlined in the code snippets below:

The C++.NET version was certainly quite close to the C# version in terms of the Random class as a random number generator. Also the Stopwatch class could be used.

#include "stdafx.h"

#define SIZE 100000
#define REPEAT 10000

using namespace System;
using namespace System::Diagnostics;

int main(array<System::String ^> ^args)
{
    int* A = new int[SIZE];
    int* B = new int[SIZE];
    int* C = new int[SIZE];

    //Initializes the random number generator with a seed
    Random^ ran = gcnew Random(0);

    //Filling the vectors with random numbers
    for (int i = 0; i < SIZE; i++)
    {
        A[i] = ran->Next();
        B[i] = ran->Next();
    }

    //Create new instance of custom Stopwatch class (managed instance and pointer!)
    Stopwatch^ sw = gcnew Stopwatch();

    sw->Start();
    for (int j = 0; j < REPEAT; j++)
        for (int i = 0; i < SIZE; i++)
            C[i] = A[i] + B[i];

    sw->Stop();
    Console::WriteLine(L"+:       " + sw->ElapsedTicks / 1000);
    sw->Reset();

    /* More evaluations */
}

The transition from C++.NET to C++ (Windows API) was not very complicated. However, it was necessary to a custom class to mimic the behaviour of the .NET-Stopwatch class. I wrote my own Stopwatch containing all the methods of the original version and using the Windows API in order to retrieve the current tick count.

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <iostream>

#define SIZE 100000
#define REPEAT 10000

using namespace std;

class Stopwatch
{
    LARGE_INTEGER start;
    LARGE_INTEGER end;

    public:
        Stopwatch()
        {
            LARGE_INTEGER frequency;
            QueryPerformanceFrequency(&frequency);
            cout << "Frequency: " << frequency.QuadPart << 
                        " Ticks/s" << endl;
            Reset();
        }

        long ElapsedTicks;

        void Reset()
        {
            ElapsedTicks = 0;
        }

        void Stop()
        {
            QueryPerformanceCounter(&end);
            ElapsedTicks = end.QuadPart - start.QuadPart;
        }

        void Start()
        {
            QueryPerformanceCounter(&start);
        }
};

int main()
{
    int* A = new int[SIZE];
    int* B = new int[SIZE];
    int* C = new int[SIZE];

    //Initializes the random number generator with a seed
    srand(0);

    //Filling the vectors with random numbers
    for (int i = 0; i < SIZE; i++)
    {
        A[i] = rand() + 1;
        B[i] = rand() + 1;
    }

    //Create new instance of custom Stopwatch class
    Stopwatch* sw = new Stopwatch();

    sw->Start();
    for (int j = 0; j < REPEAT; j++)
        for (int i = 0; i < SIZE; i++)
            C[i] = A[i] + B[i];

    sw->Stop();
    cout << "+:       " << sw->ElapsedTicks / 1000 << endl;
    sw->Reset();

    /* More evaluations */
}

Java (to my surprise) did also not include a Stopwatch class to be used. This might make sense since Java does also not know anything about Ticks, since Java aims to be platform independent and Ticks is a concept that was mostly used in Windows. So I used an already build Stopwatch instance and extended it with an automatic conversation into Ticks (this conversation is specific to my computer - so if you want to use it, you should alter the line in the code with your frequency in Ticks / s or with the general clock frequency / 1000).

public class Stopwatch {
    private long startTime = 0;
    private long stopTime = 0;
    private boolean running = false;

    public void start() {
        this.startTime = System.currentTimeMillis();
        this.running = true;
    }

    public void stop() {
        this.stopTime = System.currentTimeMillis();
        this.running = false;
    }

    public void reset() {
        startTime = 0;
        stopTime = 0;
    }

    //elaspsed time in milliseconds
    public long getElapsedTime() {
        long elapsed;
        if (running) {
             elapsed = (System.currentTimeMillis() - startTime);
        }
        else {
            elapsed = (stopTime - startTime);
        }
        return elapsed;
    }

    public long getElapsedTicks() {
        //This is specific to my computer which has 2343808 Ticks / s
        double frequency = 2343808;
        return (long)((((double)getElapsedTime()) / 1000.0) * frequency);
    }

    //elaspsed time in seconds
    public long getElapsedTimeSecs() {
        long elapsed;
        if (running) {
            elapsed = ((System.currentTimeMillis() - startTime) / 1000);
        }
        else {
            elapsed = ((stopTime - startTime) / 1000);
        }
        return elapsed;
    }
}

Using the custom Stopwatch class, I could write the following program:

import java.util.Random;

public class Main
{
    final static int SIZE = 100000;
    final static int REPEAT = 10000;

    public static void main(String[] args)
    {
        //Initializes the random number generator with a seed
        Random ran = new Random(0);

        int[] A = new int[SIZE];
        int[] B = new int[SIZE];
        int[] C = new int[SIZE];

        //Filling the vectors with random numbers
        for (int i = 0; i < SIZE; i++)
        {
            A[i] = ran.nextInt();
            B[i] = ran.nextInt();
        }

        //Create new instance of custom Stopwatch class
        Stopwatch sw = new Stopwatch();

        sw.start();
        for (int j = 0; j < REPEAT; j++)
            for (int i = 0; i < SIZE; i++)
                C[i] = A[i] + B[i];

        sw.stop();
        System.out.println("+:       " + sw.getElapsedTicks() / 1000);
        sw.reset();

        /* More evaluations */
    }
}

The biggest change was certainly done by using Fortran. Fortran does not start the index of an array at 0 but at 1. Also Fortran uses regular (round) brackets instead of square brackets to talk to an index of an array. Also in order to be backwards compatible, it is required to write Fortran is the original way, i.e., using a very old style of programming with punched cards. This means that the program has to fulfill certain requirements such that the first six columns are specially handled by the compiler. Also the maximum length of a row cannot exceed 80 columns, with the last 8 columns being reserved for comments.

      PROGRAM MEASUREMENT
      INTEGER s
      INTEGER A (100000), B (100000), C (100000)
      INTEGER r
      INTEGER i, j
      s = 100000
      r = 10000

C     Those two are being used for the relative time measurements
      REAL TIMEARRAY (2)
      REAL DELAPSE

C     Initializes the random number generator with a seed
      CALL srand(0)

C     Filling the vectors with random numbers
      do i = 1, s
        A(i) = 10 * rand(0) + 1
        B(i) = 10 * rand(0) + 1
      enddo

      DELAPSE = DTIME(TIMEARRAY)
      do j = 1, r
        do i = 1, s
          C(i) = A(i) + B(i)
        enddo
      enddo

      DELAPSE = DTIME(TIMEARRAY)
      WRITE (*,*) '+:    ', DELAPSE

C     More evaluations

      END PROGRAM

The evaluation in JavaScript was quite fast to write. Also like Java JavaScript does not provide a certain routine to test the time. Testing the time can result in quite inexact results - depending on the browser. However, considering the length of the test it is obvious that deltas in time measurements as the 15 ms of (older?) Internet Explorer will not matter too much in this case, since the time will certainly be in the seconds (or even minutes) area.

var SIZE = 100000;
var REPEAT = 10000;

var A = new Array(SIZE);
var B = new Array(SIZE);
var C = new Array(SIZE);

// Filling the vectors with random numbers
for (var i = 0; i < SIZE; i++)
{
    A[i] = parseInt(Math.floor(Math.random() * 100001)) + 1;
    B[i] = parseInt(Math.floor(Math.random() * 100001)) + 1;
}

// Runs a benchmark for a certain function f
function benchmark(f, id) {
    // Starttime
    var start = (new Date).getTime();
    f();
    // Difference in time
    var diff = (new Date).getTime() - start;
    // Write result to HTML-Document using the provided id
    document.getElementById(id).innerHTML = diff;
}

function plus() {
    for (var j = 0; j < REPEAT; j++)
        for (var i = 0; i < SIZE; i++)
            C[i] = A[i] + B[i];
}

/* More evaluations */

It should be said that the JavaScript has to be converted to Ticks / s manually afterwards. If you run this evaluation, then please build use something like the following or convert it afterwards:

/* Code from above ... */

/* Stays pretty much the same */
function benchmark(f, id) {
    var start = (new Date).getTime();
    f();
    var diff = (new Date).getTime() - start;
    // Now this is different! -- this one could be compared 
    // with the output of the others
    document.getElementById(id).innerHTML = parseInt(time2Ticks(diff) / 1000);
}

/* New Function! */
function time2Ticks(msec) {
    var frequency = 2343808;
    return msec / 1000 * frequency;
}

Using the Code

The code can be used without any limitations. If you have suggestions for improvements or ideas for further evaluations, you are welcome to comment here. It should be noted that I thought about doing the same evaluations on Linux, but skipped that because I thought it would be unfair to test a fresh system against a system that is running since some years and is loaded with some services and many programs in the background. Also the Microsoft compiler is not available on Linux leading to a different evaluation in this scope. For C#, I would have used the Mono implementation. However, you are welcome to make a similar evaluation on a Linux machine and inform me about the results. As long as the results are in Cycles / operation or Ticks / Operation, it is kind of comparable (again: the thing which makes it unfair to compare is the background load of the system, i.e. the programs and services that require some cycles to be executed while the (benchmark) program is running).

In order to get Fortran running, we used the following compilation command (this is obviously executing O2 optimizations against our code):

$ ifort base.f /check:bounds

All the required codefiles are included in the source.zip file.

Bonus: Scripting languages

Another interesting evaluation can be done in the scripting languages genre. Here I included the following languages:

  • Perl (ActiveStatePerl for Windows 5.14.2 x64)
  • PHP (Version 5.3.1 for Windows, which was included in the XAMPP)
  • Python (Version 2.5.4 for Windows)

First: Please note that the version of Python is obviously outdated. I do not know if newer version provide a better performance. I did not hear about better performance, but if I'll ever have the time to change the Python installation on my computer I will do another evaluation for sure. The next thing to notice is that I did run all those benchmarks on Windows. PHP is getting faster and faster on Windows, however, I do think that it is still faster on Linux than on Windows. This is obviously also true for the other two scripting languages, since they have their origins in the Linux operating system as well.

Evaluation of the three major scripting languages

PHP seems to be faster than Perl due to various factors. On the one hand it is a specialized version of Perl, on the other side it has been regularly maintained in the last decade. It is also worth noting that there is a certain Windows factor in this evaluation. I would be curious to see this evaluation on Linux (and I will either do that myself sometimes in the future or wait for somebody to post reliable results). Either way, it is quite obvious that the scripting languages are really slow. I actually thought that they might be even slower than JavaScript, but a factor of 20 is quite a surprise. It is also worth noting that all those languages are following the pattern as seen from other scripting languages, containing the costly interpretation steps between the operations.

All in all I do think that this kind of evaluation is kind of unfair. The scripting languages were never supposed to care about operations - they should provide a reliable and easy to use source for quick and dirty programming. The evaluation is unfair since the for-loop is in my opinion too heavy and does require too many operations to be performed. The languages presented before (which have been the original target of the article) do contain special treatment for such loops. Therefore the time spent performing the loop itself is neglectable compared to the operation. This is what we were up to. Now this is not the case. I believe that all languages are direct proportional to the number of operations performed - including conditions, loop counters and others.

This is true if have a close look at the following numbers:

+ - * / % / + % / + - *
Script Languages Python 913839 945178 976808 942330 989024 1572805 2200857
Perl 787519 796895 794551 803926 820333 1335971 1999268
PHP 637516 637516 639860 665641 672673 1024244 1258625
Best 637516 637516 639860 665641 672673 1024244 1258625

The best values are provided by PHP which is linear dependent on the number of operations. We have three to six dominant operations (the inner loop counter increment, the inner loop condition and the number of operations which is ranging from atomic like + to complex / + - *). Therefore every operation is about 200 cycles of computation time, which is about 70 times as expensive as C#. Perl and Python might do some other operations internally, which is why the linear behaviour cannot be applied here. However, we do see that the number of operations scales with the computation time, which is still characteristic for such languages. Here writing something in a simple way performs better than trying to squeeze some performance out of the computer by using more operations in a more complex algorithm.

Update: JavaScript Performance

I was a bit disappointed after my first JavaScript performance evaluation. The browser vendors were marketing the new JavaScript possibilities of their products quite enthusiastically. However, the real performance was really slow compared to C++ (besides Google's V8 engine). Now I changed my opinion after doing these benchmarks on the most popular scripting languages. Since JavaScript is still (again: besides Google's V8 engine) a scripting language, it is remarkable how much performance the browser vendors could squeeze out. And according to reports they are still working on improvements for providing even faster JavaScript.

So I took the opportunity to have a close look at the recent versions of the five most popular browsers (compare with the listings above):

  • JavaScript in V8 (Chrome 16)
  • JavaScript in Chakra (IE 10 PP2 since PP4 is only available for Windows 8 DP)
  • JavaScript in Carakan (Opera 11.6)
  • JavaScript in SpiderMonkey (Firefox 9)
  • JavaScript in Nitro (Safari 5.1.2)

Here bold numbers indicate a change. Since another table of data is too much I will just highlight the changes:

  • V8 is still being enhanced. All evaluations remained constant with exception of the last (combined) benchmark. Google Chrome 16 saves about 25% of computation time there. I suppose the compiler now recognizes some patterns and performs some optimizations on these.
  • IE did not change at all from the release version 9 to the platform preview of version 10. I think most of the performance upgrades have not been done in JavaScript but in the DOM manipulation and DOM rendering process.
  • Opera did not change at all.
  • Firefox made a huge leap. More on that below!
  • Safari remains the same - maybe a little bit slower, but the difference is not significant.

Firefox 9 did not contain any spectacular features - one might think. However, under the surface the guys at Mozilla seem to have a done a pretty good job on improving the JavaScript engine. Firefox 9 now executes JavaScript operations about twice as fast as Firefox 8. The first benchmark (addition) is even the faster than Google V8 and the only winner in this JavaScript benchmark that is not called Google V8. The data from Firefox (the complete data can be viewed in the included sheet):

+ - * / % / + % / + - *
Mozilla Firefox 9 23956 24385 29424 55152 39906 76924 83547
8 49405 51231 51868 80634 195912 222444 105849
Ratio 9 to 8 48% 48% 57% 68% 20% 35% 79%

The Mozilla team did a really good job in improving their JavaScript engine. In the following chart we can also see that Google is still working on improvements for their JavaScript engine V8. It will be interesting into what kind of regions JavaScript will be pushed the next years. The next chart contains the ratio of the current performance vs the former performance, i.e. a value lower than 100% means an improvement. Values around 100% are usually equivalent in performance, since a small number of the evaluation is being determined by the current state of the computer and other factors.

Improvements on the JavaScript engines of the five most popular browsers

Conclusion

In general, it is safe to say that a subtraction is equal to an addition, which is equal to a multiplication. A division is about five times as expensive as an addition and is equal to a modulo operation. Fortran executes division and modulo operations more efficiently, leading to a faster execution.

In general, the optimized version of the operation (% + - *) should be preferred over the naive one (% + / ). This statement breaks for interpreted scripting languages where the operations cost much more due to interpretation steps. Therefore in JavaScript, the best performance can be reached by the straightest solutions - skipping operations which would lead to optimizations in other languages.

Points of Interest

Writing the test program in all those different languages was quite funny. I did not write Fortran in a long time and was astonished that the language itself still feels really old and outdated. However, the performance is still unmatched in some key fields like divisions. I cannot imagine that this is something which could be only implemented in compilers for Fortran, so I wonder why other compilers do still not have similar features yet.

I was also amazed how strong Google's V8 JavaScript engine performed. When Chroma did come out, I used it for a while until I found myself being in love with Opera's independent browser. Now this performance test showed that the developers did not lie about the core feature of V8 - direct compilation into bytecode for the provided JavaScript code. Of course, this is something very cool for webpages, however, it should be noted that arithmetic operation time is usually not the key aspect of JavaScript code since JavaScript is not being used in HPC. Most operations in JavaScript code are DOM manipulations or other interactions with the browser. Therefore this benchmark does also not state that Google Chrome is the best browser or fastest browser for JavaScript. Nevertheless, I assume that Chrome won't perform too bad in DOM manipulations either.

References

I found the following links useful:

  • Article on StackOverflow about using Windows API to retrieve the tickcount.[^]
  • Corey Goldberg's Stopwatch class for Java.[^]
  • Doing relative timing in Fortran.[^]
  • How to measure the absolute time in Fortran.[^]
  • John Resig about the accuracy of JavaScript timing.[^]

History

  • v1.0.0 | Initial release | 23.12.2011
  • v1.1.0 | Update (Included major scripting languages and a JavaScript update) | 17.01.2012

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Florian Rappl
Chief Technology Officer
Germany Germany
Florian is from Regensburg, Germany. He started his programming career with Perl. After programming C/C++ for some years he discovered his favorite programming language C#. He did work at Siemens as a programmer until he decided to study Physics. During his studies he worked as an IT consultant for various companies.
 
Florian is also giving lectures in C#, HTML5 with CSS3 and JavaScript, and other topics. Having graduated from University with a Master's degree in theoretical physics he is currently busy doing his PhD in the field of High Performance Computing.
Follow on   Google+

Comments and Discussions

 
GeneralMy vote of 5 Pinmembercsharpbd14-May-13 22:43 
GeneralRe: My vote of 5 PinmvpFlorian Rappl15-May-13 0:00 
SuggestionFortran timing method [modified] Pinmemberanthony ballow23-Jan-13 4:38 
GeneralRe: Fortran timing method PinmvpFlorian Rappl14-May-13 23:59 
GeneralMy vote of 5 PinmemberPhat (Phillip) H. VU27-Dec-12 19:22 
GeneralRe: My vote of 5 PinmemberFlorian Rappl27-Dec-12 23:44 
GeneralMy vote of 5 Pinmembermanoj kumar choubey27-Feb-12 3:34 
GeneralMy vote of 5 Pinmembermajidkamali13702-Feb-12 11:56 
GeneralMy vote of 5 PinmemberJose David Pujo31-Jan-12 6:16 
GeneralTx Pinmemberraananv27-Jan-12 10:03 
GeneralMy vote of 5 Pinmvpthatraja22-Jan-12 21:53 
GeneralMy vote of 5 PinmemberAkram El Assas22-Jan-12 11:54 
SuggestionBenchmark suggestion PinmemberDávid Kocsis19-Jan-12 2:05 
GeneralRe: Benchmark suggestion PinmemberFlorian Rappl19-Jan-12 11:01 
GeneralMy vote of 5 PinmvpKanasz Robert18-Jan-12 20:58 
GeneralMy vote of 4 Pinmemberjfriedman18-Jan-12 13:43 
GeneralMy vote of 5 PinmemberSteppenwolfe16-Jan-12 16:43 
GeneralRe: My vote of 5 PinmemberFlorian Rappl16-Jan-12 20:52 
QuestionPython PinmemberAddy Tas14-Jan-12 8:35 
AnswerRe: Python PinmemberFlorian Rappl14-Jan-12 12:19 
GeneralRe: Python PinmemberAddy Tas14-Jan-12 21:40 
GeneralRe: Python PinmemberFlorian Rappl18-Jan-12 11:55 
GeneralRe: Python PinmemberAddy Tas27-Jan-12 7:03 
GeneralRe: Python PinmemberFlorian Rappl27-Jan-12 7:44 
GeneralMy vote of 5 PinmemberMihai MOGA13-Jan-12 7:07 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140721.1 | Last Updated 18 Jan 2012
Article Copyright 2011 by Florian Rappl
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid