
Introduction

This article is an attempt to compare the general performance of STL/CLR's sequence containers with the .NET generic List<T> collection class. Before I began work on the article, I strongly believed that the STL/CLR containers would be much faster. To my utmost surprise, I found that this was not so: List<T> outperformed the STL/CLR collections with ease.

How I compared performance

I wanted to keep things simple, so I used the common technique of repeating a specific operation many times and timing the total. To keep the design uniform, I use the following interface:

using namespace System; // Int64

namespace STLCLRTests 
{
    public interface class IMeasurable 
    {
        Int64 RunCode(int iterations);
    };
}

RunCode runs a specific piece of code the number of times specified by iterations and returns the time taken in milliseconds. The following abstract class implements this interface.

using namespace System;              // Int64
using namespace System::Diagnostics; // Stopwatch

namespace STLCLRTests 
{
    public ref class MeasurableDoubleOp abstract : IMeasurable
    {
    private:
        static Stopwatch^ stopWatch = gcnew Stopwatch();

    public:
        virtual Int64 RunCode(int iterations)
        {
            Initialize();

            stopWatch->Reset();
            stopWatch->Start();

            RunCodeFirstOp(iterations);
            RunCodeSecondOp(iterations);

            stopWatch->Stop();

            return stopWatch->ElapsedMilliseconds;
        }

    protected:
        virtual void Initialize() {}
        virtual void RunCodeFirstOp(int iterations) abstract;
        virtual void RunCodeSecondOp(int iterations) abstract;
    };
}

To profile a certain collection class, I just derive from this abstract class and implement RunCodeFirstOp and RunCodeSecondOp. I also have a MeasurableSingleOp class for doing tests that do not involve a two-step operation.
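For readers who want to try the same two-phase measurement outside C++/CLI, here is a minimal sketch of the harness in standard C++. It assumes std::chrono::steady_clock as a stand-in for System::Diagnostics::Stopwatch, and StdVectorInsertRemove is a hypothetical derived test modeled on the vector test below; it is not the code that produced the numbers in this article.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Standard C++ analog of MeasurableDoubleOp (assumption: steady_clock replaces Stopwatch).
class MeasurableDoubleOp {
public:
    virtual ~MeasurableDoubleOp() = default;

    // Runs both phases once and returns the elapsed wall-clock time in milliseconds.
    std::int64_t RunCode(int iterations) {
        assert(iterations >= 0);
        Initialize();
        const auto start = std::chrono::steady_clock::now();
        RunCodeFirstOp(iterations);
        RunCodeSecondOp(iterations);
        const auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

protected:
    virtual void Initialize() {}
    virtual void RunCodeFirstOp(int iterations) = 0;
    virtual void RunCodeSecondOp(int iterations) = 0;
};

// Hypothetical derived test mirroring VectorInsertRemove, using std::vector.
class StdVectorInsertRemove : public MeasurableDoubleOp {
public:
    std::vector<int> data;

protected:
    void RunCodeFirstOp(int iterations) override {
        for (int i = 0; i < iterations; ++i) data.push_back(10);
    }
    void RunCodeSecondOp(int iterations) override {
        for (int i = 0; i < iterations; ++i) data.pop_back();
    }
};
```

Calling RunCode on a StdVectorInsertRemove object returns the milliseconds for one push/pop cycle and leaves data empty; absolute timings will vary by machine and build settings.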

STL vector vs List<T> - basic insertion/removal

Here are the implementations of the vector-specific and List<T>-specific classes.

#include <cliext/vector>

using namespace System::Collections::Generic; // List<T>, IEnumerable<T>

namespace STLCLRTests 
{
    public ref class VectorInsertRemove : MeasurableDoubleOp
    {
    private:
        cliext::vector<int> vector;

    protected:
        IEnumerable<int>^ GetVector()
        {
            return %vector;
        }

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                vector.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for(int count=0; count<iterations; count++)
            {
                vector.pop_back();
            }
        }
    };
}
namespace STLCLRTests 
{
    public ref class GenericListInsertRemove : MeasurableDoubleOp
    {
    private:
        List<int> list;

    protected:
        IEnumerable<int>^ GetList()
        {
            return %list;
        }

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                list.Add(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for(int count=0; count<iterations; count++)
            {
                list.RemoveAt(list.Count - 1);
            }
        }
    };
}
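Both insertion tests above grow their container one element at a time, so part of the measured time goes into reallocating the underlying buffer. The following standard C++ sketch (using std::vector, not cliext::vector) counts those reallocations by watching capacity(), with and without a reserve() call up front; the function name PushPopReallocations is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n values, then pops them all, and returns how many
// times push_back had to move the elements into a bigger buffer.
std::size_t PushPopReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst) v.reserve(n);            // preallocate room for all n elements
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(10);
        if (v.capacity() != capacity) {        // capacity changed: a new buffer was allocated
            ++reallocations;
            capacity = v.capacity();
        }
    }
    for (std::size_t i = 0; i < n; ++i) v.pop_back();  // pop_back never shrinks capacity
    assert(v.empty());
    return reallocations;
}
```

Because std::vector grows geometrically, the count without reserve() grows roughly logarithmically with n, and reserve(n) brings it to zero. List<T> also grows by doubling its capacity, and its constructor accepts an initial capacity if you want to rule out reallocation on the BCL side as well.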

And here are my test results. As you can see, the BCL class (List<T>) completely outperformed the STL/CLR vector class.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       15              3
500,000       63              32
1,000,000     122             21
10,000,000    1311            299

Here's a graphical plot of how the two containers performed. Clearly, the BCL class's performance was quite superior to the STL vector's.

[Graph: STL vector vs List<T> - basic insertion/removal]

As you can imagine, I was quite surprised by this result. Just for the heck of it, I also compared the standard STL vector with the STL/CLR vector implementation. Note that I am still using managed code: the standard STL code is also compiled with /clr. Here are my (again surprising) results.

Iterations    STL/CLR (ms)    Standard STL (ms)
100,000       11              39
500,000       58              202
1,000,000     117             391
10,000,000    1161            3919

[Graph: STL/CLR vector vs standard vector - basic insertion/removal]

Based on that result, you should avoid compiling native STL code with /clr if performance matters: merely porting it to STL/CLR could improve performance considerably. You might find that all you need is a namespace change (std to cliext), without having to change much code elsewhere. I did not base this conclusion on the vector results alone; I also compared the standard list and the STL/CLR list containers, with the following results.

Iterations    STL/CLR list (ms)    Standard list (ms)
100,000       33                   101
500,000       63                   175
1,000,000     274                  349
10,000,000    2969                 3663

[Graph: STL/CLR list vs standard list - basic insertion/removal]

As you can see, the difference in performance is non-trivial. Note that we are not comparing the native performance of STL here; we are comparing the standard STL implementation, compiled under /clr, with the CLR implementation of STL (STL/CLR).
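If a port really is mostly a namespace change, one low-friction way to try both libraries is to route container names through a namespace alias, so switching is a one-line edit. The sketch below assumes the code only uses operations that std::vector and cliext::vector share (push_back, pop_back, size, empty); the alias name ctr and the function FillAndDrain are hypothetical. It is shown with std so it compiles as ordinary C++; the cliext variant requires /clr and <cliext/vector>.

```cpp
#include <cassert>
#include <vector>

// Hypothetical alias: every container name below goes through ctr, so trying
// STL/CLR means changing this one line (plus the #include) under /clr.
namespace ctr = std;
// namespace ctr = cliext;   // with #include <cliext/vector>, /clr only

// Written only against operations both std::vector and cliext::vector provide.
int FillAndDrain(int iterations) {
    ctr::vector<int> v;
    for (int i = 0; i < iterations; ++i) v.push_back(10);
    const int peak = static_cast<int>(v.size());
    while (!v.empty()) v.pop_back();
    assert(v.empty());
    return peak;
}
```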

STL list vs List<T> - basic insertion/removal

Here's my implementation for the STL list specific class.

#include <cliext/list>

namespace STLCLRTests 
{
    public ref class StlListInsertRemove : MeasurableDoubleOp
    {
    private:
        cliext::list<int> list;

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                list.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for(int count=0; count<iterations; count++)
            {
                list.pop_back();
            }
        }
    };
}

And here are my test results. The contrast is even greater here, which is not really surprising: a linked list allocates a separate node for every element, so it is generally slower than a vector for straight appends and removals at the end.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       32              2
500,000       149             11
1,000,000     332             23
10,000,000    3719            331

And here's a graphical plot of the results.

[Graph: STL list vs List<T> - basic insertion/removal]
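One common reason a linked list loses to a vector for plain appends is that it allocates a separate node for every element, while a vector reallocates only occasionally. The sketch below checks this in standard C++ by counting allocations through a hypothetical CountingAllocator; it says nothing about cliext::list's internals, which may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Global counter of allocate() calls made through CountingAllocator.
static std::size_t g_allocations = 0;

// Minimal C++11-style allocator that forwards to operator new and counts calls.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedList = std::list<int, CountingAllocator<int>>;
using CountedVector = std::vector<int, CountingAllocator<int>>;

// Counts allocations performed while appending n elements to a fresh container C.
template <typename C>
std::size_t AllocationsForAppend(int n) {
    assert(n >= 0);
    g_allocations = 0;
    {
        C c;
        for (int i = 0; i < n; ++i) c.push_back(10);
    }  // c is destroyed here; deallocations are not counted
    return g_allocations;
}
```

On a typical standard library the list count is about one allocation per element, while the vector count stays in the tens; the exact figures depend on the library's growth policy and debug settings.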

STL deque vs List<T> - basic insertion/removal

Here's the deque implementation.

#include <cliext/deque>

namespace STLCLRTests 
{
    public ref class DequeInsertRemove : MeasurableDoubleOp
    {
    private:
        cliext::deque<int> deque;

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                deque.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for(int count=0; count<iterations; count++)
            {
                deque.pop_back();
            }
        }
    };
}

Here are my results. Nothing's changed in the pattern - the BCL class is way faster here too.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       33              2
500,000       66              13
1,000,000     83              26
10,000,000    1061            251

And here's the graph.

[Graph: STL deque vs List<T> - basic insertion/removal]

The closest BCL counterpart to a deque is arguably the Queue<T> class (although Queue<T> is a FIFO queue rather than a double-ended one), so just to be sure we are comparing apples with apples, I also ran tests comparing the STL/CLR deque with the BCL Queue<T>. My results and the corresponding graph follow.

Iterations    STL/CLR deque (ms)    BCL Queue<T> (ms)
100,000       12                    6
500,000       49                    15
1,000,000     89                    28
10,000,000    1044                  335

[Graph: STL deque vs Queue<T> - basic insertion/removal]

The Queue<T> class seems to be marginally slower than List<T> but is still way faster than the STL/CLR deque container.
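Queue<T> exposes Enqueue, which adds at the back, and Dequeue, which removes from the front, so it is a FIFO structure, whereas the deque test above pushes and pops at the back (LIFO). The standard C++ sketch below, with a hypothetical DrainOrder helper, shows the two access patterns on a std::deque; it only makes the difference in access pattern concrete and does not time anything.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Fills a deque with 0..n-1, then drains it either from the back (LIFO, as the
// deque test does) or from the front (FIFO, as Queue<T>'s Dequeue does),
// recording the order in which elements come out.
std::vector<int> DrainOrder(int n, bool fromFront) {
    assert(n >= 0);
    std::deque<int> d;
    for (int i = 0; i < n; ++i) d.push_back(i);
    std::vector<int> order;
    while (!d.empty()) {
        if (fromFront) { order.push_back(d.front()); d.pop_front(); }
        else           { order.push_back(d.back());  d.pop_back();  }
    }
    return order;
}
```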

STL vector vs List<T> - basic iteration

This time, I wanted to test the speed with which we can iterate over a linear collection. Here are the vector and List<T> specific iteration test implementations.

namespace STLCLRTests 
{
    public ref class VectorIterate : MeasurableDoubleOp
    {
    private:
        cliext::vector<int> vector;

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            vector.clear();

            for(int count=0; count<iterations; count++)
            {
                vector.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for(cliext::vector<int>::iterator it = vector.begin(); it != vector.end(); it++)
            {
            }
        }
    };
}
namespace STLCLRTests 
{
    public ref class GenericListIterate : MeasurableDoubleOp
    {
    private:
        List<int> list;

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            list.Clear();

            for(int count=0; count<iterations; count++)
            {
                list.Add(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            for each(int x in list)
            {
            }
        }
    };
}

Here are my test results, which further confirm the efficiency of the List<T> class.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       24              2
500,000       93              16
1,000,000     194             31
10,000,000    2009            394

And here's the corresponding graph.

[Graph: STL vector vs List<T> - basic iteration]
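Both iteration tests above use empty loop bodies. A native optimizer is free to remove a loop that has no observable effect, so a safer pattern when benchmarking traversal is to accumulate into a result and use it. The sketch below does that in standard C++ for iterator and index traversal; the function names are hypothetical, and it uses pre-increment (++it), which avoids creating a temporary copy for class-type iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Iterator traversal (as in VectorIterate), but with an observable result.
std::int64_t SumByIterator(const std::vector<int>& v) {
    std::int64_t sum = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        sum += *it;
    return sum;
}

// The same traversal using indices, for comparison.
std::int64_t SumByIndex(const std::vector<int>& v) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}

// Runs both traversals, checks that they agree, and returns the common sum.
std::int64_t CheckTraversalsAgree(const std::vector<int>& v) {
    const std::int64_t a = SumByIterator(v);
    assert(a == SumByIndex(v));
    return a;
}
```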

STL vector vs List<T> - Linq access (where)

For the LINQ tests, I used a C# project (for easier syntax). I derived from the insertion test classes and overrode only the RunCodeSecondOp method, since I wanted to keep the insertion code intact.

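using System.Collections.Generic;
using System.Linq;
using STLCLRTests;

namespace LinqTests
{
    public class VectorLinqWhere : VectorInsertRemove
    {
        public override void RunCodeSecondOp(int iterations)
        {
            IEnumerable<int> _vector = GetVector();
            // Where is deferred: this builds the query but filters nothing
            // until the result is enumerated.
            var newVector = _vector.Where(x => x % 2 == 0);
        }
    }
}
namespace LinqTests
{
    public class GenericListLinqWhere : GenericListInsertRemove
    {
        public override void RunCodeSecondOp(int iterations)
        {
            IEnumerable<int> _list = GetList();
            // Deferred, as above: the query is built but not enumerated.
            var newList = _list.Where(x => x % 2 == 0);
        }
    }
}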

Here are the results of my test runs. They are partly contaminated by the insertion phase, whose speed differences also come into play, but the difference in performance is large enough to ignore that for now, and again LINQ runs much faster on the BCL class than on the STL/CLR version. One caveat: Where uses deferred execution and the query is never enumerated here, so these timings mostly reflect the insertion phase and query construction rather than the filtering itself.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       18              1
500,000       44              7
1,000,000     79              11
10,000,000    842             168

And here's the graph.

[Graph: LINQ Where test]
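Deferred evaluation is easiest to see in isolation with a hand-rolled lazy filter. The sketch below uses a hypothetical LazyWhere class in standard C++ that stores the source and predicate and only runs the predicate when Count() enumerates the data. It is not how System.Linq is implemented; it only illustrates that constructing a Where-style query performs no filtering by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical lazy "where": construction just stores its inputs.
class LazyWhere {
public:
    // Keeps a reference to source; the caller must keep the vector alive.
    LazyWhere(const std::vector<int>& source, std::function<bool(int)> predicate)
        : source_(source), predicate_(std::move(predicate)) {
        assert(predicate_ && "predicate must be callable");
    }

    // Enumeration step: only here is the predicate actually evaluated.
    std::size_t Count() const {
        std::size_t matches = 0;
        for (int x : source_)
            if (predicate_(x)) ++matches;
        return matches;
    }

private:
    const std::vector<int>& source_;
    std::function<bool(int)> predicate_;
};
```

Counting predicate calls before and after Count() makes the point: the count stays at zero until the query is enumerated.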

STL vector vs List<T> - Linq access (take)

This is similar to the previous one except I use Take instead of Where.

using System.Collections.Generic;
using System.Linq;
using STLCLRTests;

namespace LinqTests
{
    public class VectorLinqTake : VectorInsertRemove
    {
        public override void RunCodeSecondOp(int iterations)
        {
            IEnumerable<int> _vector = GetVector();
            // Count() runs immediately; Take() only builds a deferred query.
            var newVector = _vector.Take(_vector.Count() / 2);
        }
    }
}
namespace LinqTests
{
    public class GenericListLinqTake : GenericListInsertRemove
    {
        public override void RunCodeSecondOp(int iterations)
        {
            IEnumerable<int> _list = GetList();
            // Same as above: only Count() is evaluated here.
            var newList = _list.Take(_list.Count() / 2);
        }
    }
}

Here are the results of my tests. They are very similar to those of the previous test.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       7               0
500,000       35              4
1,000,000     70              10
10,000,000    865             205

And the corresponding graph.

[Graph: LINQ Take test]

Sorting

I ran tests comparing sorting speeds of the List<T> class with the STL/CLR vector and list containers. The code used follows.

namespace STLCLRTests 
{
    public ref class GenericListSort : MeasurableDoubleOp
    {
    private:
        List<int> list;

    protected:
        IEnumerable<int>^ GetList()
        {
            return %list;
        }

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                list.Add(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            list.Sort();
        }
    };
}

namespace STLCLRTests 
{
    public ref class StlListSort : MeasurableDoubleOp
    {
    private:
        cliext::list<int> list;

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                list.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            list.sort();
        }
    };
}
namespace STLCLRTests 
{
    public ref class VectorSort : MeasurableDoubleOp
    {
    private:
        cliext::vector<int> vector;

    protected:
        IEnumerable<int>^ GetVector()
        {
            return %vector;
        }

    public:
        virtual void RunCodeFirstOp(int iterations) override
        {
            for(int count=0; count<iterations; count++)
            {
                vector.push_back(10);
            }
        }

        virtual void RunCodeSecondOp(int iterations) override 
        {
            cliext::sort(vector.begin(), vector.end()); // STL/CLR sort from <cliext/algorithm>
        }
    };
}

Here are the results for vector versus List<T>.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       37              7
500,000       136             53
1,000,000     325             137
10,000,000    2695            1088

[Graph: vector sort vs List<T> sort]

And here are my results for the STL list versus List<T>.

Iterations    STL/CLR (ms)    BCL (ms)
100,000       138             7
500,000       1162            51
1,000,000     5355            128
10,000,000    31985           1095

[Graph: STL list sort vs BCL List<T> sort]
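The sort tests above fill every container with the constant 10, so each sort sees input where all elements are equal, which may not be representative of real data. To rerun them on more typical input, a sketch like the following (standard C++, hypothetical helpers MakeInput and SortsAgree) generates reproducible pseudo-random data with a fixed seed, then sorts the same values in a std::vector with std::sort and in a std::list with its member sort(), since std::sort requires random-access iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

// Produces n pseudo-random ints from a fixed seed, so every container under
// test can be given exactly the same input.
std::vector<int> MakeInput(std::size_t n, unsigned seed = 12345) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> out(n);
    for (int& x : out) x = dist(gen);
    return out;
}

// Sorts identical data in a vector (std::sort) and a list (list::sort)
// and reports whether both end up in the same order.
bool SortsAgree(std::size_t n) {
    std::vector<int> v = MakeInput(n);
    std::list<int> l(v.begin(), v.end());
    std::sort(v.begin(), v.end());
    l.sort();
    assert(std::is_sorted(v.begin(), v.end()));
    return std::equal(v.begin(), v.end(), l.begin());
}
```

Timing the two sort calls separately on MakeInput output gives a comparison that does not hinge on the all-equal special case.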

Conclusion

One of the features that was strongly marketed before STL/CLR was released was its performance benefit over regular .NET collections. In my tests, however, the .NET generic List<T> was much faster. At this stage, the only valid use case I can think of for STL/CLR is a first-level port of existing C++ code (that heavily uses STL) to managed code.

I don't buy this
are_all_nicks_taken_or_what
1:51 19 Jan '10  
Internally, the CLR containers definitely have preallocated memory. In my opinion, the comparison is pointless unless you preallocate memory for the vector too:

std::vector<int> v;
v.reserve(N);

Also, make sure to define _SECURE_SCL=0.
My results
valery trofimov
2:22 24 Feb '09  
I got quite different results for the following code:
int i, j;
cliext::vector<int> z(20);
std::vector<int> zz(20);
for(i = 0;i < 10000000;++i) {
    for(j = 0;j < 20;++j)
        z[j] = j;
}
for(i = 0;i < 10000000;++i) {
    for(j = 0;j < 20;++j)
        zz[j] = j;
}
For cliext::vector: 1156 milliseconds; for std::vector: 407 milliseconds.
With Visual Studio 6 the same loop runs in 203 milliseconds, exactly the same as with C pointers. The difference is due to the safety checks in VS 2008.

Valery Trofimov.

Source code
clintonvanry
10:24 28 Sep '08  
Very good article.
I am interested in how you are using STL containers in managed code.
I receive the dreaded error C4368.
Your results look wrong
rm822
17:57 9 Aug '08  
My results (ms, 10,000,000 iterations):
stl      172
cli      422
generic  219

The test code is below:

#include "stdafx.h"
#include <cstdio>
#include <vector>
#include <list>
#include <windows.h>
#include <cliext/vector>

// Native class template: works with any container exposing push_back/pop_back.
template<typename CONTAINER>
class VectorInsertRemove
{
public:
    CONTAINER v;
    virtual void RunCodeFirstOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.push_back(10);
    }
    virtual void RunCodeSecondOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.pop_back();
    }
};

// Same test as a ref class, so it can hold a cliext (managed) container.
template<typename CONTAINER>
ref class VectorInsertRemove_Man
{
public:
    CONTAINER v;
    virtual void RunCodeFirstOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.push_back(10);
    }
    virtual void RunCodeSecondOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.pop_back();
    }
};

ref class GenericListInsertRemove
{
public:
    System::Collections::Generic::List<int> v;

    virtual void RunCodeFirstOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.Add(10);
    }
    virtual void RunCodeSecondOp(int iterations)
    {
        for(int count=0; count<iterations; count++)
            v.RemoveAt(v.Count - 1);
    }
};

int _tmain(int argc, _TCHAR* argv[])
{
    const int itercount = 10000000;
    {
        VectorInsertRemove< std::vector<int> > vir;
        DWORD start = GetTickCount();
        vir.RunCodeFirstOp(itercount);
        vir.RunCodeSecondOp(itercount);
        DWORD end = GetTickCount();
        // elapsed milliseconds, then the final size (expected 0)
        printf("stl %lu %d\n", end - start, (int)vir.v.size());
    }

    {
        VectorInsertRemove_Man< cliext::vector<int> > vir;
        DWORD start = GetTickCount();
        vir.RunCodeFirstOp(itercount);
        vir.RunCodeSecondOp(itercount);
        DWORD end = GetTickCount();
        printf("cli %lu %d\n", end - start, (int)vir.v.size());
    }

    {
        GenericListInsertRemove vir;
        DWORD start = GetTickCount();
        vir.RunCodeFirstOp(itercount);
        vir.RunCodeSecondOp(itercount);
        DWORD end = GetTickCount();
        printf("generic %lu %d\n", end - start, vir.v.Count);
    }

    return 0;
}

std::list and gcroot
gooja
0:50 28 Jun '08  
Hi,

You seem like the right guy to ask this:
Am I doing something wrong in VS2005, or why can't I use std::list with gcroot anymore?

Thing^ thing = gcnew Thing();

std::vector<gcroot<Thing^> > * vector = new std::vector<gcroot<Thing^> >();
vector->push_back(thing);
// compiles fine

std::list<gcroot<Thing^> > * list = new std::list<gcroot<Thing^> >();
list->push_back(thing);
// error C2248: 'gcroot::operator &' :
// cannot access private member declared in class 'gcroot'

Unless I'm mistaken, this worked in VS2003 (with some more __gc and *, but anyway).
Did the std::list implementation change?
Thanks,

Jan-Willem

Checked?
Nemanja Trifunovic
17:01 9 Mar '08  
If the STL/CLR implementation is checked, that alone would explain the poor performance. Heck, I would expect even the native version to be slower than the BCL unless the "checking" is off.


Re: Checked?
Member 1335734
19:03 9 Mar '08  
With "#define _SECURE_SCL 0", I get the same results.
Re: Checked?
Randor
12:09 25 Apr '08  
Try adding this and retest.


#ifndef _DEBUG
#ifdef _SECURE_SCL
#undef _SECURE_SCL
#endif
#ifdef _HAS_ITERATOR_DEBUGGING
#undef _HAS_ITERATOR_DEBUGGING
#endif
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0
#endif


Best Wishes,
-David Delaune
Why?
Shog9
10:20 8 Mar '08  
First off, nice graphs.

This is rather disturbing, though. It's not as though they're a little slower. The difference is *shocking*. I mean, there's no way I'd choose the STL/CLR collections for new code, and I'd be pretty reluctant to port native C++ to managed code using them either. It's bad enough that I'd already be taking a hit due to the runtime overhead; this would just be unacceptable.

I expected that at least vector<> would be comparable. I mean, how hard is it to write a reasonably fast dynamic array? But no, it's still much, much worse. Why is there a difference? I coded up a quick implementation of a dynamic array, just to demonstrate to myself that List<> wasn't doing anything tricky behind the scenes... sure enough, I got roughly the same times as List<>. So... wtf? Does the C++/CLI compiler not do inlining? Is it really building each template class and method into real, heavy managed code classes and methods?



Re: Why?
Nishant Sivakumar
11:47 8 Mar '08  
I re-ran the tests on my home laptop and desktop and while the speed difference is still huge, it's not as dramatic as when I ran it from my office machine. I am guessing there are a lot of other factors that come into play too. But the underlying fact remains that the BCL collections are way faster than the STL/CLR ones.

The fact that for one particular iteration count the BCL version was 10 times faster does not really mean that the BCL classes are in general 10x faster than STL/CLR. But the fact that every test I did showed the BCL classes to be faster (by varying degrees) can be taken collectively to mean that in general it's much safer and more performant to use the BCL classes.

In fact right now I cannot see any real world usage for STL/CLR other than academic interest. To be fair to Microsoft, I believe STL/CLR was never really completed. It was always behind schedule and I think finally they just decided to release it at the stage it was at, and I don't really expect them to spend more time on it.


Re: Why?
Rama Krishna Vavilala
14:35 8 Mar '08  
Nishant Sivakumar wrote: "that for one particular iteration, the BCL version was 10 times"

The BCL assemblies are ngened (precompiled to native images). The same might not be the case with STL/CLR.

Try ngen'ing them and see whether it makes any difference.

If you ngen and then retest, all your graphs may change.


Re: Why?
Member 1335734
19:05 9 Mar '08  
After JIT compiling, same results.
Re: Why?
Wong Shao Voon
21:10 9 Apr '08  
It would be really nice if you could update your article to include a native STL benchmark without the /clr compilation switch. I'd like to know how native STL fares against STL/CLR and the BCL collections, because I suspect native STL without /clr might be slower than the BCL collections.

OT: I am reading your C++/CLI in Action for the second time. Are you writing any new books at the moment? It's a pleasure reading your C++/CLI book!
Debug or Release?
Argiris Kirtzidis
6:18 8 Mar '08  
Are you compiling on Debug or Release ?

If you're on Debug, the results are meaningless; the STL has debug checks slowing it down.
Re: Debug or Release?
Nishant Sivakumar
6:36 8 Mar '08  
These are all done on Release builds.


The missing stat...
axelriet
4:14 8 Mar '08  
You should add a native code (VS2008) STL stat for reference and comparison.
Re: The missing stat...
Nishant Sivakumar
4:17 8 Mar '08  
axelriet wrote: "You should add a native code (VS2008) STL stat for reference and comparison."


I believe that wouldn't be a fair comparison.
Also I only wanted to test things from a managed context. The idea behind the article is to see whether using STL/CLR in .NET apps has any performance advantages over using the BCL collections.


Re: The missing stat...
Jim Crafton
13:11 8 Mar '08  
Well I don't know about "fair" but I for one would find it interesting. And like Shog said, the performance difference between the two managed solutions seems *really* weird. What happens if you compare an STL list with a List?


Re: The missing stat...
Nishant Sivakumar
13:55 8 Mar '08  
Jim Crafton wrote: "What happens if you compare an STL list with a List?"


If compiled with /clr, the STL list shows poorer performance. This may be due more to the C++/CLI compiler generating IL from non-CLI code than to an issue in the STL implementation as such. I didn't look closely at the implementation details; I was more focused on running the tests.

Ironically, my initial idea was to write an article showing off STL/CLR's superior performance. I'd have looked such an ass had I announced that prior to submitting the article.



Last Updated 8 Mar 2008 | Copyright © CodeProject, 1999-2010