Click here to Skip to main content
Click here to Skip to main content

Stopwatch

, 20 Aug 2014 CPOL
Rate this:
Please Sign up or sign in to vote.
Benchmark C++ std::vector vs raw arrays, move assignable/constructable & copy assignable/constructable

Harlinn Windows on CodePlex[^]

Introduction

We do like benchmarks, don’t we? Developers often seem to be obsessively interested in performance, and it’s not all that uncommon to find that somebody is willing to spend more time on a particular piece of code, than that code will ever spend running.

At other times we really need to satisfy timing requirements, and then it’s pretty handy to have some sort of stopwatch functionality available.

In this article we're going to use a class called Stopwatch to measure the performance characteristics if two classes, Buffer and Buffer2 - the first one is copy constructable and copy assignable, while the latter is also move constructable and move assignable.

The results demonstrate:

  • That using a C++ std::vector<T> is as efficient as using a raw array of the same type.
  • That implementing the move constructor and move assignment operator, can boost the performance of your code, but only when your code is written to take advantage of these features.
  • That std::shared_ptr<T> assignment is very nearly as efficient as raw pointer assignment. Populating the vectors using std::make_shared<T> is as efficient as populating the vectors with the move constructable/assignable type.

If we replace the compare function implemented in the Stopwatch example with something much simpler:

int compare(const Buffer<T>& other )
{
 return ptr[0] - other.ptr[0];
}

we get some really interesting results:

  • Sorting an array of pointers to Buffer<T>& took 0.088100 ms
  • Sorting an array of std::shared_ptr<> took 0.098500 ms
  • Sorting an array of move assignable/constructable elements took 0.122000 ms
  • Sorting an array of copy assignable/constructable elements took 4513.762700 ms

As you see, implementing the move assignment operator and the move constructor gives us an immense performance improvement.

Later we will use full buffer comparition, which is pretty expensive when two buffers contains the same values, since this is what we would normally expect when we compare two buffers.

External libraries

The code relies on the Boost C++ libraries, which can be downloaded from http://www.boost.org[^], so you need to download and build them, before updating the provided projects with include and library paths matching your installation.

Stopwatch

The Stopwatch class is located in hwindatetime.h, along with the DateTime and TimeSpan classes, with an interface thats intentionally similar to .Nets' System.Diagnostics.Stopwatch. Keeping things similar to proven and well documented designs is something I can heartily recommend.

class Stopwatch 
{ 
    long long elapsedTicks;
    long long startedAt;
    bool isRunning;
    static long long frequency; 
public:
    HWIN_EXPORT static const bool IsHighResolution;
private: 
    static double tickFrequency; 
    static bool InitializeStopwatch(); 
    long long GetElapsedDateTimeTicks() const;
public:
    HWIN_EXPORT static long long GetTimestamp();
    HWIN_EXPORT static long long Frequency();

    HWIN_EXPORT Stopwatch();

    HWIN_EXPORT void Start();
    HWIN_EXPORT Stopwatch StartNew();

    HWIN_EXPORT void Stop();
    HWIN_EXPORT void Reset();
    HWIN_EXPORT void Restart(); 
    HWIN_EXPORT bool IsRunning() const;
 
    HWIN_EXPORT TimeSpan Elapsed() const;
    HWIN_EXPORT long long ElapsedMilliseconds() const;
    HWIN_EXPORT long long ElapsedTicks() const;
};

The DateTime and TimeSpan classes are also starting to get more or less feature complete.

Internally the Stopwatch class uses the Windows API QueryPerformanceFrequency[^] and QueryPerformanceCounter[^] functions to provide highly accurate timing information, and if the installed hardware does not support a high-resolution performance counter the Stopwatch class will fall back on the timing information provided by the DateTime class.

Buffer & Buffer2

The test will populate either vectors, arrays, or arrays of pointers - with objects based on the following two template classes.

template<typename T>
class Buffer
{ 
    size_t length; 
    T* ptr;
public:
    Buffer(); 
    Buffer(const Buffer& other); 
    Buffer(const Buffer& first,const Buffer& second);
    Buffer( size_t theLength, const T& theValue);
    ~Buffer(); 
    int compare(const Buffer& other) const;
    bool operator < ( const Buffer& other );
    bool operator > ( const Buffer& other );
    Buffer& operator = ( const Buffer& other ); 
    inline friend Buffer operator + (const Buffer& first,const Buffer& second);
};

The compare function performs, if required, a full comparition of the arrays of type T pointed to by ptr, which is pretty common behaviour fo buffer comparitions - and we're going to use this function, either directly, or indirectly through the < operator to do the sorting perfomed by the tests.

template<typename T>
class Buffer2
{ 
    size_t length; 
    T* ptr;
public:
    Buffer2(); 
    Buffer2(const Buffer2& other); 
    Buffer2(Buffer2&& other); 
    Buffer2(const Buffer2& first,const Buffer2& second);
    Buffer2( size_t theLength, const T& theValue); 
    ~Buffer2(); 
    int compare(const Buffer2& other) const;
    bool operator < ( const Buffer2& other );
    bool operator > ( const Buffer2& other );
    Buffer2& operator = ( const Buffer2& other ); 
    Buffer2& operator = ( Buffer2&& other ); 
    inline friend Buffer2 operator + (const Buffer2& first,const Buffer2& second);
};

Using the Stopwatch class

Using the Stopwatch class is pretty easy:

void Buffer2TestPushBack(int bufferCount, size_t bufferSize) 
{
    typedef Buffer2 <char> buffer_t;
    Stopwatch stopwatch;
    vector< buffer_t > buffers;	
    buffers.reserve(bufferCount);

You just need to declare an instance, stopwatch, and you're able to get highly accurate timing information.

To start the stopwatch, you just need to call Start(), and then perform the operations you want timing information for:

    printf("Test vector push_back: move constructible/move assignable\n");
    stopwatch.Start();

    for(int i = 0; i < bufferCount; i++) 
    {
        char c = i%24;
        buffers.push_back(buffer_t(size_t(bufferSize),c));
    }

before calling Stop() to stop the stopwatch.

    stopwatch.Stop();

    printf("\tpush_back of %d elements in %f ms\n", bufferCount ,
         stopwatch.Elapsed().TotalMilliseconds() );

Elapsed() returns a TimeSpan representing the interval which has elapsed between your call to Start() and Stop(). We use the TotalMilliseconds() function of the TimeSpan class to retrieve the timing information as a double.

A Stopwatch can be restarted using the Restart() function:

    stopwatch.Restart();
    sort(buffers.begin(),buffers.end());
    stopwatch.Stop();
    printf("\tsorted %d elements in %f ms\n\n", bufferCount ,
         stopwatch.Elapsed().TotalMilliseconds() );

So reusing an existing Stopwatch is easy.

The tests

Since we want to perform sorting is customary to ensure that the data is ordered in the same way, while still causing the sort to actually perform some operations on our vectors and arrays. We ensure this by

for(int i = 0; i < bufferCount; i++) 
{
    char c = i%24;
    buffers.push_back(buffer_t(size_t(bufferSize),c));
}

filling the buffer at the same position in the vectors or arrays, with values based on a modulus operation on i. buffer_t is either a typedef Buffer <char> buffer_t or typedef Buffer2 <char> buffer_t. The size of the buffer will in all cases be 1843200 bytes (640*480*8) and the vectors or arrays will have a length of 500.

The first test uses a vector < Buffer <char> >

Test vector push_back: copy constructible/copy assignable
        push_back of 500 elements in 1222.143000 ms
        sorted 500 elements in 7062.886100 ms

While the next uses two vector < Buffer <char> > instances, and copies elements from one to the other.

Test vector copy: copy constructible/copy assignable
        copy of 500 elements in 609.315100 ms
        sorted 500 elements in 6947.904100 ms

The next test uses a vector < Buffer2 <char> > and since Buffer2 <char> is move assignable/consructable we see a decent improvement in performance since Buffer2 <char> elements created by buffers.push_back(buffer_t(size_t(bufferSize),c)); immediately goes out of scope at the end of the expression, making it an rvalue; which lets the compiler use the move constructor and move assignment operator to forward instance data to the vector.

Test vector push_back: move constructible/move assignable
        push_back of 500 elements in 482.005300 ms
        sorted 500 elements in 2628.228400 ms

As we can see, sorting also sees a fair performance gain.

It's worth noting that move semantics does not provide us with any performance improvement when we use two vector < Buffer2 <char> > instances, and copies elements from one to the other. This is because the source element does not go out of scope when we perform the

for(int i = 0; i < bufferCount; i++) 
{
    destinationBuffers[i] = sourceBuffers[i];
}

assignment from source to destination.

Test vector move constructible/move assignable
        copy of 500 elements in 610.708300 ms
        sorted 500 elements in 2624.473200 ms

For the next test we use a vector < Buffer <char>* >, and the timing for the sort really tells us something valuable about move semantics as the value is nearly identical to the two previous tests. Sorting a vector of a type that implements the move constructor and the move assignment operator, is, in our case, as efficient as sorting a vector of pointers.

Test vector of raw pointers
        push_back of 500 pointers in 509.466600 ms
        sorted 500 pointers in 2618.940200 ms

Assigning from one vector < Buffer <char>* > to another is fast.

Test vector of raw pointers
        copy of 500 pointers in 0.001900 ms
        sorted 500 pointers in 2630.240200 ms

So what about std::shared_ptr?

Test vector: push_back of shared_ptrs'
        push_back of 500 shared_ptrs' in 480.064000 ms
        sorted 500 shared_ptrs' in 2617.170900 ms

Using vector< shared_ptr<Buffer<char> > > is about as fast as using a vector < Buffer2 <char> > - which shows that std::shared_ptr is blazingly fast, particularly when we assign one std::shared_ptr to another:

Test vector: Copy of shared_ptrs'
        copied 500 shared_ptrs' in 0.009600 ms
        sorted 500 shared_ptrs' in 2629.669700 ms

What if we use two arrays of Buffer<char>* ? Turns out we got the same performance with vector < Buffer <char>* >.

Test array of raw pointers
        copy of 500 pointers in 0.001900 ms
        sorted 500 pointers in 2626.685100 ms

And if we use two arrays of Buffer<char> we get results similar to the performance we got with vector < Buffer <char> >.

Test array: copy constructible/copy assignable
        copy of 500 elements in 613.808700 ms
        sorted 500 elements in 7110.636500 ms

Did using a lambda

sort(first,last, [] (const buffer_t& a, const buffer_t& b) { return a.compare(b) < 0;});

impact one the result? Nope ..., not particularly surprising, but well worth knowing.

Test array: Copy constructible/copy assignable (no lambda)
        copy of 500 elements in 623.144300 ms
        sorted 500 elements in 7121.363300 ms
Test array: Copy of move constructible/move assignable (no lambda)
        copy of 500 elements in 611.294200 ms
        sorted 500 elements in 2621.753100 ms

Neither does there seem to be any difference when we use an array of std::shared_ptr.

Test array: shared_ptrs'
        copy of 500 shared_ptrs' in 0.009200 ms
        sorted 500 shared_ptrs' in 2617.208300 ms

Conclusion

There doesn't seem to be any benefit to using raw arrays in place of std::vector, which is something that's good to know, and once you have created a std::shared_ptr, passing it around is very efficient too.

History

  • 16th of October, 2012 - Initial posting
  • 20th of October, 2012 - A few bugfixes combined with extensive changes to the library
  • 20th of October, 2012 - Library updates
  • 20th of August, 2014 - More than a few updates and bug-fixes

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Espen Harlinn
Architect Powel AS
Norway Norway
Chief Architect - Powel AS.
 
Specializing in integrated operations and high performance computing solutions.
 
I’ve been fooling around with computers since the early eighties, I’ve even done work on CP/M and MP/M.
 
Wrote my first “real” program on a BBC micro model B based on a series in a magazine at that time. It was fun and I got hooked on this thing called programming ...
 
A few Highlights:
  • High performance application server development
  • Model Driven Architecture and Code generators
  • Real-Time Distributed Solutions
  • C, C++, C#, Java, TSQL, PL/SQL, Delphi, ActionScript, Perl, Rexx
  • Microsoft SQL Server, Oracle RDBMS, IBM DB2, PostGreSQL
  • AMQP, Apache qpid, RabbitMQ, Microsoft Message Queuing, IBM WebSphereMQ, Oracle TuxidoMQ
  • Oracle WebLogic, IBM WebSphere
  • Corba, COM, DCE, WCF
  • AspenTech InfoPlus.21(IP21), OsiSoft PI
 
More information about what I do for a living can be found at: harlinn.com or LinkedIn
 
You can contact me at espen.harlinn@powel.no

Comments and Discussions

 
GeneralMy vote of 3 PinmemberPrasyee18-Oct-12 3:59 
GeneralRe: My vote of 3 PinmvpEspen Harlinn29-Dec-12 23:17 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.141030.1 | Last Updated 20 Aug 2014
Article Copyright 2012 by Espen Harlinn
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid