I've profiled my application to see why my 3D-vector implementation is almost 3 times slower than the corresponding C function calls: the results proved that every single function call costs more time than the actual arithmetic being performed! I've already cut the number of function calls down to 3, but that doesn't help a lot.

It seems that, for some reason, the arithmetic operator calls take even more time than other function calls, and, looking at the disassembly, I've found out why: they are the only functions that haven't been inlined in spite of full optimization! Each call takes ~10 commands of preparation, just for storing the two operands. Compared to that, calling the corresponding C function only takes 2 commands to store each double pointer argument.

Here's a simplified segment of my code (add include guards as needed):
// header vector3d.h
class VectorExpression3d;
class Vector3d {
public: // will see about visibility later...
   double x, y, z;
   Vector3d() : x(0.0), y(0.0), z(0.0) {} // default constructor, needed to declare plain Vector3d variables
   Vector3d(const VectorExpression3d& ve);
   Vector3d& operator=(const VectorExpression3d& ve);
};
#include "vectorexpression3d.h"
// implementation ...

// header vectorexpression3d.h
#include "vector3d.h"
class VectorExpression3d {
public: // members must be accessible to Vector3d and operator+
   double x, y, z, scale;
   VectorExpression3d(const Vector3d& v1, const Vector3d& v2)
     : x(v1.x+v2.x), y(v1.y+v2.y), z(v1.z+v2.z), scale(1.0) {}
};

// main cpp file
#include "vector3d.h"
inline VectorExpression3d operator+(const Vector3d& v1, const Vector3d& v2) {
   return VectorExpression3d(v1, v2);
}

int main() {
  // ...
  Vector3d v1, v2, v3;
  v3 = v1+v2; // invokes non-inlined call to operator+ above, 
              // then inlined(!) VectorExpression3d constructor
              // then inlined(!) Vector3d constructor
              // then inlined VectorExpression3d destructor
  // ...
}

I'm using VS 2010, and it seems the compiler ignores the inline specifier on all of the operators. I know that I cannot force inlining - but it should be possible, and since the operators are trivial, it should even be easy! So what is the problem? Why doesn't VS 2010 inline my operators? Is it not possible after all?

According to my profiling results, the call to operator+ by itself uses up more than half of the total time of the addition statement, including the assignment and the construction/destruction of a temporary!

Maybe this is important, but I forgot to mention that the actual classes are, in fact, templates (the only template argument so far is the base type (double), so not a biggie).
Updated 21-Mar-13 2:49am
Philippe Mori 21-Mar-13 9:21am    
By the way, I assume that you are compiling the release version (full optimization).

If the C++ version does more than push 2 pointers on the stack, maybe there is something "wrong" with the code that prevents the compiler from optimizing it or makes it do some extra operations. Do you have default constructors? Do you have destructors? Do you have virtual functions?
Stefan_Lang 21-Mar-13 10:07am    
Not quite: I changed the default options of the debug version to produce code optimized for speed, but still retain debugging information, so I could look at and step through the actual code and see the assembly generated from each statement.
nv3 21-Mar-13 9:27am    
What I would try just out of curiosity is to replace the operator+ by a "normal" function name and see if it's just a phenomenon that occurs with operator overloading, although I doubt that.

BTW: Are we talking C++/CLI or unmanaged code? You are speaking of so-and-so many commands being executed. That made me suspicious. Or are you referring to instructions and just picked the wrong word?

Good question in any case and good analysis work you are doing. That deserved my 5.
Stefan_Lang 21-Mar-13 10:34am    
This is unmanaged code, and what I meant was assembler instructions. I was never good at vocabulary, sorry for the confusion ;-)

Try moving your operator+ implementation to the header file. VS doesn't like inlining things from .cpp files (this might cause linkage issues down the line, so be careful). You could also try __forceinline instead of inline, but it may not make any difference; that's still up to the compiler.

If none of this works, could you use a specialist constructor on VectorExpression3d which takes two instances and produces a third? It may be awkward and a bit less math-like, but it may solve the inlining issue. Inlining constructors is pretty much a basic optimization strategy.
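As a rough sketch of the first suggestion, the operator could live in a header next to the classes it works on. The macro name VEC_FORCE_INLINE and the include guard name are made up for this illustration, the classes are reduced to plain structs, and nothing here guarantees that a particular compiler will actually inline the call:

```cpp
// vector_ops_sketch.h (hypothetical header name)
#ifndef VECTOR_OPS_SKETCH_H
#define VECTOR_OPS_SKETCH_H

// VEC_FORCE_INLINE is an illustrative helper macro, not part of the original code.
// __forceinline is MSVC-specific; other compilers fall back to plain inline here.
#if defined(_MSC_VER)
#  define VEC_FORCE_INLINE __forceinline
#else
#  define VEC_FORCE_INLINE inline
#endif

struct Vector3d {
    double x, y, z;
};

struct VectorExpression3d {
    double x, y, z, scale;
    VectorExpression3d(const Vector3d& v1, const Vector3d& v2)
        : x(v1.x + v2.x), y(v1.y + v2.y), z(v1.z + v2.z), scale(1.0) {}
};

// Defined in the header, so every translation unit that uses operator+
// sees the full body and the optimizer at least has the chance to inline it.
VEC_FORCE_INLINE VectorExpression3d operator+(const Vector3d& v1, const Vector3d& v2) {
    return VectorExpression3d(v1, v2);
}

#endif // VECTOR_OPS_SKETCH_H
```

Whether the call really disappears still has to be checked in the disassembly of an optimized build.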
Stefan_Lang 21-Mar-13 8:47am    
A good idea, but unfortunately it didn't work. Maybe it's got to do with the classes being templates - I'll go try without templates then...

I didn't get your second suggestion: I don't see how it could affect the assembly of operator+. Could you elaborate?
Matthew Faithfull 21-Mar-13 10:48am    
You may be right about templates, tricky beasts.
As to the second suggestion, it's similar to Eugen's solution but no inheritance is required: just use an overloaded constructor that takes 2 existing instances by reference, and possibly a tag type to indicate what to do with them, and produces a third instance, the same as the + operator would. Then don't use the + operator at all. I realise this breaks the natural way of doing things, but it depends how much you want that performance.
Stefan_Lang 21-Mar-13 11:06am    
Ah, so instead of v1+v2 I'd write something like VE(v1, v2, tADD). I suppose that'd work, but it defeats the original idea of writing vector equations in a natural way...
Matthew Faithfull 21-Mar-13 11:11am    
Yes, I'm as confident as I can be that even VS2010 will inline that if the constructor is all in the header file and marked as inline (I might mark it explicit as well). It certainly isn't natural, not to a mathematician anyway, but it's an imperfect world sometimes, even in C++.
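A minimal sketch of what such a tag-dispatched constructor might look like. The tag types AddTag and SubTag are invented for this illustration, and the classes are reduced to plain structs:

```cpp
struct Vector3d {
    double x, y, z;
};

// Hypothetical empty tag types; they only select which constructor overload runs.
struct AddTag {};
struct SubTag {};

struct VectorExpression3d {
    double x, y, z, scale;

    // Built from the sum of two vectors; the tag argument carries no data.
    VectorExpression3d(const Vector3d& a, const Vector3d& b, AddTag)
        : x(a.x + b.x), y(a.y + b.y), z(a.z + b.z), scale(1.0) {}

    // Built from the difference of two vectors.
    VectorExpression3d(const Vector3d& a, const Vector3d& b, SubTag)
        : x(a.x - b.x), y(a.y - b.y), z(a.z - b.z), scale(1.0) {}
};
```

A call site would then read VectorExpression3d e(v1, v2, AddTag()); instead of v1 + v2, which is exactly the loss of natural notation discussed in this thread.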
Stefan_Lang 22-Mar-13 6:00am    
Ok, starting from the significantly reduced example code I posted above, I got my code to work as intended. It seems that the key is moving the operator into the header file as you suggested. Although my original implementation seems to be suffering from another problem as well (still not inlining), I'll accept your solution as answer: at the very least I now have a working implementation that I can start from.
By the way, you should generally define operator+ in terms of operator+=, as the latter can be made more efficient and the former can then be trivially implemented in terms of the already defined operator+=.

class Vector3d
{
public:
  // ... data members x, y, z and constructors ...
  Vector3d& operator+=(const Vector3d &other)
  {
    x += other.x;
    y += other.y;
    z += other.z;
    return *this;
  }
};

// Notice that + operator is not a member function and does not need to be friend
inline Vector3d operator+(const Vector3d &lhs, const Vector3d &rhs)
{
  return Vector3d(lhs) += rhs;
}
Stefan_Lang 21-Mar-13 8:36am    
Oh I do have all the arithmetic assignment operators that you can think of, including the slightly tricky operator^= (cross product). As I said, the code segment is a simplification (the classes are also templates in case you wondered about the in-header implementation I hinted at).

The problem, as stated, is that operator+ takes a staggering amount of time if not inlined, and the underlying implementation is atm pretty much irrelevant.
Right, I finally found the problem in my original source code. There were actually two issues:

1. I needed to move the definition of operator+ to a header (created a separate one for all the operator definitions to come). Thanks to Matthew Faithfull for pointing this out.

2. The original version of my VectorExpression3d class was generated by a UML tool, and it had automatically added an (empty) destructor. After removing that destructor, the compiler was finally able to inline my operator+! I'm not quite sure, but I suspect the destructor prevented the compiler from treating the class like a POD type...

In any case, thanks for all your suggestions. If nothing else, they kept my head spinning and helped me move towards the solution.
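The suspicion in point 2 can at least be checked at the language level. The sketch below assumes a compiler with C++11 type traits (newer than VS 2010) and uses two made-up struct names. It only shows that a user-provided destructor, even an empty one, makes a class non-trivially destructible; whether that is what stopped VS 2010 from inlining remains a guess:

```cpp
#include <type_traits>

// Hypothetical stand-ins for the expression class, with and without
// the empty destructor that the UML tool generated.
struct ExprNoDtor {
    double x, y, z, scale;
};

struct ExprEmptyDtor {
    double x, y, z, scale;
    ~ExprEmptyDtor() {} // user-provided, even though it does nothing
};

// The implicit destructor is trivial; the user-provided one is not.
static_assert(std::is_trivially_destructible<ExprNoDtor>::value,
              "implicit destructor is trivial");
static_assert(!std::is_trivially_destructible<ExprEmptyDtor>::value,
              "a user-provided destructor is never trivial");
```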
Hm... :)

The "secure" inline style (implementation in the class body) would also be possible,
after the following separation:
// IVector.h
class IVector
{
public:
  virtual double GetX() const = 0;
  virtual double GetY() const = 0;
  virtual double GetZ() const = 0;

  virtual double GetF() const = 0;
};

// VectorExpr.h
#include "IVector.h"
class VectorExpr
{
public:
  VectorExpr(const IVector& v1, const IVector& v2)
  {
    // use their Get-Methods here
  }
};

// Vector.h
#include "VectorExpr.h"
class Vector : public IVector
{
public:
  double x, y, z, f;

  virtual double GetX() const { return x; }
  virtual double GetY() const { return y; }
  virtual double GetZ() const { return z; }
  virtual double GetF() const { return f; }

  VectorExpr operator+(const Vector& other)
  {
    return VectorExpr(*this, other);
  }
};
Philippe Mori 21-Mar-13 9:13am    
This won't help performance, which is the main reason the user wants to inline his code.
Eugen Podsypalnikov 21-Mar-13 9:17am    
Maybe... It is just shown how to make an operator "pure" inline :)
Stefan_Lang 21-Mar-13 10:33am    
I deliberately avoided inheritance and the associated cost of the vtable (inheritance requires virtual destructor!).

I do know how to write elegant code, but the only elegance I am concerned about right now is the ability to write vector equations in a natural way without the usual cost to performance compared to a straightforward C implementation.
Eugen Podsypalnikov 21-Mar-13 10:42am    
Hi Stefan !
Could you test this model for performance too? :)
Stefan_Lang 21-Mar-13 11:32am    
Already did that several years ago: my current implementation works that way. It's nice for code that isn't performance-critical, but many of our algorithms are, so we're constantly copying vectors back and forth between objects and C arrays - a rather unsatisfactory scenario.

My research is based on the realization that classes are bad for performance, whenever inheritance is involved. It's not even the indirection of resolving function calls at run time via the vtable - it is the huge amount of temporary or local objects that need constructing and destructing! You can considerably speed up the allocation of memory for objects on the heap by using a pool. But you cannot speed up the construction of a vtable!
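To put a rough shape on the per-object cost of virtual functions, here is a small sketch with two made-up struct names, assuming a compiler with C++11 type traits. The standard guarantees that a class with virtual functions has no trivial default constructor. The size difference is not guaranteed by the standard, but mainstream compilers store a hidden vtable pointer in every such object:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins: the same three doubles, with and without virtual functions.
struct PlainVec {
    double x, y, z;
};

struct VirtualVec {
    double x, y, z;
    virtual ~VirtualVec() {}
    virtual double GetX() const { return x; }
};

// Guaranteed by the language: a polymorphic class is never trivially
// default-constructible, so every construction has some work to do.
static_assert(std::is_trivially_default_constructible<PlainVec>::value,
              "plain aggregate needs no construction code");
static_assert(!std::is_trivially_default_constructible<VirtualVec>::value,
              "polymorphic class needs a constructor to set up the object");

// Implementation-dependent: on common ABIs the second value is larger
// because of the hidden vtable pointer.
const std::size_t plainSize = sizeof(PlainVec);
const std::size_t virtualSize = sizeof(VirtualVec);
```

This says nothing about how expensive that extra work is in a given program; it only shows that such temporaries cannot be treated as raw blocks of doubles.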

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
