I've profiled my application to find out why my 3D-vector implementation is almost 3 times slower than the corresponding C function calls. The results showed that every single function call costs more time than the actual arithmetic being performed! I've already cut the number of function calls down to 3, but that doesn't help much.
It seems that, for some reason, the arithmetic operator calls take even more time than other function calls, and looking at the disassembly I found out why: they are the only functions that haven't been inlined, in spite of full optimization! Each call takes ~10 instructions of preparation just to store the two operands. Compared to that, calling the corresponding C function takes only 2 instructions to store each double pointer argument.
Here's a simplified segment of my code (add include guards as needed):
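To make the comparison concrete, a C-style counterpart might look roughly like the sketch below. The name `vec3_add` and the exact signature are assumptions for illustration only; the one thing taken from the description above is that the arguments are pointers to double.

```cpp
// Hypothetical C-style counterpart; the name and signature are assumptions.
// Each argument points to three consecutive doubles (x, y, z).
void vec3_add(const double* a, const double* b, double* out) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
}
```

Passing each argument here only requires storing one pointer, which matches the short call setup seen in the disassembly.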
class VectorExpression3d;
class Vector3d {
public: double x, y, z;
Vector3d() : x(0.0), y(0.0), z(0.0) {} // default constructor, needed for the declarations in main()
Vector3d(const VectorExpression3d& ve);
Vector3d& operator=(const VectorExpression3d& ve);
};
#include "vectorexpression3d.h"
#include "vector3d.h"
class VectorExpression3d {
public:
double x, y, z, scale;
VectorExpression3d(const Vector3d& v1, const Vector3d& v2)
: x(v1.x+v2.x), y(v1.y+v2.y), z(v1.z+v2.z), scale(1.0) {}
};
#include "vector3d.h"
inline VectorExpression3d operator+(const Vector3d& v1, const Vector3d& v2) {
return VectorExpression3d(v1, v2);
}
int main() {
Vector3d v1, v2, v3;
v3 = v1 + v2;
}
I'm using VS 2010, and it seems the compiler ignores the inline specifier on every one of the operators. I know that I cannot force inlining, but it should be possible, and since the operators are trivial, it should even be easy! So what is the problem? Why doesn't VS 2010 inline my operators? Is it not possible after all?
According to my profiling results, the call to operator+ by itself uses up more than half of the total time of the addition statement, including the assignment and the construction/destruction of a temporary!
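For reference, the strongest inlining request available in Visual C++ is the `__forceinline` keyword, and even that is documented as a request the compiler may decline. The sketch below shows how it could be applied to the operator. The `FORCE_INLINE` macro, the three-argument `Vector3d` constructor, and the non-MSVC fallbacks are assumptions added so that the example is self-contained; they are not part of the code above.

```cpp
// FORCE_INLINE is a hypothetical helper macro, not part of the original code.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline                        // MSVC: strong request, still no guarantee
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline)) // GCC/Clang counterpart
#else
#  define FORCE_INLINE inline                               // plain hint elsewhere
#endif

class Vector3d {
public:
    double x, y, z;
    // Three-argument constructor added for this sketch only.
    Vector3d(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

class VectorExpression3d {
public:
    double x, y, z, scale;
    // Defined inside the class body, so it is implicitly inline.
    VectorExpression3d(const Vector3d& v1, const Vector3d& v2)
        : x(v1.x + v2.x), y(v1.y + v2.y), z(v1.z + v2.z), scale(1.0) {}
};

// Same operator as in the question, with the stronger request applied.
FORCE_INLINE VectorExpression3d operator+(const Vector3d& v1, const Vector3d& v2) {
    return VectorExpression3d(v1, v2);
}
```

Whether the call actually disappears still has to be checked in the disassembly of an optimized build, since inline expansion is disabled when optimization is turned off.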
P.S.:
Maybe this is important, and I forgot to mention it: the actual classes are, in fact, templates. The only template argument so far is the base type (double), so it's not a biggie.
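A templated version of the simplified code could look roughly like the sketch below. The names `Vector3<T>` and `VectorExpression3<T>`, the extra constructors, and the way the assignment copies the components without using `scale` are all assumptions made for illustration, since the real template code isn't shown here. Everything sits in one place so that the definitions are visible wherever the operator is called.

```cpp
// Hypothetical template versions of the classes; all names are assumptions.
template <typename T> class VectorExpression3;

template <typename T>
class Vector3 {
public:
    T x, y, z;
    Vector3() : x(), y(), z() {}                         // value-initializes to zero
    Vector3(T x_, T y_, T z_) : x(x_), y(y_), z(z_) {}   // added for this sketch
    Vector3(const VectorExpression3<T>& ve);
    Vector3& operator=(const VectorExpression3<T>& ve);
};

template <typename T>
class VectorExpression3 {
public:
    T x, y, z, scale;
    VectorExpression3(const Vector3<T>& v1, const Vector3<T>& v2)
        : x(v1.x + v2.x), y(v1.y + v2.y), z(v1.z + v2.z), scale(T(1)) {}
};

// Assumption: the components are copied as-is; how scale is meant to be
// applied is not shown in the question, so it is left unused here.
template <typename T>
inline Vector3<T>::Vector3(const VectorExpression3<T>& ve)
    : x(ve.x), y(ve.y), z(ve.z) {}

template <typename T>
inline Vector3<T>& Vector3<T>::operator=(const VectorExpression3<T>& ve) {
    x = ve.x;
    y = ve.y;
    z = ve.z;
    return *this;
}

template <typename T>
inline VectorExpression3<T> operator+(const Vector3<T>& v1, const Vector3<T>& v2) {
    return VectorExpression3<T>(v1, v2);
}
```

With this layout, `v3 = v1 + v2;` on `Vector3<double>` objects goes through exactly the same three calls as in the non-template code: operator+, the expression constructor, and the assignment.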