The Importance of Sequence Order in the Creation of C++ Objects.






This article demonstrates the importance of the sequence order of object creation in C++.
Introduction And Acknowledgements
This article is inspired by Herb Sutter's Guru of the Week article GotW #80, entitled "Order, Order!".
Herb Sutter's article expounds the importance of knowing the order in which objects are created. He does this by presenting a small code fragment that contains one or more bugs and challenging readers to find where the error lies.
Please read Herb's piece at the following link:
Please note that this article is targeted at beginner to intermediate C++ developers. Old-timers may find some of the material presented below rather blasé :-)
My intention is not to rehash Herb's column. I want to go beyond the issues Herb discussed and analyse other relevant and important aspects.
I have made some alterations to Herb Sutter's original quiz code (in the Explanations section below) and analysed their impact. I will also give some background on vtables and provide an example of why timing matters when using them.
I will also present compiler-emitted code and show how it relates to the C++ code.
This article is meant as an augmentation of Herb's original work and a token of respect to the great man. I enjoyed using his code to learn so many things about C++ that I thought it would be good to share them with others.
Spot The Bug!
Following Herb's style, we present below a small code fragment that contains one or more bugs. The reader is challenged to spot the bug before proceeding to the explanation sections below:
1  #include <string>
2  using namespace std;
3
4  class A
5  {
6  public:
7      A( const string& s ) { /* ... */ }
8      string f() { return "hello, world"; }
9  };
10
11 class B : public A
12 {
13 public:
14     B() : A( s = f() ) {}
15 private:
16     string s;
17 };
18
19 int main()
20 {
21     B b;
22     return 0;
23 }
The above program defines a class named A, from which the class named B is derived.
A has a parameterised constructor that takes a const reference to a string 's'. This string 's' is not used inside A's constructor.
A also has a member function f() which returns a string set to "hello, world".
B has a constructor that calls the parameterised constructor of A. B's constructor supplies its own member string 'B::s' as the argument to A's constructor. But before 'B::s' is passed in, it is assigned the string returned from A::f().
When you compile and run the above code, the program will most likely crash (strictly speaking, its behaviour is undefined). What causes the crash?
Can you spot the bug? Give this quiz a try. The answers are on Herb Sutter's article page (web link supplied above) and are also explained below.
Explanations
Well folks, did you manage to find the bug? The solution to the quiz is explained below:
1. The Bug
The bug lies at line 14:
14 B() : A( s = f() ) {}
There are two problems in this line. Each has to do with object lifetime and the use of objects before they exist. The first problem is the one that causes the crash. The second is not the reason for the crash but has the potential to cause one, and is more interesting, as we shall see later on.
Before we explore the two issues, note first that the expression "s = f()" is used as the argument to the A base subobject constructor. This expression is therefore evaluated before the A base subobject (or any other part of the B object) is constructed.
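To see that the argument of a base initializer is evaluated before the base class constructor runs, here is a small sketch. The trace buffer g_trace and the helper make_arg() are hypothetical additions for illustration; make_arg() is made static on the assumption that it needs no object state, which keeps the sketch itself free of the problems discussed below.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace buffer: each step appends a tag so the order is visible.
std::string g_trace;

struct A {
    explicit A(const std::string& s) { g_trace += "[A ctor got " + s + "]"; }
};

struct B : A {
    // The argument expression for the A base initializer runs before A's constructor.
    B() : A(make_arg()) { g_trace += "[B body]"; }

    // Assumption for the sketch: make_arg() needs no object state, so it is static
    // and therefore safe to call before any part of B exists.
    static std::string make_arg() {
        g_trace += "[argument]";
        return "x";
    }
};

// Builds a B and returns the recorded sequence of events.
std::string trace_construction() {
    g_trace.clear();
    B b;
    return g_trace;
}
```

Calling trace_construction() yields "[argument][A ctor got x][B body]": the argument is fully computed before A's constructor starts.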
The two problems are:
1.1 The (ab)use of the B::s string object, in that B::s is used before it has been constructed.
1.2 The (ab)use of A::f(), in that the member function f() is called on an A subobject that hasn't yet been constructed. As mentioned, this issue does not actually cause the program to crash. However, using A::f() this way is bad practice; in fact, the C++ standard states that calling a member function from a constructor's initializer list before all the base class initializers have completed gives undefined behaviour.
We shall now explore these two issues in greater depth.
2. The (ab)use of the B::s string object.
2.1 This issue is the cause of the crash. Let's examine the problematic line 14 again:
14 B() : A( s = f() ) {}
Here, "s" is the string member of the object "b". This article writes it as "b::s"; in C++ syntax it would be b.s.
2.2 When s = f() is evaluated, the following sequence of actions takes place:
2.2.1 A::f() is invoked.
2.2.2 A temporary string (containing "hello, world") is created somewhere in memory.
2.2.3 The string assignment operator (i.e. string::operator=()) is invoked on "b::s", with the temporary string returned from A::f() as its argument.
2.3 The crash occurs when the assignment operator of "b::s" (i.e. string::operator=()) is invoked. "b::s" has not been constructed at this point, so string::operator=() is called on a string object that does not yet exist: its internal data members (for example, the pointer to its character buffer) hold whatever happened to be in that memory. operator=() will read, and possibly try to release, that bogus buffer, so an ensuing crash is not hard to imagine.
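The difference between constructing an object and assigning to it can be made explicit with placement new. The sketch below (the function name assign_after_construct() is hypothetical) shows that raw memory of the right size is not yet a string: a string must first be constructed in it before operator=() may be called.

```cpp
#include <cassert>
#include <new>
#include <string>

// Raw storage big enough for a std::string but holding no string yet,
// which is the state of b's member s while A's constructor argument is evaluated.
std::string assign_after_construct() {
    alignas(std::string) unsigned char storage[sizeof(std::string)];

    // Assigning through a std::string* into storage at this point would call
    // operator= on an object that does not exist: undefined behaviour.

    std::string* s = new (storage) std::string();  // 1. construct (lifetime begins)
    *s = std::string("hello, world");              // 2. assignment is now valid
    std::string copy = *s;
    s->~basic_string();                            // 3. destroy (lifetime ends)
    return copy;
}
```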
2.4 Even if we had tried to construct "b::s" in B's member initializer list, e.g.:
11 class B : public A
12 {
13 public:
14     B() : s("some string"), A( s = f() ) {} /* Try to construct & initialise B::s, hopefully before A::A(). */
15 private:
16     string s;
17 };
it will still be of no use. The reason has to do with the order of construction of base subobjects and member objects.
[Please read Herb Sutter's excellent example code showing the precise order of creation of C++ base subobjects and member objects in his Guru of the Week GotW #80 article (web link supplied above).]
The C++ standard stipulates the order of construction: direct base class subobjects are constructed first, in the order they are listed as bases, followed by the non-static data members, in the order they are declared in the class, regardless of the order in which they are written in the member initializer list. So the A subobject in "b" is always constructed before any member objects of B. Hence A::A(const string& s) will be invoked before s("some string"), and we end up with the same crash.
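The following sketch makes the construction order visible. The classes Base, Member and Derived and the trace buffer g_trace are hypothetical; the initializer list deliberately names the members first and the base last, as in the example above, yet the trace shows the base constructed first and the members in declaration order.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace buffer: each constructor appends a tag.
std::string g_trace;

struct Member {
    explicit Member(const char* tag) { g_trace += tag; }
};

struct Base {
    explicit Base(const char* tag) { g_trace += tag; }
};

struct Derived : Base {
    // Written in "wrong" order on purpose; compilers such as GCC and Clang
    // typically warn (-Wreorder) because this is not the order actually used.
    Derived() : m2("[m2]"), m1("[m1]"), Base("[Base]") { g_trace += "[Derived body]"; }

    Member m1;  // declared first, so constructed first among the members
    Member m2;
};

// Builds a Derived and returns the recorded construction order.
std::string construction_order() {
    g_trace.clear();
    Derived d;
    return g_trace;
}
```

construction_order() returns "[Base][m1][m2][Derived body]", whatever order the initializer list is written in.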
2.5 To avoid the crash, we can rewrite the code in either of these ways:
Original Code:
14  B() : A( s = f() ) {}
Modified Code 1:
14  B() : A( f() ) { s = f(); }
or
Modified Code 2:
14  B() : s(f()), A( f() ) { }
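Neither modified version is especially efficient, but both avoid the crash and keep the original intent of the code (which is to initialise A and b::s). Neither initialises "b::s" before the A subobject has been constructed: in both, "b::s" is constructed and given its value after the A subobject has been constructed. (In Modified Code 2, s(f()) still runs after A( f() ) even though it is written first; listing the initialisers in declaration order, A( f() ), s( f() ), makes this explicit and avoids the reorder warnings some compilers emit.) Note, however, that both versions still call A::f() to produce the argument for A's constructor, i.e. before the A subobject exists. That is the second problem, discussed next.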
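One way to make the fix fully well-defined, assuming A::f() does not depend on the state of any A object (as in the quiz code, where it just returns a literal), is to make f() a static member function, so that calling it involves no "this" pointer at all. This is a sketch under that assumption, not the article's original code; A's saved_ member and the saved() and str() accessors are hypothetical additions so that the result can be inspected.

```cpp
#include <cassert>
#include <string>

class A {
public:
    // Hypothetical: A keeps its argument so the sketch can check it.
    explicit A(const std::string& s) : saved_(s) {}

    // Assumption: f() needs no object state, so it is static. A static member
    // function has no "this", so calling it before A is constructed is fine.
    static std::string f() { return "hello, world"; }

    const std::string& saved() const { return saved_; }

private:
    std::string saved_;
};

class B : public A {
public:
    // Initialisers listed in the order they actually run: base A first, then s.
    B() : A(f()), s(f()) {}

    const std::string& str() const { return s; }

private:
    std::string s;
};
```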
3. The (ab)use of A::f()
3.1 Note that the problem is not with the function A::f() itself. When the statement s = f() is executed, the code for A::f() already exists.
It is the compiler's and the linker's job to ensure that code is emitted into the executable image file (.DLL or .EXE) for every referenced global, static and member function. So regardless of whether any A (or A-derived) object exists, A::f() is already somewhere in memory once the program has been loaded.
3.2 Note also that the problem is not with "b"'s "this" pointer. Back to s = f(): when A::f() is executed here, the value of "b"'s "this" pointer is stored in the ECX register as usual. This is good and expected, and there are no problems: the "b" object's "this" pointer (i.e. its address) is valid even if "b" itself is not completely constructed.
For the constructor definition:
B() : A( s = f() ) {}
the Visual C++ 6.0 compiler emits the following assembly code:
B() : A( s = f() ) {}
00401080 push ebp
00401081 mov ebp,esp
...
...
...
004010BF mov ecx,dword ptr [this]
004010C2 call A::f (00401150)
004010C7 mov dword ptr [ebp-28h],eax
004010CA mov ecx,dword ptr [ebp-28h]
The code of interest to us is at address 004010BF, where the "this" pointer is stored in the ECX register just before the member function A::f() is called. This is typical of Visual C++ code for member function calls on x86 (the __thiscall calling convention): the "this" pointer is passed in the ECX register, loaded just prior to the actual call.
Note that at code location 004010C2 the call to A::f() is made -directly-, i.e. the compiler already knows the address of A::f() and uses it -immediately-.
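Conceptually, a non-virtual member function call such as the one at 004010C2 is just a call to a fixed address with the object's address passed as a hidden argument. The sketch below (Counter and get_explicit() are hypothetical names) makes that hidden argument explicit; it models the idea only and says nothing about which register a particular compiler uses.

```cpp
#include <cassert>

struct Counter {
    int n;
    // Receives "this" implicitly; the call target is known at compile time.
    int get() const { return n; }
};

// Hypothetical free-function equivalent with the "this" pointer made explicit.
int get_explicit(const Counter* self) { return self->n; }
```

For a Counter c{7}, both c.get() and get_explicit(&c) return 7; the only difference is whether the object's address is passed implicitly or explicitly.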
3.3 The real problem lies in the complications that may arise if virtual functions are declared (in class A or class B) and -possibly- called as part of the construction process.
The situation is different if class A or class B declares any virtual functions. In the context of our example code, and in typical implementations such as Visual C++, "b"'s "this" pointer will then point to a memory location that stores the address of the "virtual function table" (vtable) of the "b" object.
The question of whether the A or B subobject of "b" has been properly constructed then becomes important.
3.4 Let's modify our original code and look at an example:
M1  #include <string>
M2  using namespace std;
M3
M4  class A
M5  {
M6  public:
M7      A( const string& s ) { /* ... */ }
M8
M9      string f()
M10     {
M11         g(); // Calling virtual function A::g() in function A::f().
M12         return "hello, world";
M13     }
M14
M15     virtual void g() // Definition of a virtual function A::g().
M16     {
M17     }
M18 };
M19
M20 class B : public A
M21 {
M22 public:
M23     B() : A( s = f() ) {}
M24
M25 private:
M26     string s;
M27 };
M28
M29 int main()
M30 {
M31     B b;
M32     return 0;
M33 }
The difference between our modified code and the original lies in the inclusion of a virtual function A::g() (lines M15 through M17) and the call to A::g() in A::f() (line M11). The rest of the modified code is the same as the original.
3.5 Let us re-examine the part of the code where "b" is constructed:
M23 B() : A( s = f() ) {}
As in the original code, A::f() is invoked:
M9  string f()
M10 {
M11     g(); // Calling virtual function A::g() in function A::f().
M12     return "hello, world";
M13 }
The Visual C++ 6.0 compiler emits the following code for A::f():
M9 string f()
M10 {
00401160 push ebp
00401161 mov ebp,esp
00401163 sub esp,0Ch
00401166 push esi
00401167 mov dword ptr [ebp-0Ch],0CCCCCCCCh
0040116E mov dword ptr [ebp-8],0CCCCCCCCh
00401175 mov dword ptr [ebp-4],0CCCCCCCCh
0040117C mov dword ptr [ebp-0Ch],ecx
0040117F mov dword ptr [ebp-8],0
M11 g();
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
00401192 cmp esi,esp
00401194 call __chkesp (00402290)
M12 return "hello, world";
00401199 lea eax,[ebp-4]
0040119C push eax
0040119D push offset string "hello, world" (00411040)
004011A2 mov ecx,dword ptr [__$ReturnUdt]
004011A5 call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,
004011AA mov ecx,dword ptr [ebp-8]
004011AD or ecx,1
004011B0 mov dword ptr [ebp-8],ecx
004011B3 mov eax,dword ptr [__$ReturnUdt]
M13 }
Let's zoom in on code locations 00401186 through 00401190, where A::g() is called:
M11 g();
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
The "this" pointer value is first moved into EAX. Next, the value stored in the memory that EAX points to (the vtable pointer) is moved into EDX. Then, at address 00401190, we call the function whose address is stored at the location EDX points to, i.e. the first entry of the virtual function table.
In other words, the virtual function table pointer, which is stored in the memory pointed to by "b"'s "this" pointer, is dereferenced to find the address of the virtual function A::g(). This is illustrated below:
b's this (== 0x0012ff6c)
   |
   | points to
   V
+----------------------------------------+
| address of B's virtual function table  |---+
+----------------------------------------+   |
                                             | points to
                                             V
                               B's virtual function table
                               +-------------------+
                               | address of A::g() |
                               +-------------------+
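The double indirection performed at 00401189 and 00401190 can be modelled by hand in plain C++. This is only a hypothetical sketch of the mechanism (the names Object, VTable, Slot, a_g, a_vtable, b_vtable and call_g are made up, and real compilers lay out vtables in their own implementation-specific way): each object stores a pointer to a table built once per class, and a virtual call loads that pointer and then calls through the table entry.

```cpp
#include <cassert>
#include <string>

struct Object;                                // forward declaration
using Slot = std::string (*)(const Object*);  // one vtable entry; receives "this"

struct VTable {
    Slot g;                                   // slot 0, like A::g() in the listing
};

struct Object {
    const VTable* vptr;                       // first word of the object
};

std::string a_g(const Object*) { return "A::g"; }

// One table per class, built once and shared by every object of that class.
const VTable a_vtable = { &a_g };
// B does not override g(), so its slot also holds A's g.
const VTable b_vtable = { &a_g };

std::string call_g(const Object* self) {
    const VTable* table = self->vptr;         // like: mov edx, dword ptr [eax]
    return table->g(self);                    // like: call dword ptr [edx]
}
```

If an Object's vptr were left uninitialised, call_g() would read and jump through garbage, which is the failure examined in the rest of this section.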
3.6 Now, in normal circumstances, "b"'s "this" pointer will point to a memory location that contains a valid pointer to "b"'s virtual function table. The virtual function table will then contain the address(es) of "b"'s virtual function(s).
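Now, if "b" is not yet sufficiently constructed, its pointer to the vtable will not have been initialized. (In typical implementations, the vtable pointer is set when A's constructor starts running and is updated to point to B's vtable once the A subobject has been constructed; here, A::f() is called before A's constructor has even started.) In this case, "b"'s "this" pointer points to a memory location that contains an invalid address (which is supposed to point to "b"'s vtable). This is illustrated below: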
b's this (== 0x0012ff6c)
   |
   | points to
   V
+----------------+
| garbage data 1 |---+
+----------------+   |
                     | points to
                     V
         Random memory area, but treated as
         B's virtual function table
         +----------------+
         | garbage data 2 |
         +----------------+
[Please note a significant side point raised by Sergei (a CodeProject member): the vtable of a class is usually built by the compiler at compile time and is therefore already present in memory at program startup.
Object instances of the class only need to initialize their pointers to this vtable during construction. Furthermore, all object instances of a class that uses virtual functions share the same vtable in memory.]
Now, back to code locations 00401186 through 00401190, where A::g() is called:
M11 g();
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
After the instruction at 00401189, EDX effectively contains "garbage data 1".
Then, at code location 00401190, we execute the instruction:
00401190 call dword ptr [edx]
The operand dword ptr [edx] means that we call the first virtual function in the virtual function table that EDX points to, i.e. we jump to whatever address "garbage data 2" happens to be. Since what EDX points to is garbage, an ensuing crash is not difficult to imagine.
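By contrast, once construction is actually under way, calling a virtual function is well-defined: while A's constructor is running the object behaves as an A, and only after A's constructor has finished does it behave as a B. The sketch below shows this with a hypothetical virtual function name() and trace variable g_seen; in typical implementations this corresponds to the vtable pointer being set first to A's vtable and then to B's.

```cpp
#include <cassert>
#include <string>

std::string g_seen;  // hypothetical: records which override ran in A's constructor

struct A {
    A() { g_seen = name(); }  // virtual call while A is being constructed
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
    std::string call_name() const { return name(); }
};

struct B : A {
    std::string name() const override { return "B"; }
};

// Which override did A's constructor see?
std::string name_during_construction() {
    B b;
    return g_seen;
}

// Which override is used once b is fully constructed?
std::string name_after_construction() {
    B b;
    return b.call_name();
}
```

name_during_construction() returns "A" while name_after_construction() returns "B". Neither case resembles the quiz, where the call happens before A's constructor has started and no valid vtable pointer exists at all.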
In Conclusion
I certainly hope that you have benefited from the discussion above. Always remember that C++ is a great language with wonderful innate features, but these great facilities come at the price of complexity. To use inheritance and object containment safely, you have to remember the order in which base subobjects and contained member objects are constructed, and avoid using any of them before they exist.
I certainly hope that you had fun and that I have done Herb proud!
Best Regards,
Bio.