Introduction And Acknowledgements
This article is inspired by Herb Sutter's Guru Of The
Week Article GotW#80 entitled : Order, Order !
Herb Sutter's article expounds the importance of knowing the order in which
objects are constructed. He does so by posing a small code fragment which
contains one or more bugs and challenging readers to find out where the
error lies.
Please read Herb's piece at the following link :
Please note that this article is targeted at beginner
to intermediate level C++ developers. Old timers may find some
of the materials presented below rather blasé :-)
My intention is not to provide a rehash of Herb's column. I want
to go beyond the issues discussed by Herb and further
analyse other relevant and important aspects.
I have included some alterations to Herb Sutter's original quiz code (in
the Explanations section below) and analysed their impact. I will also give
some appreciation of vtables and provide an example of the
importance of timing when using them.
I will also discuss and present compiler-emitted code and show how
it relates to the C++ code.
This article is meant to be an augmentation of Herb's original work
as a token of respect to the great man. I enjoyed using his code to learn so
many things in C++ that I thought it would be good to share these with others.
Spot The Bug !
Following Herb's style, we present below a small code fragment
which contains one or more bugs. The reader is challenged to spot the bug
before proceeding to the explanation sections below :
 1  #include <string>
 2  using namespace std;
 3
 4  class A
 5  {
 6  public:
 7      A( const string& s ) { }
 8      string f() { return "hello, world"; }
 9  };
10
11  class B : public A
12  {
13  public:
14      B() : A( s = f() ) {}
15  private:
16      string s;
17  };
18
19  int main()
20  {
21      B b;
22      return 0;
23  }
The above program defines a class named A, from which the class named B is
derived.
A has a parameterised constructor that takes a reference to a string 's'.
This string 's' is not used inside A's constructor.
A also has a function f() which returns a string set to "hello, world".
B has a constructor that calls the parameterised constructor of A.
B's constructor supplies its own string 'B::s' as the argument to A's
constructor. But before 'B::s' is passed in, it is assigned the string
returned from A::f().
When you compile and run the above code, the program will most likely crash
(formally, its behaviour is undefined). What causes the crash ?
Can you spot the bug ? Give this quiz a try. The answer is in Herb Sutter's
article (web link supplied above) and is also explained below.
Explanations
Well folks, did you manage to find out where the bug is ? The solution to the
quiz is explained below :
1. The Bug
The bug lies at line 14 :
14 B() : A( s = f() ) {}
There are 2 problems contained in this line. Each problem has to do with
object lifetime and the use of objects before they exist. The first problem is
the one that causes the crash. The second problem is not the reason for the
crash but has the potential to cause one, and is more interesting as we shall
see later on.
Before we explore the 2 issues, note first that the expression "s = f()"
is used as the argument to the A base subobject constructor. This expression
will therefore be evaluated before the A base subobject (or any part of the B
object) is constructed.
The two problems are :
1.1 The (ab)use of the B::s string object : B::s is used before it has been
constructed.
1.2 The (ab)use of A::f() : the member function f() is being called on an A
subobject that hasn't yet been constructed. As mentioned, this issue does not
actually cause the crash of the program. However, the use of A::f() in this way
is bad practice and, strictly speaking, also has undefined behaviour under the
C++ standard.
We shall now explore these two issues in greater depth.
2. The (ab)use of the B::s string object.
2.1 This issue is the cause of the crash. Let's examine the problematic line
14 again :
14 B() : A( s = f() ) {}
Here, the "s" is a string object contained inside "b" (i.e. "b::s").
2.2 By calling
s = f()
the following sequence of actions takes place :
2.2.1 A::f() is invoked.
2.2.2 A temporary string (containing "hello, world") is created somewhere in
memory.
2.2.3 The assignment operator function for string (i.e. string::operator=())
is invoked for "b::s". The parameter for the assignment operator function is the
temporary string returned from A::f().
2.3 The crash occurs when the "b::s" string object's assignment operator
function (i.e. string::operator=()) gets invoked. At this point "b::s" has not
been constructed, so string::operator=() is called on raw memory that does not
yet hold a string object. The operator will typically read garbage in place of
the string's internal pointers and lengths, so an ensuing crash is not hard to
imagine.
2.4 Even if we had tried to construct "b::s" as part of the construction
process of the B subobject, e.g. :
11  class B : public A
12  {
13  public:
14      B() : s("some string"), A( s = f() ) {}
15  private:
16      string s;
17  };
it will still be of no use. The reason has to do with the order of
construction of base subobjects and member objects.
[Please read Herb Sutter's excellent example code showing the precise
order of creation (of C++ base subobjects and member objects) in his Guru Of The
Week GotW#80 article (web link is supplied above)]
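The C++ standard stipulates the order of construction : base subobjects are
always constructed before the member objects of the derived class, regardless
of the order in which they are written in the constructor's initializer list.
The A subobject in "b" will therefore always be constructed before any member
object of B.
Hence A::A(const string& s), together with its argument expression "s = f()",
will get evaluated before s("some string"), and we will end up with the same
crash.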
2.5 To resolve the crash, we can re-write the source code this way :
Original Code :
14 B() : A( s = f() ) {}
Modified Code 1 :
14 B() : A( f() ) { s = f(); }
or
Modified Code 2 :
14 B() : s(f()), A( f() ) { }
Both modified versions may not be very efficient, since f() is called twice,
but they do avoid the crash, and the original intent of the code (which is to
initialise A and "b::s") is kept intact. In both versions "b::s" is constructed
and given its value only after the A subobject has been constructed. Note that
both versions still call A::f() before the A subobject exists; that is the
second issue, which we examine next.
3. The (ab)use of A::f()
3.1 Note that the problem is not with the function A::f() itself. When the
statement :
s = f()
is executed, the code for the function A::f() is already in existence.
It is the compiler's and the linker's job to ensure that all referenced
global, static and member functions have their code emitted into the executable
image file (.DLL or .EXE). So regardless of the existence of any A (or A-derived)
object, A::f() is already somewhere in memory when a program is
started.
3.2 Note also that the problem is not with "b"'s "this" pointer.
Back to :
s = f()
When A::f() is executed here, the value of "b"'s "this" pointer is stored
into the ECX register as usual. This is good, expected, and there are no
problems. The "b" object's "this" pointer is valid even if "b" itself is not
completely constructed, because the memory for "b" has already been set aside
by the time its constructor starts.
For a constructor function call like the following :
B() : A( s = f() ) {}
the Visual C++ 6.0 compiler emits the following assembly code :
B() : A( s = f() ) {}
00401080 push ebp
00401081 mov ebp,esp
...
...
...
004010BF mov ecx,dword ptr [this]
004010C2 call A::f (00401150)
004010C7 mov dword ptr [ebp-28h],eax
004010CA mov ecx,dword ptr [ebp-28h]
The code of interest to us is at address 004010BF, where the "this" pointer
is stored inside the ECX register just before the member function A::f() is
called. This is typical of the Visual C++ compiler's emitted code for member
function calling, i.e. the "this" pointer is passed to the ECX register just
prior to the actual member function call.
Note, in code location 004010C2 that the call to A::f() is made -directly-;
i.e., the compiler already knows the function address of A::f() and this is used
-immediately-.
3.3 The problem is with the complications that may arise if virtual functions
are declared (in class A or class B) and their -possible- usage as part
of the construction process.
The situation will be different if class A or class B declares any virtual
functions. In the context of our example code, "b"'s "this" pointer will point
to a memory location which stores the address of the "virtual function table" of
the "b" object.
The question of whether the A or B subobject has been constructed properly
(for "b") becomes important.
3.4 Let's modify our original source code and see an example :
M1   #include <string>
M2   using namespace std;
M3
M4   class A
M5   {
M6   public:
M7       A( const string& s ) { }
M8
M9       string f()
M10      {
M11          g();
M12          return "hello, world";
M13      }
M14
M15      virtual void g()
M16      {
M17      }
M18  };
M19
M20  class B : public A
M21  {
M22  public:
M23      B() : A( s = f() ) {}
M24
M25  private:
M26      string s;
M27  };
M28
M29  int main()
M30  {
M31      B b;
M32      return 0;
M33  }
The difference between our modified code and the original lies in the
inclusion of a virtual function A::g() (lines M15 through M17), and in the
calling of A::g() in A::f() (line M11). The rest of the modified code is the
same as the original.
3.5 Let us re-examine the part of the code where "b" is constructed :
M23 B() : A( s = f() ) {}
Like the original code, A::f() is invoked :
M9 string f()
M10 {
M11 g();
M12 return "hello, world";
M13 }
The Visual C++ 6.0 compiler will emit the following code for function A::f() :
M9 string f()
M10 {
00401160 push ebp
00401161 mov ebp,esp
00401163 sub esp,0Ch
00401166 push esi
00401167 mov dword ptr [ebp-0Ch],0CCCCCCCCh
0040116E mov dword ptr [ebp-8],0CCCCCCCCh
00401175 mov dword ptr [ebp-4],0CCCCCCCCh
0040117C mov dword ptr [ebp-0Ch],ecx
0040117F mov dword ptr [ebp-8],0
M11 g()
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
00401192 cmp esi,esp
00401194 call __chkesp (00402290)
M12 return "hello, world"
00401199 lea eax,[ebp-4]
0040119C push eax
0040119D push offset string "hello, world" (00411040)
004011A2 mov ecx,dword ptr [__$ReturnUdt]
004011A5 call std::basic_string<char,std::char_traits<char>,
std::allocator<char> >::basic_string<char,
004011AA mov ecx,dword ptr [ebp-8]
004011AD or ecx,1
004011B0 mov dword ptr [ebp-8],ecx
004011B3 mov eax,dword ptr [__$ReturnUdt]
M13 }
Let's zoom into code locations 00401186 through 00401190 where A::g() is
called.
M11 g()
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
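The "this" pointer value is first moved into EAX. Next, the value contained
in the memory area pointed to by EAX (the virtual function table pointer) is
moved into EDX. Then at address 00401190, we call the function whose address is
stored in the memory location that EDX points to, i.e. the first entry of the
virtual function table.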
What these statements mean is that the virtual function table pointer, which
is contained in the memory area pointed to by "b"'s "this" pointer, is
referenced to determine the address of the virtual function A::g(). This is
illustrated below :
b's "this" points to
(== 0x0012ff6c) ---->  +---------------------------------------+
                       | address of B's virtual function table |----+
                       +---------------------------------------+    |
                                                                    |
                                                                    V
                                          B's virtual function table
                                          +-------------------+
                                          | address of A::g() |
                                          +-------------------+
3.6 Now, in normal circumstances, "b"'s "this" pointer will point to a memory
location that contains a valid pointer to "b"'s virtual function table. The
virtual function table will then contain the address(es) of "b"'s virtual
function(s).
Now, if "b" is not completely constructed, its pointer to that vtable will
not have been initialized yet (the pointer is filled in by the constructors of
A and B, and neither has run when "s = f()" is evaluated). In this case, "b"'s
"this" pointer will point to a memory location which contains an invalid
address (where the address of "b"'s vtable is supposed to be). This is
illustrated below :
b's "this" points to
(== 0x0012ff6c) ---->  +----------------+
                       | garbage data 1 |----+
                       +----------------+    |
                                             |
                                             V
                       Random memory area, but treated as
                       B's virtual function table
                                    +----------------+
                                    | garbage data 2 |
                                    +----------------+
[Please note a significant piece of side information, brought up by Sergei (a
member of CodeProject) : the vtable of a class is usually built by the compiler
at compile time and would then already be present in memory at program startup.
Object instances of the class only need to initialize their pointers to this
vtable during object construction.
Furthermore, two object instances of a class that uses virtual
functions would share the same vtable in memory.]
Now, back to code locations 00401186 through 00401190 where A::g() is called :
M11 g()
00401186 mov eax,dword ptr [this]
00401189 mov edx,dword ptr [eax]
0040118B mov esi,esp
0040118D mov ecx,dword ptr [this]
00401190 call dword ptr [edx]
EDX will effectively contain "garbage data 1".
Then at code location 00401190, we invoke the instruction :
00401190 call dword ptr [edx]
What dword ptr [edx] means is that we invoke the first virtual function of
the virtual function table pointed to by EDX. Since what EDX points to is
garbage, an ensuing crash is not difficult to imagine.
In Conclusion
I certainly hope that you have benefited from our discussions above. Remember
always that C++ is a great language with wonderful innate features.
But with these great facilities comes the price of complexity. To use
inheritance and object containment safely, you have to remember the order of
construction of base subobjects and of individual contained member objects.
I certainly hope that you had fun and that I have done Herb proud !
Best Regards, Bio.
Lim Bio Liong is a Specialist at a leading Software House in Singapore.
Bio has been in software development for over 10 years. He specialises in C/C++ programming and Windows software development.
Bio has also done device-driver development and enjoys low-level programming. Bio has recently picked up C# programming and has been researching in this area.