The Importance of Sequence Order in the Creation of C++ Objects.

Lim Bio Liong

Aug 29, 2004

CPOL

This article demonstrates the importance of the sequence order of object creation in C++

Introduction And Acknowledgements

This article is inspired by Herb Sutter's Guru Of The Week Article GotW#80 entitled : Order, Order !

Herb Sutter's article expounds the importance of knowing the sequence order of object creation. He does it by posing a small code fragment which contains one or more bugs and challenges readers to find out where the error lies.

Please read Herb's piece at the following link :

http://www.gotw.ca/gotw/080.htm

Please note that this article is targeted at beginner to intermediate level C++ developers. Old timers may find some of the materials presented below rather blasé :-)

My intention is not to provide a rehash of Herb's column. I want to go beyond the issues discussed by Herb and further analyse other relevant and important aspects.

I have included some alterations to Herb Sutter's original quiz code (in the Explanations section below) and analysed their impact. I will also give some appreciation of vtables and provide an example of the importance of timing when using them.

I will also discuss and present compiler-emitted code and show how they are related to the C++ code.

This article is meant to be an augmentation of Herb's original work as a token of respect to the great man. I enjoyed using his code to learn so many things in C++ that I thought it would be good to share these with others.

Spot The Bug !

Following Herb's style, we present below a small code fragment which contains one or more bugs. The reader is challenged to spot the bug before proceeding to the explanation sections below :

1  #include <string>
2  using namespace std;
3
4  class A
5  {
6  public:
7    A( const string& s ) { /* ... */ }
8    string f() { return "hello, world"; }
9  };
10
11 class B : public A
12 {
13 public:
14   B() : A( s = f() ) {}
15 private:
16   string s;
17 };
18
19 int main()
20 {
21   B b;
22   return 0;
23 }

The above program defines a class named A from which the class named B is derived.

A has a parameterised constructor that takes a reference to a string 's'. This string 's' is not used inside A's constructor.

A also has a function f() which returns a string set to "hello, world".

B has a constructor that calls the parameterised constructor of A.

B's constructor supplies its own string 'B::s' as the parameter to A's constructor. But before 'B::s' is passed in, it is initialized to the return string from A::f().

When you compile and run the above code, the program will most likely crash (strictly speaking, its behaviour is undefined). What causes the crash ?

Can you spot the bug ? Give this quiz a try. The answers are in Herb Sutter's article (web link supplied above) and are also explained below.

Explanations

Well folks, did you manage to find out where the bug is ? The solution to the quiz is explained below :

1. The Bug

The bug lies at line 14 :

14 B() : A( s = f() ) {}

There are 2 problems contained in this line. Each problem has to do with object lifetime and the use of objects before they exist. The first problem is the one that causes the crash. The second problem is not the reason for the crash but has the potential to cause one, and is more interesting as we shall see later on.

Before we explore the 2 issues, note first that the expression "s = f()" is used as the argument to the A base subobject constructor. This expression will therefore be executed before the A base subobject (or any part of the B object) is constructed.

The two problems are :

1.1 The (ab)use of the B::s string object : in that B::s is used before it has been constructed.

1.2 The (ab)use of A::f() : in that the member function f() is being called on an A subobject that hasn't yet been constructed. As mentioned, this issue does not actually cause the crash of the program. However, the use of A::f() in this way is bad practice.

We shall now explore these two issues in greater depth.

2. The (ab)use of the B::s string object.

2.1 This issue is the cause of the crash. Let's examine the problematic line 14 again :

14 B() : A( s = f() ) {}

Here, "s" is the string member object contained inside "b" (written "b::s" in the discussion below, i.e. the B::s member of the object "b").

2.2 By calling

s = f()

the following sequence of actions takes place :

2.2.1 A::f() is invoked.

2.2.2 A temporary string (containing "hello, world") is created somewhere in memory.

2.2.3 The assignment operator function for string (i.e. string::operator=()) is invoked for "b::s". The parameter for the assignment operator function is the temporary string returned from A::f().

2.3 The crash occurs when the "b::s" string object's assignment operator function (i.e. string::operator=()) gets invoked. This is due to the fact that "b::s" has not been constructed at this point and so the calling of string::operator=() is done on a string object that does not exist. An ensuing crash is not hard to imagine.

2.4 Even if we had tried to construct "b::s" first in the initialiser list of B's constructor, e.g. :

11  class B : public A
12  {
13  public:
14    B() : s("some string"), A( s = f() ) {} 
      /* Try to construct & initialise B::s, hopefully before A::A().*/
15  private:
16    string s;
17  };

it will still be of no use. The reason has to do with the order of construction of base subobjects and member objects.

[Please read Herb Sutter's excellent example code showing the precise order of creation (of C++ base subobjects and member objects) in his Guru Of The Week GotW#80 article (web link is supplied above)]

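The C++ standard stipulates the order of construction : base class subobjects are always constructed before the member objects of the derived class, regardless of the order in which they are written in the constructor's initialiser list. The A subobject in "b" will therefore always be constructed before any member object of B.

Hence the argument expression "s = f()" is evaluated, and A::A(const string& s) is invoked, before s("some string") is executed, and we will end up with the same crash.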
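The order described in 2.4 can be observed directly. The following is only a minimal sketch and is not taken from Herb's article : the names Base, Derived, Member, make_arg and construction_order are all hypothetical and exist purely for this illustration. It records each construction step in a log so that the sequence can be inspected afterwards.

```cpp
#include <string>
#include <vector>

// Hypothetical trace log, used only for this illustration.
static std::vector<std::string> g_log;

// Hypothetical helper: records the moment at which the argument for the
// base class constructor is evaluated.
static std::string make_arg()
{
    g_log.push_back("argument evaluated");
    return "hello, world";
}

struct Member
{
    Member() { g_log.push_back("Member::Member"); }
};

struct Base
{
    explicit Base(const std::string&) { g_log.push_back("Base::Base"); }
};

struct Derived : Base
{
    // The base subobject is initialised first, then the members in their
    // order of declaration, and only then does the constructor body run.
    // Writing the initialisers in another order would not change this.
    Derived() : Base(make_arg()), m() { g_log.push_back("Derived::Derived body"); }

    Member m;
};

// Constructs one Derived object and returns the recorded sequence.
std::vector<std::string> construction_order()
{
    g_log.clear();
    Derived d;
    (void)d;  // the object is only created for its side effects on the log
    return g_log;
}
```

Calling construction_order() yields the four entries "argument evaluated", "Base::Base", "Member::Member" and "Derived::Derived body", in that order. The argument of the base constructor is thus computed before any part of the object exists, which is exactly the position that "s = f()" occupies in line 14.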

2.5 To resolve the crash, we can re-write the source code this way :

Original Code :

14 B() : A( s = f() ) {}

Modified Code 1 :

14 B() : A( f() ) { s = f(); }

Modified Code 2 :

14 B() : s(f()), A( f() ) { }

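Neither modified version is very efficient (f() is now called twice), but both avoid the crash and keep intact the original intent of the code, which is to initialise both the A subobject and "b::s". In both versions "b::s" is no longer touched before the A subobject has been constructed : it is constructed and given its value only afterwards. Note that in Modified Code 2, "s" is still constructed after the A subobject even though it is listed first, for the reason given in 2.4. Note also that both versions still call f() before the A subobject exists, which is the second problem and is discussed next.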

3. The (ab)use of A::f()

3.1 Note that the problem is not with the function A::f() itself. When the statement :

s = f()

is executed, the code for the function A::f() is already in existence.

It is the compiler's and the linker's job to ensure that all referenced global, static and member functions have their code emitted for an executable image file (.DLL or .EXE). So regardless of the existence of any A (or A-derived) object, A::f() is already somewhere in memory when a program is started.

3.2 Note also that the problem is not with "b"'s "this" pointer.

Back to :

s = f()

when A::f() is executed here, the value of "b"'s "this" pointer is stored into the ECX register as usual. This is good, expected, and there are no problems. The "b" object's "this" pointer is valid even if "b" itself is not completely constructed.

For a constructor function call like the following :

B() : A( s = f() ) {}

the Visual C++ 6.0 compiler emits the following assembly code :

B() : A( s = f() ) {}
00401080   push        ebp
00401081   mov         ebp,esp
...
...
...
004010BF   mov         ecx,dword ptr [this]
004010C2   call        A::f (00401150)
004010C7   mov         dword ptr [ebp-28h],eax
004010CA   mov         ecx,dword ptr [ebp-28h]

The code of interest to us is at address 004010BF, where the "this" pointer is stored inside the ECX register just before the member function A::f() is called. This is typical of the Visual C++ compiler's emitted code for member function calling, i.e. the "this" pointer is passed to the ECX register just prior to the actual member function call.

Note, in code location 004010C2 that the call to A::f() is made -directly-; i.e., the compiler already knows the function address of A::f() and this is used -immediately-.
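That the "this" pointer already refers to the object's final storage while a constructor is still running can be checked with a small sketch. Tracker and this_is_stable are hypothetical names introduced here for illustration only. The sketch merely records the pointer value during construction and never uses the unfinished object through it.

```cpp
// Hypothetical type that remembers the value of "this" as seen from inside
// its own constructor.
struct Tracker
{
    Tracker() : address_seen_in_ctor(this) {}

    const void* address_seen_in_ctor;
};

// Returns true if the address observed during construction is the same as
// the address of the finished object.
bool this_is_stable()
{
    Tracker t;
    return t.address_seen_in_ctor == static_cast<const void*>(&t);
}
```

The storage for an object is set aside before its constructors run, which is why the pointer value itself is meaningful early on. What is not yet meaningful is the content found at that address.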

3.3 The problem is with the complications that may arise if virtual functions are declared (in class A or class B) and their -possible- usage as part of the construction process.

The situation will be different if class A or class B declares any virtual functions. In the context of our example code, "b"'s "this" pointer will point to a memory location which stores the address of the "virtual function table" of the "b" object.

The question of whether the A or B subobject has been constructed properly (for "b") becomes important.

3.4 Let's modify our original source code and see an example :

M1  #include <string>
M2  using namespace std;
M3
M4  class A
M5  {
M6    public:
M7      A( const string& s ) { /* ... */ }
M8
M9      string f()
M10     {
M11      g();  // Calling virtual function A::g() in function A::f().
M12      return "hello, world";
M13     }
M14
M15     virtual void g()  // Definition of a virtual function A::g().
M16     {
M17     }
M18 };
M19
M20 class B : public A
M21 {
M22   public:
M23     B() : A( s = f() ) {}
M24
M25   private:
M26     string s;
M27 };
M28
M29 int main()
M30 {
M31   B b;
M32   return 0;
M33 }

The difference between our modified code and the original lies in the inclusion of a virtual function A::g() (lines M15 through M17), and in the calling of A::g() in A::f() (line M11). The rest of the modified code is the same as the original.

3.5 Let us re-examine the part of the code where "b" is constructed :

M23 B() : A( s = f() ) {}

Like the original code, A::f() is invoked :

M9    string f()
M10   {
M11    g();  // Calling virtual function A::g() in function A::f().
M12    return "hello, world";
M13   }

The Visual C++ 6.0 compiler will emit the following code for function A::f() :

M9   string f()
M10  {
00401160   push        ebp
00401161   mov         ebp,esp
00401163   sub         esp,0Ch
00401166   push        esi
00401167   mov         dword ptr [ebp-0Ch],0CCCCCCCCh
0040116E   mov         dword ptr [ebp-8],0CCCCCCCCh
00401175   mov         dword ptr [ebp-4],0CCCCCCCCh
0040117C   mov         dword ptr [ebp-0Ch],ecx
0040117F   mov         dword ptr [ebp-8],0
M11     g();
00401186   mov         eax,dword ptr [this]
00401189   mov         edx,dword ptr [eax]
0040118B   mov         esi,esp
0040118D   mov         ecx,dword ptr [this]
00401190   call        dword ptr [edx]
00401192   cmp         esi,esp
00401194   call        __chkesp (00402290)
M12     return "hello, world";
00401199   lea         eax,[ebp-4]
0040119C   push        eax
0040119D   push        offset string "hello, world" (00411040)
004011A2   mov         ecx,dword ptr [__$ReturnUdt]
004011A5   call        std::basic_string<char,std::char_traits<char>,
               std::allocator<char> >::basic_string<char,
004011AA   mov         ecx,dword ptr [ebp-8]
004011AD   or          ecx,1
004011B0   mov         dword ptr [ebp-8],ecx
004011B3   mov         eax,dword ptr [__$ReturnUdt]
M13   }

Let's zoom into code locations 00401186 through 00401190 where A::g() is called.

M11     g();
00401186   mov         eax,dword ptr [this] 
00401189   mov         edx,dword ptr [eax]  
0040118B   mov         esi,esp
0040118D   mov         ecx,dword ptr [this]
00401190   call        dword ptr [edx]

The "this" pointer value is first moved into EAX. Next, the value contained in the memory area pointed to by EAX is moved into EDX. Then at address 00401190, we call the function whose address is contained in EDX.

What these statements mean is that the virtual function table pointer, which is contained in the memory area pointed to by "b"'s "this" pointer, is referenced to determine the address of the virtual function A::g(). This is illustrated below :

b's this        points to
(== 0x0012ff6c) ----> +--------------------------------------+  points to
                      |address of B's virtual function table |---+
                      +--------------------------------------+   |
                                                                 |
                                                                 |
                                                                 V
                                                    B's virtual function table
                                                       +-------------------+
                                                       | Address of A::g() |
                                                       +-------------------+
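The lookup performed by these instructions can be imitated by hand in C++. The sketch below is only an emulation under simplifying assumptions, not the layout that Visual C++ or any other compiler is guaranteed to use : Object, Method, kTable, g_impl, call_first_slot and demo_manual_dispatch are all hypothetical names. It stores a pointer to a table of function pointers as the first field of the object and then calls through the first slot of that table.

```cpp
#include <string>

struct Object;                             // hand-rolled stand-in for "b"
typedef std::string (*Method)(Object*);    // each slot receives "this" explicitly

struct Object
{
    const Method* vptr;   // first field: pointer to the function table
};

// Stand-in for the body of the virtual function g().
static std::string g_impl(Object*) { return "g called"; }

// One table shared by every Object, already present before any Object exists.
static const Method kTable[] = { &g_impl };

std::string call_first_slot(Object* self)
{
    const Method* table = self->vptr;  // like: mov edx,dword ptr [eax]
    Method fn = table[0];              // like: dword ptr [edx]
    return fn(self);                   // like: call, with "this" handed over
}

std::string demo_manual_dispatch()
{
    Object obj;
    obj.vptr = kTable;                 // the step a real constructor performs
    return call_first_slot(&obj);
}
```

demo_manual_dispatch() returns "g called". If the assignment to obj.vptr were skipped, call_first_slot() would read an indeterminate pointer, which is the situation examined in 3.6.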

3.6 Now, in normal circumstances, "b"'s "this" pointer will point to a memory location that contains a valid pointer to "b"'s virtual function table. The virtual function table will then contain the address(es) of "b"'s virtual function(s).

Now, if "b" is not completely constructed, its pointer to that vtable will not be initialized yet. In this case, "b"'s "this" pointer will point to a memory location which contains an invalid address (which is supposed to point to "b"'s vtable). This is illustrated below :

                          points to
b's this (== 0x0012ff6c) -----> +------------------------+  points to
                                |   garbage data 1       |----+
                                +------------------------+    |
                                                              |
                                                              |
                                                              V
                                  Random Memory Area but treated as
                                  B's virtual function table
                                                    +------------------+
                                                    |  garbage data 2  |
                                                    +------------------+

[Please note a significant piece of side information, brought up by Sergei (a member of CodeProject) : the vtable of a class is usually constructed by a compiler at compile time and would then already be present in memory at program startup.

Object instances of the class will only need to initialize their pointers to this vtable during object construction. Furthermore, two object instances of a class that uses virtual functions would share the same vtable in memory. ]
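Sergei's point has a visible and well-defined consequence in standard C++ : a virtual call made from inside a base class constructor reaches the base class version of the function, because the derived part of the object does not exist yet at that moment. The sketch below illustrates this. Base, Derived, who, who_during_construction and who_after_construction are hypothetical names used only here. Note that this differs from our quiz code, where f() is called even before A's constructor has started, a situation which is not well defined at all.

```cpp
#include <string>

struct Base
{
    // The virtual call below is made while only the Base part exists,
    // so it resolves to Base::who().
    Base() { during_construction = who(); }
    virtual ~Base() {}

    virtual std::string who() const { return "Base"; }

    std::string during_construction;
};

struct Derived : Base
{
    virtual std::string who() const { return "Derived"; }
};

// What the virtual call returned while the object was being built.
std::string who_during_construction()
{
    Derived d;
    return d.during_construction;
}

// What the same virtual call returns once construction has finished.
std::string who_after_construction()
{
    Derived d;
    const Base& b = d;
    return b.who();
}
```

who_during_construction() returns "Base", whereas who_after_construction() returns "Derived". The same function call gives different results depending on how far construction has progressed.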

Now, back to code locations 00401186 through 00401190 where A::g() is called :

M11     g();
00401186   mov         eax,dword ptr [this] 
00401189   mov         edx,dword ptr [eax]  
0040118B   mov         esi,esp
0040118D   mov         ecx,dword ptr [this]
00401190   call        dword ptr [edx]

EDX will effectively contain "garbage data 1".

Then at code location 00401190, we invoke the instruction :

00401190   call        dword ptr [edx]

What dword ptr [edx] means is that we invoke the first virtual function of the virtual function table pointed to by EDX. Since what EDX points to is garbage, an ensuing crash is not difficult to imagine.
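One possible way to avoid both problems is sketched below. It rests on an assumption that the original code does not state : that f() needs no per-object state and can therefore be made a static member function, which may be called without any A object existing. The member copy_ and the accessors received() and value() are additions made here purely so that the result can be inspected; they are not part of Herb's code.

```cpp
#include <string>

class A
{
public:
    explicit A(const std::string& s) : copy_(s) {}

    // Assumption: f() does not use any member of A, so it can be static
    // and may safely be called before an A object exists.
    static std::string f() { return "hello, world"; }

    const std::string& received() const { return copy_; }

private:
    std::string copy_;   // added for illustration only
};

class B : public A
{
public:
    // The initialisers are written in the order in which they really run:
    // first the base subobject, then the member s.
    B() : A(f()), s(f()) {}

    const std::string& value() const { return s; }

private:
    std::string s;
};
```

With this variation, a default-constructed B holds "hello, world" both in its A subobject and in its own member s. No member function is called on a subobject that has not been constructed, and s is not used before it exists.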

In Conclusion

I certainly hope that you have benefited from our discussions above. Remember always that C++ is a great language with wonderful innate features. But with these great facilities comes the price of complexity. Whenever you use inheritance and object containment, you have to remember the order in which base subobjects and contained member objects are constructed.

I certainly hope that you had fun and that I have done Herb proud ! Best Regards, Bio.