
Robust C++: Initialization and Restarts

11 Feb 2020, GPL3
Structuring main() and recovering from memory corruption
In a large system, main() can easily become a mess as different developers add their initialization code. This article presents a Module class that allows a system to be initialized in a structured, layered manner. It then evolves the design to show how the system can perform a quick restart, rather than a reboot, to recover from serious errors such as trampled memory.


In many C++ programs, the main function #includes the world and utterly lacks structure. This article describes how to initialize a system in a structured manner. It then discusses how to evolve the design to support recovery from serious errors (usually corrupted memory) by quickly reinitializing a subset of the system instead of having to reboot its executable.

Using the Code

The code in this article is taken from the Robust Services Core (RSC), a large repository that provides a framework for developing robust C++ applications. RSC's software is organized into static libraries, each in its own namespace. Much of the code excerpted in this article comes from the namespace NodeBase in the nb directory. NodeBase contains about 50K lines of code that provide base classes for things such as:

  • system initialization/reinitialization (the focus of this article)
  • configuration parameters
  • multi-threading
  • object pooling
  • CLI commands
  • logging
  • debugging tools

Although RSC is targeted at Windows, it has an abstraction layer that should allow it to be ported to other platforms with modest effort. The Windows targets (in *.win.cpp files) currently comprise fewer than 3K lines of code.

If you don't want to use RSC, you can copy and modify its source code to meet your needs, subject to the terms of its GPL-3.0 license.

RSC contains many details that are not relevant to this article, so the code that we look at will be excerpted from the relevant classes and functions, but with irrelevant details removed. Many of these details are nonetheless important and need to be considered if your approach is to copy and modify RSC software.

In most cases, RSC defines each class in a .h of the same name and implements it in a .cpp of the same name. You should therefore be able to easily find the full version of each class.

Initializing the System

We'll start by looking at how RSC initializes when the system boots up.


RSC defines a Module base class. Each Module subclass represents a set of interrelated source code files that provides some logical capability. Each of these subclasses is responsible for:

  • specifying the other modules on which it depends
  • initializing the set of source code files that it represents when the executable is launched

In RSC's current implementation, each Module subclass corresponds 1-to-1 with a static library. This has worked well and is therefore unlikely to change. Dependencies between static libraries must be defined before building an executable, so it is easy to apply the same dependencies among modules. And since no static library is very large, each module can easily initialize the static library to which it belongs.

A module specifies its dependencies in its constructor and initializes its static library in its Startup function. Here is the outline of a typical module:

class SomeModule : public Module
{
   friend class Singleton< SomeModule >;

   SomeModule() : Module()
   {
      //  Modules 1 to N are the ones on which this module depends.
      //  Creating their singletons ensures that they will exist in
      //  the module registry when the system initializes. Because
      //  each module creates the modules on which it depends before
      //  it adds itself to the registry, the registry will contain
      //  modules in the (partial) order of their dependencies.
      Singleton< Module1 >::Instance();
      //  ...
      Singleton< ModuleN >::Instance();
      Singleton< ModuleRegistry >::Instance()->BindModule(*this);
   }

   ~SomeModule() = default;
   void Startup() override;  // details are specific to each module
};

If each module's constructor instantiates the modules on which it depends, how are leaf modules created? The answer is that main creates them. The code for main will appear soon.


The singleton ModuleRegistry appeared in the last line of the above constructor. It contains all of the system's modules, sorted by their dependencies (a partial ordering). ModuleRegistry also has a Startup function that initializes the system by invoking Startup on each module.
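The following sketch is not RSC's code; it only illustrates why "create your dependencies, then bind yourself" yields a registry in dependency order. It assumes hypothetical, simplified classes: a function-local-static `Instance<T>()` stands in for RSC's Singleton template, a `std::vector` stands in for the registry's container, and each module's Startup merely records its name.

```cpp
#include <cassert>
#include <string>
#include <vector>

//  Hypothetical stand-in for RSC's Singleton template: one instance of T,
//  created on first use.
template<typename T> T* Instance()
{
   static T instance;
   return &instance;
}

class Module;

//  Simplified registry. Modules are appended as they bind, so a module's
//  dependencies, which it creates first, always precede it.
class ModuleRegistry
{
public:
   void BindModule(Module& module) { modules_.push_back(&module); }
   void Startup();                  // invokes Startup on each module in order
   std::vector<std::string> log;    // names of modules, in startup order
private:
   std::vector<Module*> modules_;
};

class Module
{
public:
   virtual ~Module() = default;
   virtual std::string Name() const = 0;
   //  A real module would initialize its library; this one logs its name.
   virtual void Startup() { Instance<ModuleRegistry>()->log.push_back(Name()); }
};

void ModuleRegistry::Startup()
{
   for(auto m : modules_) m->Startup();
}

//  Hypothetical dependency chain: AppModule -> ToolsModule -> BaseModule.
class BaseModule : public Module
{
public:
   BaseModule() { Instance<ModuleRegistry>()->BindModule(*this); }
   std::string Name() const override { return "BaseModule"; }
};

class ToolsModule : public Module
{
public:
   ToolsModule()
   {
      Instance<BaseModule>();      // create the dependency before binding
      Instance<ModuleRegistry>()->BindModule(*this);
   }
   std::string Name() const override { return "ToolsModule"; }
};

class AppModule : public Module
{
public:
   AppModule()
   {
      Instance<BaseModule>();      // shared dependency: created only once
      Instance<ToolsModule>();
      Instance<ModuleRegistry>()->BindModule(*this);
   }
   std::string Name() const override { return "AppModule"; }
};
```

In this sketch, main would create only the leaf, `Instance<AppModule>()`, and then call `Instance<ModuleRegistry>()->Startup()`; the registry then starts BaseModule, ToolsModule, and AppModule in that order.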

Thread, RootThread, and InitThread

In RSC, each thread derives from the base class Thread, which encapsulates a native thread and provides a variety of functions related to things like scheduling and inter-thread communication.

The first thread that RSC creates is RootThread, which wraps the native thread that the C++ run-time system created to run main. RootThread simply brings the system up to the point where it can create the next thread. That thread, InitThread, is responsible for initializing most of the system. Once initialization is complete, InitThread acts as a watchdog to ensure that threads are being scheduled, and RootThread acts as a watchdog to ensure that InitThread is running.


After it echoes and saves any command line arguments, main simply instantiates leaf modules. RSC currently has 15 static libraries and, therefore, 15 modules. Modules that are instantiated transitively are commented out:

main_t main(int argc, char* argv[])
{
   //  Echo and save the arguments.
   std::cout << "ENTERING main(int argc, char* argv[])" << CRLF;
   std::cout << "  argc: " << argc << CRLF;

   auto reg = Singleton< CfgParmRegistry >::Instance();

   for(auto i = 0; i < argc; ++i)
   {
      std::string arg(argv[i]);
      std::cout << "  argv[" << i << "]: " << arg << CRLF;
      //  (code that saves ARG in REG has been omitted)
   }

   std::cout << std::flush;

   //  Instantiate the desired modules.
// Singleton< NbModule >::Instance();
// Singleton< NtModule >::Instance();
   Singleton< CtModule >::Instance();
// Singleton< NwModule >::Instance();
// Singleton< SbModule >::Instance();
// Singleton< StModule >::Instance();
// Singleton< MbModule >::Instance();
// Singleton< CbModule >::Instance();
// Singleton< PbModule >::Instance();
   Singleton< OnModule >::Instance();
   Singleton< CnModule >::Instance();
   Singleton< RnModule >::Instance();
   Singleton< SnModule >::Instance();
   Singleton< AnModule >::Instance();
// Singleton< DipModule >::Instance();

   return RootThread::Main();
}

Once the system has initialized, entering the >modules command on the CLI displays the following, which is the order in which the modules were invoked to initialize their static libraries. Only 14 modules appear because main did not instantiate DipModule:

  this : 003B0660
  // stuff deleted
  modules [ModuleId]
    size     : 14
    // stuff deleted
    registry : 003B06A0
      [1]: 003B0640 NodeBase.NbModule
      [2]: 003B0E88 NodeTools.NtModule
      [3]: 003B0620 CodeTools.CtModule
      [4]: 003B0F08 NetworkBase.NwModule
      [5]: 003B0EE8 SessionBase.SbModule
      [6]: 003B0EC8 ControlNode.CnModule
      [7]: 003B0F68 SessionTools.StModule
      [8]: 003B0F88 MediaBase.MbModule
      [9]: 003B0F48 CallBase.CbModule
      [10]: 003B0F28 PotsBase.PbModule
      [11]: 003B0EA8 OperationsNode.OnModule
      [12]: 003B0FA8 RoutingNode.RnModule
      [13]: 003B0FC8 ServiceNode.SnModule
      [14]: 003B0FF0 AccessNode.AnModule

If an application built on RSC does not require a particular static library, the instantiation of its module can be commented out, and the linker will exclude all of that library's code from the executable.

main is the only code implemented outside a static library. It resides in the rsc directory, whose only source code file is main.cpp. All other software, whether part of the framework or an application, resides in a static library.


The last thing that main did was invoke RootThread::Main, which is a static function because RootThread has not yet been instantiated. Its job is to create the things that are needed to actually instantiate RootThread:

main_t RootThread::Main()
{
   //  This loop is hypothetical because our Enter function (invoked
   //  through Thread::EnterThread and Thread::Start) never returns.
   while(true)
   {
      //  Load symbol information.
      //  (details omitted)

      //  Create the POSIX signals.  They are needed now so that
      //  RootThread can register for signals when it is wrapped.
      //  (details omitted)

      //  Create the object pool for threads.
      auto pool = Singleton< ThreadPool >::Instance();
      if(!pool->AllocBlocks()) return SystemOutOfMemory;

      //  Wrap the root thread and enter it.
      auto root = Singleton< RootThread >::Instance();
      Thread::EnterThread(root);
   }
}

Invoking Thread::EnterThread leads to the invocation of RootThread::Enter, which implements RootThread's thread loop. RootThread::Enter creates InitThread, whose first task is to finish initializing the system. RootThread then goes to sleep, running a watchdog timer that is cancelled when InitThread interrupts RootThread to tell it that the system has been initialized. If the timer expires, the system failed to initialize: it is embarrassingly dead on arrival, so RootThread exits.
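RSC implements this sleep and interrupt in its own Thread class, which is not shown here. As a rough stand-in, assuming only the standard library, the watchdog can be sketched with `std::condition_variable::wait_for`: the supervisor (RootThread's role) waits with a timeout, and the initializer (InitThread's role) "interrupts" it by setting a flag and notifying. `Watchdog` and `StartWithWatchdog` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

//  Hypothetical watchdog: the supervising thread sleeps until the
//  initializing thread reports success or the timeout expires.
class Watchdog
{
public:
   //  Called by the initializing thread when it finishes: cancels the timer.
   void ReportInitialized()
   {
      {
         std::lock_guard<std::mutex> lock(mutex_);
         initialized_ = true;
      }
      cv_.notify_one();
   }

   //  Called by the supervisor. Returns true if initialization was
   //  reported before TIMEOUT, and false if the timer expired.
   bool WaitForInit(std::chrono::milliseconds timeout)
   {
      std::unique_lock<std::mutex> lock(mutex_);
      return cv_.wait_for(lock, timeout, [this] { return initialized_; });
   }

private:
   std::mutex mutex_;
   std::condition_variable cv_;
   bool initialized_ = false;
};

//  Runs INIT on its own thread and supervises it. Returns false if the
//  system is "dead on arrival" (INIT did not finish within TIMEOUT).
template<typename F> bool StartWithWatchdog(F init, std::chrono::milliseconds timeout)
{
   Watchdog dog;
   std::thread initThread([&dog, init] { init(); dog.ReportInitialized(); });
   bool ok = dog.WaitForInit(timeout);
   //  A real supervisor would exit on failure; this sketch joins so that
   //  the thread and DOG are cleaned up before returning.
   initThread.join();
   return ok;
}
```

The predicate overload of `wait_for` guards against spurious wakeups, which is why the flag is checked under the mutex rather than relying on the notification alone.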


To finish initializing the system, InitThread invokes ModuleRegistry::Startup, which invokes each module's Startup function. It also records how long each module took to initialize, but that code has been deleted for clarity:

void ModuleRegistry::Startup()
{
   for(auto m = modules_.First(); m != nullptr; modules_.Next(m))
   {
      m->Startup();
   }
}

Once this function is finished, something very similar to this will have appeared on the console:

ENTERING main(int argc, char* argv[])
  argc: 1
  argv[0]: C:\Users\gregu\Documents\rsc\rsc\Debug\rsc.exe

MODULE INITIALIZATION          msecs      invoked at
pre-Module.Startup               433    08:28:00.212
NodeBase.NbModule...                    08:28:00.645
...initialized                    62
NodeTools.NtModule...                   08:28:00.717
...initialized                    18
CodeTools.CtModule...                   08:28:00.743
...initialized                    18
NetworkBase.NwModule...                 08:28:00.770

    NET500 2-Aug-2019 08:28:00.785 on Reigi {1}
...initialized                   121
SessionBase.SbModule...                 08:28:00.900
...initialized                    90
SessionTools.StModule...                08:28:01.001
...initialized                    12
MediaBase.MbModule...                   08:28:01.024
...initialized                    13
CallBase.CbModule...                    08:28:01.047
...initialized                    20
PotsBase.PbModule...                    08:28:01.078
...initialized                    17
OperationsNode.OnModule...              08:28:01.106
...initialized                    10
ControlNode.CnModule...                 08:28:01.127
...initialized                    11
RoutingNode.RnModule...                 08:28:01.149
...initialized                    11
ServiceNode.SnModule...                 08:28:01.170
...initialized                    35
AccessNode.AnModule...                  08:28:01.218
...initialized                    16
total initialization time       1035

    NODE500 2-Aug-2019 08:28:01.260 on Reigi {2}

A Module::Startup Function

Module Startup functions aren't particularly interesting. One of RSC's design principles is that objects needed to process user requests should be created during system initialization, so as to provide predictable latency once the system is in service. Here is the Startup code for NbModule, which initializes the namespace NodeBase:

void NbModule::Startup()
{
   //  Create/start singletons.  Some of these already exist as a
   //  result of creating RootThread, but their Startup functions
   //  must be invoked.
   Singleton< PosixSignalRegistry >::Instance()->Startup();
   Singleton< StatisticsRegistry >::Instance()->Startup();
   Singleton< LogBufferRegistry >::Instance()->Startup();
   Singleton< AlarmRegistry >::Instance()->Startup();
   Singleton< LogGroupRegistry >::Instance()->Startup();
   Singleton< CfgParmRegistry >::Instance()->Startup();
   Singleton< DaemonRegistry >::Instance()->Startup();
   Singleton< ObjectPoolRegistry >::Instance()->Startup();
   Singleton< ThreadRegistry >::Instance()->Startup();
   Singleton< ThreadAdmin >::Instance()->Startup();
   Singleton< ThreadPool >::Instance()->Startup();
   Singleton< MsgBufferPool >::Instance()->Startup();
   Singleton< ClassRegistry >::Instance()->Startup();
   Singleton< Element >::Instance()->Startup();
   Singleton< CliRegistry >::Instance()->Startup();
   Singleton< SymbolRegistry >::Instance()->Startup();
   Singleton< NbIncrement >::Instance()->Startup();

   //  Create/start threads.
   Singleton< FileThread >::Instance()->Startup();
   Singleton< CoutThread >::Instance()->Startup();
   Singleton< CinThread >::Instance()->Startup();
   Singleton< ObjectPoolAudit >::Instance()->Startup();
   Singleton< StatisticsThread >::Instance()->Startup();
   Singleton< LogThread >::Instance()->Startup();
   Singleton< CliThread >::Instance()->Startup();
}

Restarting the System

So far, we have an initialization framework with the following characteristics:

  • a structured and layered approach to initialization
  • a simple main that only needs to create leaf modules
  • ease of excluding a static library from the build by not instantiating the module that initializes it

We will now enhance this framework so that we can reinitialize the system to recover from serious errors. Robust C++: Safety Net describes how to do this for an individual thread. But sometimes a system gets into a state where the types of errors described in that article recur. In such a situation, more drastic action is required. Quite often, some data has been corrupted, and fixing it will restore the system to health. A partial reinitialization of the system, short of a complete reboot, can often do exactly that.

If we can initialize the system in a layered manner, we should also be able to shut it down in a layered manner. We can define Shutdown functions to complement the Startup functions that we've already seen. However, we only want to perform a partial shutdown, followed by a partial startup to recreate the things that the shutdown phase destroyed. If we can do that, we will have achieved a partial reinitialization.

But what, exactly, should we destroy and recreate? Some things are easily recreated. Other things will take much longer, during which time the system will be unavailable. It is therefore best to use a flexible strategy. If the system is in trouble, start by reinitializing what can be recreated quickly. If that doesn't fix the problem, broaden the scope of what gets reinitialized, and so on. Eventually, we'll have to give up and reboot.

Our restart (reinitialization) strategy therefore escalates. RSC supports three levels of restart whose scopes are less than a full reboot. When the system gets into trouble, it tries to recover by initiating the restart with the narrowest scope. But if it soon gets into trouble again, it increases the scope of the next restart:

  • A warm restart destroys temporary data and also exits and recreates as many threads as possible. Any user request currently being processed is lost and must be resubmitted.
  • A cold restart also destroys dynamic data, which is data that changes while processing user requests. All sessions, for example, are lost and must be reinitiated.
  • A reload restart also destroys data that is relatively static, such as configuration data that user requests rarely modify. This data is usually loaded from disk or over the network, two examples being an in-memory database of user profiles and another of images that are included in server-to-client HTTP messages.

Startup and Shutdown functions therefore need a parameter that specifies what type of restart is occurring:

enum RestartLevel
{
   RestartNil,     // in service (not restarting)
   RestartWarm,    // deleting MemTemp and exiting threads
   RestartCold,    // warm plus deleting MemDyn (user sessions)
   RestartReload,  // cold plus deleting MemProt (configuration data)
   RestartReboot,  // exiting and restarting executable
   RestartExit,    // exiting without restarting
   RestartLevel_N  // number of restart levels
};
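The article does not show the rule RSC uses to decide when to escalate. Purely as an illustration, one plausible policy escalates when a new restart is needed within some window of the previous one. The class RestartEscalator, its window parameter, and the rule itself are assumptions; the enum is repeated so that the example compiles on its own.

```cpp
#include <cassert>
#include <chrono>

enum RestartLevel
{
   RestartNil, RestartWarm, RestartCold, RestartReload,
   RestartReboot, RestartExit, RestartLevel_N
};

//  Hypothetical escalation policy: begin with the narrowest restart, and
//  if trouble recurs within WINDOW of the previous restart, widen the
//  scope by one level, stopping at a reboot.
class RestartEscalator
{
public:
   using Clock = std::chrono::steady_clock;

   explicit RestartEscalator(Clock::duration window) : window_(window) { }

   //  Returns the level of restart to perform for trouble detected at NOW.
   RestartLevel NextLevel(Clock::time_point now)
   {
      if(last_ == RestartNil || now - lastTime_ > window_)
         last_ = RestartWarm;                               // narrowest scope
      else if(last_ < RestartReboot)
         last_ = static_cast<RestartLevel>(last_ + 1);      // escalate
      //  Otherwise remain at RestartReboot.
      lastTime_ = now;
      return last_;
   }

private:
   Clock::duration window_;
   RestartLevel last_ = RestartNil;
   Clock::time_point lastTime_{};
};
```

With a 60-second window, four troubles in quick succession would yield warm, cold, reload, and reboot; a fault long after the last restart starts again at warm.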

Deleting Objects During a Restart

Because the goal of a restart is to reinitialize a subset of the system as quickly as possible, RSC takes a drastic approach. Rather than delete objects one at a time, it simply frees the heap from which they were allocated. In a system with tens of thousands of sessions, for example, this dramatically reduces the time required for a cold restart. The drawback is that it adds some complexity because each type of memory requires its own heap:

MemoryType   Base Class   Attributes
MemTemp      Temporary    does not survive any restart
MemDyn       Dynamic      survives warm restarts but not cold or reload restarts
MemProt      Protected    write-protected; survives warm and cold restarts but not reload restarts
MemPerm      Permanent    survives all restarts (this is a wrapper for the C++ default heap)
MemImm       Immutable    write-protected; survives all restarts (similar to C++ global const data)

To use a given MemoryType, a class derives from the corresponding class in the Base Class column. How this works is described later.

Write-Protecting Data

The above table notes that MemProt is write-protected.[1] The rationale for this is that data which is only deleted during a reload restart is expensive to recreate, because it must be loaded from disk or over the network. The data also changes far less frequently than other data. It is therefore prudent but not cost-prohibitive to protect it from trampling.

During system initialization, MemProt is unprotected. Just before it starts to handle user requests, the system write-protects MemProt. Applications must then explicitly unprotect and reprotect it in order to modify data whose memory was allocated from its heap. Only during a reload restart is it again unprotected, while recreating this data.

A second type of write-protected memory, MemImm, is defined for the same reason. It contains critical data that should never change, such as the Module subclasses and ModuleRegistry. Once the system has initialized, it is permanently write-protected.
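RSC's own protection mechanism is not shown in this article. On a POSIX system, the unprotect/reprotect discipline can be sketched with `mmap` and `mprotect`; this platform choice is an assumption (on Windows, the analogous call is `VirtualProtect`), and the ProtectedPage class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <sys/mman.h>
#include <unistd.h>

//  Hypothetical wrapper around one page that is normally read-only, in the
//  spirit of MemProt: callers unprotect it before writing and reprotect it
//  afterwards.
class ProtectedPage
{
public:
   ProtectedPage()
   {
      size_ = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
      void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if(p == MAP_FAILED) throw std::runtime_error("mmap failed");
      base_ = static_cast<int*>(p);
   }

   ~ProtectedPage() { munmap(base_, size_); }

   ProtectedPage(const ProtectedPage&) = delete;
   ProtectedPage& operator=(const ProtectedPage&) = delete;

   //  Makes the page read-only; a later write raises SIGSEGV.
   bool Protect() { return mprotect(base_, size_, PROT_READ) == 0; }

   //  Makes the page writable again, e.g. while changing configuration data.
   bool Unprotect() { return mprotect(base_, size_, PROT_READ | PROT_WRITE) == 0; }

   int* Data() const { return base_; }

private:
   std::size_t size_ = 0;
   int* base_ = nullptr;
};
```

The benefit is that a stray write to protected data crashes at the faulting instruction, where a debugger or stack trace can identify the culprit, instead of silently corrupting data that survives warm and cold restarts.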

A Module::Shutdown Function

A module's Shutdown function closely resembles its Startup function. It invokes Shutdown on objects within its static library, but in the opposite order to which it invoked their Startup functions. Here is the Shutdown function for NbModule, which is (more or less) a mirror image of its Startup function that appeared earlier:

void NbModule::Shutdown(RestartLevel level)
{
   Singleton< NbIncrement >::Instance()->Shutdown(level);
   Singleton< SymbolRegistry >::Instance()->Shutdown(level);
   Singleton< CliRegistry >::Instance()->Shutdown(level);
   Singleton< Element >::Instance()->Shutdown(level);
   Singleton< ClassRegistry >::Instance()->Shutdown(level);
   Singleton< MsgBufferPool >::Instance()->Shutdown(level);
   Singleton< ThreadPool >::Instance()->Shutdown(level);
   Singleton< ThreadAdmin >::Instance()->Shutdown(level);
   Singleton< ThreadRegistry >::Instance()->Shutdown(level);
   Singleton< ObjectPoolRegistry >::Instance()->Shutdown(level);
   Singleton< DaemonRegistry >::Instance()->Shutdown(level);
   Singleton< CfgParmRegistry >::Instance()->Shutdown(level);
   Singleton< LogGroupRegistry >::Instance()->Shutdown(level);
   Singleton< AlarmRegistry >::Instance()->Shutdown(level);
   Singleton< LogBufferRegistry >::Instance()->Shutdown(level);
   Singleton< StatisticsRegistry >::Instance()->Shutdown(level);
   Singleton< PosixSignalRegistry >::Instance()->Shutdown(level);

   Singleton< TraceBuffer >::Instance()->Shutdown(level);
}

Given that a restart frees one or more heaps rather than expecting objects on those heaps to be deleted, what is the purpose of a Shutdown function? The answer is that an object which survives the restart might have pointers to objects that will be destroyed or recreated. Its Shutdown function might therefore need to reset these pointers.

NbModule's Startup function created a number of threads, so why doesn't its Shutdown function shut them down? The reason is that ModuleRegistry::Shutdown handles this earlier in the restart. We will look at this later.

Supporting Memory Types

This section discusses what is needed to support a MemoryType, each of which has its own persistence and protection characteristics.


Each MemoryType requires its own heap so that all of its objects can be deleted en masse by simply freeing that heap during the appropriate types of restart. Heap management is platform specific, so RSC defines the class SysHeap (in SysHeap.h) to act as a wrapper for platform-specific heap functions.
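The sketch below is not RSC's SysHeap. It is a minimal bump-pointer arena, assuming a fixed capacity, no per-object deallocation, and alignments no stricter than `std::max_align_t`, that shows why freeing a whole heap is fast: `Reset` discards every allocation in constant time, but no destructors run. The Arena class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

//  Hypothetical arena standing in for one MemoryType's heap. Blocks are
//  carved out sequentially, and the whole arena is released at once.
class Arena
{
public:
   explicit Arena(std::size_t capacity)
      : buffer_(new std::byte[capacity]), capacity_(capacity) { }

   //  Returns SIZE bytes aligned to ALIGN (a power of two no larger than
   //  alignof(std::max_align_t)), or nullptr if the arena is full.
   void* Alloc(std::size_t size, std::size_t align = alignof(std::max_align_t))
   {
      std::size_t offset = (used_ + align - 1) & ~(align - 1);
      if(offset + size > capacity_) return nullptr;
      used_ = offset + size;
      ++count_;
      return buffer_.get() + offset;
   }

   //  Frees every block in O(1). Destructors are NOT invoked, so objects
   //  that survive must not keep pointers into this arena.
   void Reset() { used_ = 0; count_ = 0; }

   std::size_t Used() const { return used_; }
   std::size_t Count() const { return count_; }

private:
   std::unique_ptr<std::byte[]> buffer_;
   std::size_t capacity_;
   std::size_t used_ = 0;
   std::size_t count_ = 0;
};
```

The cost of `Reset` does not depend on how many blocks were allocated, which is the property that makes a cold restart fast even with tens of thousands of sessions. It is also why the Shutdown functions and allocators discussed in this section are needed.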

The interface Memory.h is used to allocate and free the various types of memory. Its primary functions are similar to malloc and free, with the various heaps being private to Memory.cpp:

//  Allocates a memory segment of nBytes of the specified TYPE.  If
//  EX is true, an AllocationException is thrown on failure.
static void* Alloc(size_t nBytes, MemoryType type, bool ex = true);

//  Deallocates the memory segment returned by Alloc.
static void Free(const void* addr);

Base Classes

A class whose objects can be allocated dynamically derives from Temporary, Dynamic, Protected, Permanent, or Immutable, as previously mentioned. If it doesn't do so, its objects are allocated from the default heap, which is equivalent to deriving from Permanent.

These classes simply override operator new to use the appropriate heap. For example:

void* Dynamic::operator new(size_t size)
{
   return Memory::Alloc(size, MemDyn);
}

void* Dynamic::operator new[](size_t size)
{
   return Memory::Alloc(size, MemDyn);
}

Memory::Alloc places a header above each block of memory that it allocates. This header records the block's size and the heap from which it was allocated. Memory::Free can then access this header and return the block to the correct heap. The overrides of operator delete that support the various types of memory can therefore be implemented by Object, which is the common base class for the classes previously mentioned in this section:

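//  Memory::Free finds the heap in the block's header, so TYPE is unused.
void Object::operator delete(void* addr) { Memory::Free(addr); }

void Object::operator delete[](void* addr) { Memory::Free(addr); }

void Object::operator delete(void* addr, MemoryType type) { Memory::Free(addr); }

void Object::operator delete[](void* addr, MemoryType type) { Memory::Free(addr); }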
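Here is a minimal sketch of the header technique. It assumes `malloc` as a stand-in for the per-type heaps (RSC's heaps wrap SysHeap), and its Header layout and live-block counters are illustrative assumptions; Memory, Object, Dynamic, and Session are simplified, hypothetical versions rather than RSC's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

enum MemoryType { MemTemp, MemDyn, MemProt, MemPerm, MemImm, MemoryType_N };

namespace Memory
{
   //  Header stored in front of every block. It is aligned like
   //  max_align_t so that the address returned to the caller is too.
   struct alignas(std::max_align_t) Header
   {
      std::size_t size;
      MemoryType type;
   };

   //  Live-block count per type, standing in for real per-type heaps.
   inline std::size_t live[MemoryType_N] = { };

   inline void* Alloc(std::size_t nBytes, MemoryType type)
   {
      auto h = static_cast<Header*>(std::malloc(sizeof(Header) + nBytes));
      if(h == nullptr) throw std::bad_alloc();
      h->size = nBytes;
      h->type = type;
      ++live[type];
      return h + 1;   // the caller's block starts just after the header
   }

   inline void Free(const void* addr)
   {
      if(addr == nullptr) return;
      //  Step back to the header to find which heap the block came from.
      auto h = static_cast<Header*>(const_cast<void*>(addr)) - 1;
      --live[h->type];
      std::free(h);
   }
}

//  Common base class: operator delete needs no MemoryType argument
//  because the header already records it.
class Object
{
public:
   virtual ~Object() = default;
   static void operator delete(void* addr) { Memory::Free(addr); }
};

class Dynamic : public Object
{
public:
   static void* operator new(std::size_t size) { return Memory::Alloc(size, MemDyn); }
};

class Session : public Dynamic
{
public:
   int id = 0;
};
```

Because Object has a virtual destructor, deleting a Session through an `Object*` still finds Object's operator delete, and the header routes the block back to the MemDyn stand-in.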


A class with a std::string member wants the string to allocate memory from the same heap that is used for objects of that class. If the string instead allocates memory from the default heap, a restart will leak memory when the object's heap is freed: although the restart frees the memory used by the string object itself, the string's destructor is not invoked, so the memory that it allocated to hold its characters leaks.

RSC must therefore provide a C++ allocator for each MemoryType so that a class whose objects are not allocated on the default heap can use classes from the standard library. These allocators are defined in Allocators.h and are used to define STL classes that allocate memory from the desired heap. For example:

typedef std::char_traits<char> CharTraits;
typedef std::basic_string<char, CharTraits, DynAllocator<char>> DynString;

A class derived from Dynamic then uses DynString to declare what would normally have been a std::string member.
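Allocators.h is not reproduced in this article. The following is a minimal, hypothetical allocator in the same spirit, relying on the standard minimal allocator requirements (C++17): it forwards to a per-type allocation function, here a `malloc`-backed stand-in with a byte counter that is assumed purely for illustration, so that a DynString's characters come from the "dynamic heap".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

//  Hypothetical stand-in for the dynamic heap: counts bytes in use.
inline std::size_t dynBytes = 0;

inline void* DynAlloc(std::size_t n)
{
   void* p = std::malloc(n);
   if(p == nullptr) throw std::bad_alloc();
   dynBytes += n;
   return p;
}

inline void DynFree(void* p, std::size_t n)
{
   dynBytes -= n;
   std::free(p);
}

//  Minimal allocator: std::allocator_traits supplies the rest of the
//  interface (rebind, construct, destroy, and so on).
template<typename T> class DynAllocator
{
public:
   using value_type = T;

   DynAllocator() noexcept = default;
   template<typename U> DynAllocator(const DynAllocator<U>&) noexcept { }

   T* allocate(std::size_t n) { return static_cast<T*>(DynAlloc(n * sizeof(T))); }
   void deallocate(T* p, std::size_t n) noexcept { DynFree(p, n * sizeof(T)); }
};

//  All DynAllocators share one heap, so any two compare equal.
template<typename T, typename U>
bool operator==(const DynAllocator<T>&, const DynAllocator<U>&) noexcept { return true; }

template<typename T, typename U>
bool operator!=(const DynAllocator<T>&, const DynAllocator<U>&) noexcept { return false; }

typedef std::char_traits<char> CharTraits;
typedef std::basic_string<char, CharTraits, DynAllocator<char>> DynString;
```

Short strings may fit in the string object itself (the small-string optimization), in which case no heap allocation occurs; longer strings draw their character storage from the allocator's heap.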

Initiating a Restart

A restart occurs as follows:

  1. The code which decides that a restart is required invokes Restart::Initiate.
  2. Restart::Initiate throws an ElementException.
  3. Thread::Start catches the ElementException and invokes InitThread::InitiateRestart.
  4. InitThread::InitiateRestart interrupts RootThread to tell it that a restart is about to begin and then interrupts itself to initiate the restart.
  5. When InitThread is interrupted, it invokes ModuleRegistry::Restart to manage the restart. This function contains a state machine that steps through the shutdown and startup phases by invoking ModuleRegistry::Shutdown (described below) and ModuleRegistry::Startup (already described).
  6. When RootThread is interrupted, it starts a watchdog timer. When the restart is completed, InitThread interrupts RootThread, which cancels the timer. If the timer expires, RootThread forces InitThread to exit and recreates it. When InitThread is reentered, it invokes ModuleRegistry::Restart again, which escalates the restart to the next level.
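Steps 1 to 3 can be sketched under simplifying assumptions: a hypothetical RestartException stands in for ElementException and carries the requested level, RunThread stands in for the exception handling in Thread::Start, and the handler merely records the request instead of interrupting threads.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

enum RestartLevel { RestartNil, RestartWarm, RestartCold, RestartReload, RestartReboot };

//  Hypothetical exception that carries the requested restart level.
class RestartException : public std::runtime_error
{
public:
   explicit RestartException(RestartLevel level)
      : std::runtime_error("restart requested"), level_(level) { }
   RestartLevel Level() const { return level_; }
private:
   RestartLevel level_;
};

//  Steps 1 and 2: code that decides a restart is needed calls this,
//  and it never returns.
[[noreturn]] inline void InitiateRestart(RestartLevel level)
{
   throw RestartException(level);
}

//  Step 3: a thread's entry wrapper catches the exception and hands the
//  restart to a handler (InitThread::InitiateRestart, in RSC's design).
inline RestartLevel RunThread(const std::function<void()>& body,
                              const std::function<void(RestartLevel)>& onRestart)
{
   try
   {
      body();
   }
   catch(const RestartException& ex)
   {
      onRestart(ex.Level());
      return ex.Level();
   }
   return RestartNil;
}
```

Throwing unwinds the stack of whatever work the thread was doing, so the code that detects the problem does not need to know how to reach InitThread.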


ModuleRegistry::Shutdown first allows a subset of threads to run for a while so that they can generate any pending logs. It then notifies all threads of the restart, counting how many of them are willing to exit, and schedules them until they have exited. Finally, it shuts down all modules in the reverse of the order in which their Startup functions were invoked. As with ModuleRegistry::Startup, code that logs the progress of the restart has been deleted for clarity:

void ModuleRegistry::Shutdown(RestartLevel level)
{
   //  Schedule a subset of the factions so that pending logs will be output.
   //  (Enabling the factions and pausing between tries have been omitted.)
   for(size_t tries = 120, idle = 0; (tries > 0) && (idle <= 8); --tries)
   {
      if(Thread::SwitchContext() != nullptr)
         idle = 0;   // a thread ran, so there may be more output
      else
         ++idle;     // no thread was ready to run
   }

   //  Notify all threads of the restart.
   auto reg = Singleton< ThreadRegistry >::Instance();
   auto before = reg->Threads().Size();
   auto planned = reg->Restarting(level);
   size_t actual = 0;

   //  Schedule threads until the planned number have exited. If some
   //  fail to exit, RootThread will time out and escalate the restart.
   while(actual < planned)
   {
      actual = before - reg->Threads().Size();
   }

   //  Modules must be shut down in reverse order of their initialization.
   for(auto m = modules_.Last(); m != nullptr; modules_.Prev(m))
   {
      m->Shutdown(level);
   }
}

Shutting Down a Thread

ModuleRegistry::Shutdown invokes ThreadRegistry::Restarting, which invokes Thread::Restarting to see if each thread is willing to exit during the restart. This function, in turn, invokes the virtual function ExitOnRestart:

bool Thread::Restarting(RestartLevel level)
{
   //  If the thread is willing to exit, signal it. ModuleRegistry::Shutdown
   //  will momentarily schedule it so that it can exit.
   if(ExitOnRestart(level))
   {
      Raise(SIGCLOSE);
      return true;
   }

   //  Unless this is RootThread or InitThread, mark it as a survivor. This
   //  causes various functions to force it to sleep until the restart ends.
   if(faction_ < SystemFaction) priv_->action_ = SleepThread;
   return false;
}

The default implementation of ExitOnRestart is:

bool Thread::ExitOnRestart(RestartLevel level) const
{
   //  RootThread and InitThread run during a restart. A thread blocked on
   //  stream input, such as CinThread, cannot be forced to exit because C++
   //  has no mechanism for interrupting it.
   if(faction_ >= SystemFaction) return false;
   if(priv_->blocked_ == BlockedOnConsole) return false;
   return true;
}

A thread that is willing to exit receives the signal SIGCLOSE. Before it delivers this signal, Thread::Raise invokes the virtual function Unblock on the thread in case it is currently blocked. For example, each instance of UdpIoThread receives UDP packets on an IP port. Because pending user requests are supposed to survive warm restarts, UdpIoThread overrides ExitOnRestart to return false during a warm restart. During other types of restarts, it returns true, and its override of Unblock frees its socket so that its call to recvfrom will immediately return, allowing it to exit.
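The override pattern can be sketched with a stripped-down, hypothetical hierarchy: only ExitOnRestart is modeled, Thread and UdpIoThread are simplified stand-ins for RSC's classes, and PlannedExits plays the role of the count that ModuleRegistry::Shutdown obtains from the thread registry.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum RestartLevel { RestartNil, RestartWarm, RestartCold, RestartReload };

//  Simplified stand-in for RSC's Thread: by default, a thread exits.
class Thread
{
public:
   virtual ~Thread() = default;
   virtual bool ExitOnRestart(RestartLevel /*level*/) const { return true; }
};

//  Survives warm restarts so that pending user requests are not lost,
//  but exits during cold and reload restarts.
class UdpIoThread : public Thread
{
public:
   bool ExitOnRestart(RestartLevel level) const override
   {
      return level != RestartWarm;
   }
};

//  Hypothetical helper: counts the threads that are willing to exit.
inline std::size_t PlannedExits(const std::vector<std::unique_ptr<Thread>>& threads,
                                RestartLevel level)
{
   std::size_t planned = 0;
   for(const auto& t : threads)
      if(t->ExitOnRestart(level)) ++planned;
   return planned;
}
```

Keeping the decision in a virtual function lets each thread type state its own survival policy without ModuleRegistry knowing about specific thread classes.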

Traces of the Code in Action

RSC's output directory contains console transcripts (*.console.txt), log files (*.log.txt), and function traces (*.trace.txt) of the following:

  • system initialization, in the files init.*
  • a warm restart, in the files warm* (warm1.* and warm2.* are pre- and post-restart, respectively)
  • a cold restart, in the files cold* (cold1.* and cold2.* are pre- and post-restart, respectively)

The restarts were initiated using the CLI's >restart command.


1 RSC does not currently write-protect any memory, but the goal is to eventually implement it.


  • 23rd December, 2019: Initial version


This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


About the Author

Greg Utas
Canada
Author of Robust Services Core (GitHub) and Robust Communications Software (Wiley, 2005). Formerly Chief Software Architect of the servers (GSM MSCs) that handle the calls in AT&T's wireless network.


Posted 23 Dec 2019

