Robust C++: Safety Net

22 Sep 2019, GPL3
How to keep a C++ program running after nasty things happen

Introduction

Some programs need to keep running even after nasty things happen, such as using an invalid pointer. Servers, other multi-user systems, and real-time games are a few examples. This article describes how to write robust C++ software that does not exit when the usual behavior is to abort. It also discusses how to capture information that facilitates debugging when nasty things occur in software that has been released to users.

Background

It is assumed that the reader is familiar with C++ exceptions. However, exceptions are not the only thing that robust software needs to deal with. It must also handle POSIX signals, which the operating system raises when something nasty occurs. The header <csignal> defines the following subset of POSIX signals for C/C++:

  • SIGINT: interrupt (usually when Ctrl-C is entered)
  • SIGILL: illegal instruction (perhaps a stack corruption that affected the instruction pointer)
  • SIGFPE: floating point exception (includes dividing by zero)
  • SIGSEGV: segment violation (using a bad pointer)
  • SIGTERM: forced termination (usually when the kill command is entered)
  • SIGBREAK: break (usually when Ctrl-Break is entered; see Note 2)
  • SIGABRT: abnormal termination (when abort is invoked by the C++ run-time environment)

Similar to how exceptions are caught by a catch statement, signals are caught by a signal handler. Each thread can register a signal handler against each signal that it wants to handle. A signal is simply an int that is passed to the signal handler as an argument.
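
As a minimal illustration of that last point, the following sketch registers a handler with std::signal, raises a signal on the calling thread with std::raise, and reports the int that the handler received. The names RecordSignal, RaiseAndObserve, and lastSignal are hypothetical and do not appear in RSC.

```cpp
#include <csignal>

//  Written by the handler and read by normal code. An object of type
//  volatile std::sig_atomic_t is one of the few things that a portable
//  handler may write.
volatile std::sig_atomic_t lastSignal = 0;

//  The handler receives the signal's number as its only argument.
void RecordSignal(int sig)
{
   lastSignal = sig;
}

//  Registers RecordSignal against SIG, raises SIG on the calling thread,
//  and returns the value that the handler recorded (-1 on failure).
int RaiseAndObserve(int sig)
{
   lastSignal = 0;
   if(std::signal(sig, RecordSignal) == SIG_ERR) return -1;
   if(std::raise(sig) != 0) return -1;
   return lastSignal;
}
```

Calling RaiseAndObserve(SIGTERM) should return SIGTERM, because std::raise does not return until the handler has run.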

Using the Code

The code in this article is taken from the Robust Services Core (RSC), and this article is the first time that RSC is being publicized in a significant way. RSC is a large repository that, among other things, provides a framework for developing robust C++ applications. It contains over 200K lines of software organized into static libraries, each in its own namespace. All the code excerpted in this article comes from the namespace NodeBase in the nb directory. NodeBase contains about 48K lines of code that provide base classes for things such as:

  • system initialization/reinitialization
  • configuration parameters
  • multi-threading
  • object pooling
  • CLI commands
  • logging
  • debugging tools

RSC is targeted at Windows but has an abstraction layer that should allow it to be ported to other platforms with modest effort. The Windows targets (in *.win.cpp files) currently comprise 2,624 lines of code.

An application developed using RSC derives from Thread to implement its threads. Everything described in this article then comes for free—unless the application isn't targeted for Windows, in which case that abstraction layer also has to be implemented.

If you don't want to use RSC, you can copy and modify its source code to meet your needs, subject to the terms of its GPL-3.0 license.

Overview of the Classes

RSC contains many details that are not relevant to this article, so the code that we look at will be excerpted from the relevant classes and functions, but with irrelevant details removed. Many of these details are nonetheless important and need to be considered if your approach is to copy and modify RSC software.

We will start by outlining the classes that appear in this article. In most cases, RSC defines each class in a .h of the same name and implements it in a .cpp of the same name. You should therefore be able to easily find the full version of each class in the nb directory.

Thread

Software that wants to be continuously available must catch all exceptions. A single-threaded application could do this in main. But RSC supports multi-threading, so it does this in a base Thread class from which all other threads derive. Thread has a loop that invokes the application in a try clause that is followed by a series of catch clauses which handle any exception not caught by the application.
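
The shape of such a loop can be sketched in a few lines. This is a simplified stand-in, not RSC's code: RunWithRecovery and its maxTraps limit are hypothetical, and the real Thread::Start (shown later) does considerably more.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

//  Invokes ENTER until it returns normally, re-entering it after any
//  exception. Gives up after MAXTRAPS exceptions. Returns the number
//  of exceptions that were caught.
int RunWithRecovery(const std::function<void()>& enter, int maxTraps)
{
   int traps = 0;

   while(traps < maxTraps)
   {
      try
      {
         enter();
         return traps;   // ENTER returned: voluntary exit
      }
      catch(const std::exception&)
      {
         ++traps;        // a standard exception: what() could be logged
      }
      catch(...)
      {
         ++traps;        // an exception of unknown type
      }
   }

   return traps;
}
```

The catch(...) clause is what makes the loop a safety net: even an exception of a type that nobody anticipated returns control to the top of the loop rather than terminating the program.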

SysThread

This is a wrapper for a native thread and is created by Thread's constructor. Much of the implementation is platform-specific.

Exception

The direct use of <exception> is inappropriate in a system that needs to debug problems in released software. Consequently, RSC defines a virtual Exception class from which all of its exceptions derive. This class's primary responsibility is to capture the running thread's stack when an exception occurs. In this way, the entire chain of function calls that led to the exception will be available to assist in debugging. This is far more useful than the C string returned by std::exception::what, stating something like "invalid string position", which specifies the problem but not where it arose and maybe not even uniquely where it was detected.
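
The key point is that the capture has to happen in the exception's constructor, before the stack is unwound. The sketch below shows this with a portable substitute for a real stack walker: each function records its name in a thread-local vector through an RAII guard. All of the names here (callChain, FnGuard, TracedException) are hypothetical; RSC captures the actual native stack with platform-specific code.

```cpp
#include <exception>
#include <string>
#include <vector>

//  Mirrors the chain of function calls on the running thread.
thread_local std::vector<const char*> callChain;

//  Pushes a function's name on entry and pops it on exit.
struct FnGuard
{
   explicit FnGuard(const char* name) { callChain.push_back(name); }
   ~FnGuard() { callChain.pop_back(); }
   FnGuard(const FnGuard&) = delete;
   FnGuard& operator=(const FnGuard&) = delete;
};

//  Base class for exceptions. Its constructor records the call chain,
//  innermost function first, before unwinding pops the entries.
class TracedException : public std::exception
{
public:
   TracedException()
   {
      for(auto it = callChain.rbegin(); it != callChain.rend(); ++it)
      {
         trace_ += *it;
         trace_ += '\n';
      }
   }

   const char* what() const noexcept override { return "TracedException"; }
   const std::string& Trace() const { return trace_; }
private:
   std::string trace_;
};

void Inner() { FnGuard g("Inner"); throw TracedException(); }
void Outer() { FnGuard g("Outer"); Inner(); }

//  Returns the trace that was captured when Inner threw.
std::string TraceOfFailure()
{
   try { Outer(); }
   catch(const TracedException& ex) { return ex.Trace(); }
   return std::string();
}
```

By the time the catch clause runs, callChain is empty again, yet the exception still carries the full chain ("Inner", then "Outer") that was present when it was constructed.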

SysThreadStack

SysThreadStack is actually a namespace that wraps a handful of functions. The function of most interest is one that actually captures a thread's stack. Exception's constructor invokes this function, and so does a function (Debug::SwLog) whose purpose is to generate a debug log to record a problem that, although unexpected, did not actually result in an exception. All SysThreadStack functions are platform-specific.

SignalException

When a POSIX signal occurs, RSC throws it in a C++ exception so that it can be handled in the usual way, by unwinding the stack and deleting local objects. SignalException, derived from Exception, is used for this purpose. It simply records the signal that occurred and relies on its base class to capture the stack.

PosixSignal

Each signal supported within RSC must create a PosixSignal instance that includes its name (e.g. "SIGSEGV"), numeric value (11), explanation ("Invalid Memory Reference"), and other attributes. The PosixSignal instances for various signals defined by the POSIX standard, including those in <csignal>, are implemented as private members of the simple class SysSignals. The subset of signals supported on the target platform are then instantiated by SysSignals::CreateNativeSignals.

Throwing a SignalException turns out to be a useful way to recover from serious errors. RSC therefore defines signals for internal use in NbSignals.h. An instance of PosixSignal is also associated with each of these:

//  The following signals are proprietary and are used to throw a
//  SignalException outside the signal handler.
//
constexpr signal_t SIGNIL = 0;        // nil signal (non-error)
constexpr signal_t SIGCLOSE = 120;    // exit thread (non-error)
constexpr signal_t SIGYIELD = 121;    // ran unpreemptably too long
constexpr signal_t SIGTRAPS = 122;    // trapped too many times
constexpr signal_t SIGRETRAP = 123;   // trapped during recovery
constexpr signal_t SIGSTACK1 = 124;   // stack overflow: attempt recovery
constexpr signal_t SIGSTACK2 = 125;   // stack overflow: recreate thread
constexpr signal_t SIGPURGE = 126;    // thread killed or suicided
constexpr signal_t SIGDELETED = 127;  // thread unexpectedly deleted
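
To make the idea of a signal with a name, value, explanation, and attributes concrete, here is a small sketch of a registry keyed by signal value. It is only an assumption about the general shape: SignalAttr, SignalInfo, Registry, and Find are hypothetical names, the attribute assignments are illustrative, and apart from the SIGSEGV text quoted above, the explanation strings are invented for the example.

```cpp
#include <bitset>
#include <csignal>
#include <initializer_list>
#include <map>
#include <string>

//  Hypothetical attribute flags; RSC's PosixSignal defines its own set.
enum SignalAttr { Native, Break, Final, NoError, AttrCount };

struct SignalInfo
{
   std::string name;
   std::string explanation;
   std::bitset<AttrCount> attrs;
};

constexpr int SigClose = 120;   // same value as SIGCLOSE above

//  Returns the registry, which is built the first time it is needed.
const std::map<int, SignalInfo>& Registry()
{
   static const std::map<int, SignalInfo> registry = []
   {
      std::map<int, SignalInfo> m;

      auto add = [&m](int value, const char* name, const char* expl,
         std::initializer_list<SignalAttr> attrs)
      {
         SignalInfo info{name, expl, {}};
         for(auto a : attrs) info.attrs.set(a);
         m.emplace(value, info);
      };

      add(SIGSEGV, "SIGSEGV", "Invalid Memory Reference", {Native});
      add(SIGINT, "SIGINT", "Terminal Interrupt", {Native, Break});
      add(SigClose, "SIGCLOSE", "Exit Thread", {Final, NoError});
      return m;
   }();

   return registry;
}

//  Returns the entry for VALUE, or nullptr if it is not registered.
const SignalInfo* Find(int value)
{
   auto it = Registry().find(value);
   return (it != Registry().end()) ? &it->second : nullptr;
}
```

Code that handles a signal can then ask questions such as Find(sig)->attrs.test(Final) instead of hard-coding a list of signal values.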

Walkthroughs

Creating a Thread

Now for the details. Let's start by creating a Thread. A subclass can add its own thread-specific data, but we're interested in Thread's constructor:

Thread::Thread(Faction faction)
{
   //  Thread uses the PIMPL idiom, with much of its data in priv_.
   //
   priv_.reset(new ThreadPriv);

   //  Create a new thread. StackUsageLimit is in words, so convert
   //  it to bytes.
   //
   auto reg = Singleton< ThreadRegistry >::Instance();
   auto prio = FactionToPriority(faction);
   systhrd_.reset(new SysThread(this, EnterThread, prio,
      ThreadAdmin::StackUsageLimit() << BYTES_PER_WORD_LOG2));
   reg->BindThread(*this);
}

This constructor creates an instance of SysThread, which in turn creates a native thread. The arguments to SysThread's constructor are the thread's attributes:

  • the Thread object being constructed (this)
  • its entry function (EnterThread for all Thread subclasses; it receives this as its argument)
  • its priority (RSC bases this on a thread's Faction, which is not relevant to this article)
  • its stack size, defined by the configuration parameter ThreadAdmin::StackUsageLimit

The new thread is then added to ThreadRegistry, which tracks all active threads.
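
The pattern of a wrapper object whose constructor starts a native thread, passing itself to a common entry function, can be sketched with std::thread. This is an assumption-laden simplification: Worker is a hypothetical class, std::thread offers no portable way to set a priority or stack size, and the work is supplied as a std::function rather than through a virtual function.

```cpp
#include <functional>
#include <thread>
#include <utility>

class Worker
{
public:
   //  Starts a native thread that runs BODY. native_ is declared last
   //  so that body_ is initialized before the new thread can use it.
   explicit Worker(std::function<void()> body) :
      body_(std::move(body)),
      native_(EnterThread, this)
   {
   }

   ~Worker() { Join(); }

   //  Waits for the native thread to exit.
   void Join()
   {
      if(native_.joinable()) native_.join();
   }
private:
   //  The entry function for every Worker. Its argument is the object
   //  that created the native thread.
   static void EnterThread(Worker* self)
   {
      self->body_();
   }

   std::function<void()> body_;
   std::thread native_;
};
```

For example, Worker w([] { /* work */ }); starts the thread immediately, and w.Join() (or w's destructor) waits for it to finish.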

Here is SysThread's constructor:

SysThread::SysThread(const Thread* client,
   const ThreadEntry entry, Priority prio, size_t size) :
   nthread_(nullptr),
   nid_(NIL_ID),
   sentry_(nullptr),
   signal_(SIGNIL)
{
   //  Create the thread and its sentry. Set the thread's priority.
   //
   nthread_ = Create(entry, client, size, nid_);
   sentry_ = CreateSentry();
   SetPriority(prio);
}

This has invoked three platform-specific functions (see SysThread.win.cpp if you're interested in the details):

  • Create creates the native thread. Its platform-specific handle is saved in nthread_, and its thread number is saved in nid_.
  • CreateSentry creates an object that the thread can wait on and that is signaled when the thread should resume execution (e.g., when the thread wants to sleep until a timeout occurs).
  • SetPriority sets the thread's priority.
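
The sentry's behavior (wait until signaled or until a timeout occurs) can be approximated in portable C++ with a mutex and condition variable. This is a sketch under that assumption; the Sentry class below is hypothetical, and RSC's own sentry is created by platform-specific code.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

class Sentry
{
public:
   //  Blocks until Signal is invoked or TIMEOUT expires. Returns true
   //  if the sentry was signaled and false if the timeout occurred.
   bool Wait(std::chrono::milliseconds timeout)
   {
      std::unique_lock<std::mutex> lock(mutex_);
      bool signaled =
         cond_.wait_for(lock, timeout, [this] { return signaled_; });
      signaled_ = false;   // consume the signal, if any
      return signaled;
   }

   //  Wakes up the thread that is waiting on the sentry. If no thread
   //  is waiting, the next call to Wait returns immediately.
   void Signal()
   {
      {
         std::lock_guard<std::mutex> lock(mutex_);
         signaled_ = true;
      }
      cond_.notify_one();
   }
private:
   std::mutex mutex_;
   std::condition_variable cond_;
   bool signaled_ = false;
};
```

A thread that wants to sleep for up to 100 milliseconds would call Wait(std::chrono::milliseconds(100)); another thread can cut the sleep short by calling Signal.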

Entering a Thread

EnterThread is the entry function for all Thread subclasses.

main_t Thread::EnterThread(void* arg)
{
   auto self = static_cast< Thread* >(arg);

   //  Our argument (self) is a pointer to a Thread. Invoke its entry
   //  function after configuring it to catch signals.
   //
   RegisterForSignals();
   return self->Start();
}

RegisterForSignals simply registers SignalHandler against each signal that is native to the underlying platform. This is done by invoking signal (in <csignal>), which must be done by every thread, for each signal that it wants to handle, when the thread is entered and after each time that it receives a signal.

void Thread::RegisterForSignals()
{
   auto& signals = Singleton< PosixSignalRegistry >::Instance()->Signals();

   for(auto s = signals.First(); s != nullptr; signals.Next(s))
   {
      if(s->Attrs().test(PosixSignal::Native))
      {
         signal(s->Value(), SignalHandler);
      }
   }
}
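
A self-contained approximation of this registration, without RSC's registry classes, might look as follows. The list of signals and the names NativeSignals, OnSignal, RegisterAll, and received are assumptions made for the example. The handler re-registers itself because some platforms restore the default handler each time a signal is delivered.

```cpp
#include <csignal>
#include <vector>

//  The last signal that OnSignal received.
volatile std::sig_atomic_t received = 0;

//  Re-registers itself against SIG and then records SIG.
void OnSignal(int sig)
{
   std::signal(sig, OnSignal);
   received = sig;
}

//  The signals to be handled; all of them are defined in <csignal>.
const std::vector<int>& NativeSignals()
{
   static const std::vector<int> signals =
      { SIGINT, SIGILL, SIGFPE, SIGSEGV, SIGTERM, SIGABRT };
   return signals;
}

//  Registers OnSignal against each native signal. Returns false if
//  any registration failed.
bool RegisterAll()
{
   bool ok = true;

   for(int sig : NativeSignals())
   {
      if(std::signal(sig, OnSignal) == SIG_ERR) ok = false;
   }

   return ok;
}
```

After RegisterAll returns true, raising SIGTERM twice in a row with std::raise should invoke OnSignal both times, whether or not the platform resets the handler after the first delivery.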

We will look at SignalHandler later. To complete this section, we need to look at Start, which EnterThread invoked.

main_t Thread::Start()
{
   for(priv_->trapped_ = false; true; stats_->traps_->Incr())
   {
      try
      {
         //  Perform any environment-specific initialization (and recovery,
         //  if reentering the thread). Exit the thread if this fails.
         //
         auto rc = systhrd_->Start();
         if(rc != 0) return Exit(rc);

         if(priv_->trapped_)
         {
            //  The thread just trapped. The full version of this code can
            //  do various things. But most of the time it invokes Recover,
            //  a virtual function that a subclass can implement if it needs
            //  to clean up unfinished work before it resumes execution.
            //
            priv_->recovering_ = true;
            Recover();
            priv_->recovering_ = false;
         }

         //  Invoke the thread's entry function. If this returns,
         //  the thread exited voluntarily.
         //
         Enter();
         return Exit(SIGNIL);
      }

      //  Catch all exceptions. TrapHandler returns one of
      //  o Continue, to resume execution at the top of this loop
      //  o Release, to exit the thread after deleting it
      //  o Return, to exit the thread immediately
      //  o Rethrow, to rethrow the exception
      //
      catch(SignalException& sex)
      {
         switch(TrapHandler(&sex, &sex, sex.GetSignal(), sex.Stack()))
         {
         case Continue: continue;
         case Release:  return Exit(sex.GetSignal());
         case Return:   return sex.GetSignal();
         default:       throw;
         }
      }
      catch(Exception& ex)
      {
         switch(TrapHandler(&ex, &ex, SIGNIL, ex.Stack()))
         {
         case Continue: continue;
         case Release:  return Exit(SIGNIL);
         case Return:   return SIGDELETED;
         default:       throw;
         }
      }
      catch(std::exception& e)
      {
         switch(TrapHandler(nullptr, &e, SIGNIL, nullptr))
         {
         case Continue: continue;
         case Release:  return Exit(SIGNIL);
         case Return:   return SIGDELETED;
         default:       throw;
         }
      }
      catch(...)
      {
         switch(TrapHandler(nullptr, nullptr, SIGNIL, nullptr))
         {
         case Continue: continue;
         case Release:  return Exit(SIGNIL);
         case Return:   return SIGDELETED;
         default:       throw;
         }
      }
   }
}

Each time through its loop, Start began by invoking SysThread::Start, which allows the native thread to perform any work that is required before it can safely run. This is platform-specific code which looks like this on Windows:

signal_t SysThread::Start()
{
   //  This is also invoked when recovering from a trap, so see if a stack
   //  overflow occurred. Some of these are irrecoverable, in which case
   //  returning SIGSTACK2 causes the thread to exit and be recreated.
   //
   if(status_.test(StackOverflowed))
   {
      if(_resetstkoflw() == 0)
      {
         return SIGSTACK2;
      }

      status_.reset(StackOverflowed);
   }

   //  The translator for Windows structured exceptions must be installed
   //  on a per-thread basis.
   //
   _set_se_translator((_se_translator_function) SE_Handler);
   return 0;
}

The first part of this deals with thread stack overflows, which can be particularly nasty. The last part installs a Windows-specific handler. Windows doesn't normally raise POSIX signals, but instead has what it calls "structured exceptions". We therefore provide SE_Handler, which translates a Windows-specific exception into a POSIX signal that can be thrown using our SignalException. The code for this will appear later.

Exiting a Thread

Exit is normally invoked to exit a thread. It is only bypassed if a Thread somehow gets deleted while it is still running. In that case, TrapHandler returns Return, which causes the thread to exit immediately, given that it no longer has any objects to delete.

main_t Thread::Exit(signal_t sig)
{
   auto reg = Singleton< PosixSignalRegistry >::Instance();

   //  Recreate the thread if it did not exit voluntarily and did not
   //  receive a final signal. This is done by deleting its native
   //  thread wrapper and waking up InitThread, which is responsible
   //  for creating a new native thread, after which the thread can
   //  be entered as if for the first time.
   //
   if((sig != SIGNIL) && !reg->Attrs(sig).test(PosixSignal::Final))
   {
      systhrd_.reset();
      Singleton< InitThread >::Instance()->Interrupt();
   }
   else
   {
      delete this;
   }

   return sig;
}

Receiving a Windows Structured Exception

As previously mentioned, we register SE_Handler to map each Windows exception to a POSIX signal:

//  Converts a Windows structured exception to a POSIX signal. The type of EX
//  is actually EXCEPTION_POINTERS, but it is not used and is therefore omitted.
//
void SE_Handler(uint32_t errval, void* ex)
{
   signal_t sig = 0;

   switch(errval)                         // errval:
   {
   case DBG_CONTROL_C:                    // 0x40010005
      sig = SIGINT;
      break;

   case DBG_CONTROL_BREAK:                // 0x40010008
      sig = SIGBREAK;
      break;

   case STATUS_DATATYPE_MISALIGNMENT:     // 0x80000002
   case STATUS_ACCESS_VIOLATION:          // 0xC0000005
   case STATUS_IN_PAGE_ERROR:             // 0xC0000006
   case STATUS_INVALID_HANDLE:            // 0xC0000008
   case STATUS_NO_MEMORY:                 // 0xC0000017
      sig = SIGSEGV;
      break;

   case STATUS_ILLEGAL_INSTRUCTION:       // 0xC000001D
      sig = SIGILL;
      break;

   case STATUS_NONCONTINUABLE_EXCEPTION:  // 0xC0000025
      sig = SIGTERM;
      break;

   case STATUS_INVALID_DISPOSITION:       // 0xC0000026
   case STATUS_ARRAY_BOUNDS_EXCEEDED:     // 0xC000008C
      sig = SIGSEGV;
      break;

   case STATUS_FLOAT_DENORMAL_OPERAND:    // 0xC000008D
   case STATUS_FLOAT_DIVIDE_BY_ZERO:      // 0xC000008E
   case STATUS_FLOAT_INEXACT_RESULT:      // 0xC000008F
   case STATUS_FLOAT_INVALID_OPERATION:   // 0xC0000090
   case STATUS_FLOAT_OVERFLOW:            // 0xC0000091
   case STATUS_FLOAT_STACK_CHECK:         // 0xC0000092
   case STATUS_FLOAT_UNDERFLOW:           // 0xC0000093
   case STATUS_INTEGER_DIVIDE_BY_ZERO:    // 0xC0000094
   case STATUS_INTEGER_OVERFLOW:          // 0xC0000095
      sig = SIGFPE;
      break;

   case STATUS_PRIVILEGED_INSTRUCTION:    // 0xC0000096
      sig = SIGILL;
      break;

   case STATUS_STACK_OVERFLOW:            // 0xC00000FD
      //
      //  A stack overflow in Windows now raises the exception
      //  System.StackOverflowException, which cannot be caught.
      //  Stack checking in Thread should therefore be enabled.
      //
      sig = SIGSTACK1;
      break;

   default:
      sig = SIGTERM;
   }

   //  Handle SIG. This usually throws an exception; in any case, it will
   //  not return here. If it does return, there is no specific provision
   //  for reraising a structured exception, so simply return and assume
   //  that Windows will handle it, probably brutally.
   //
   Thread::HandleSignal(sig, errval);
}

Receiving a POSIX Signal

We registered SignalHandler to receive POSIX signals. Even on Windows, with its structured exceptions, this code is reached after invoking raise (in <csignal>):

void Thread::SignalHandler(signal_t sig)
{
   //  Re-register for signals before handling the signal.
   //
   RegisterForSignals();
   if(HandleSignal(sig, 0)) return;

   //  Either trap recovery is off or we received a signal that could not be
   //  associated with a thread. Restore the default handler for the signal
   //  and reraise it (to enter the debugger, for example).
   //
   signal(sig, SIG_DFL);
   raise(sig);
}

Converting a POSIX Signal to a SignalException

Now that we have a POSIX signal which was either received by SignalHandler or translated from a Windows structured exception by SE_Handler, we can turn it into a SignalException:

bool Thread::HandleSignal(signal_t sig, uint32_t code)
{
   Debug::ft(Thread_HandleSignal);

   auto thr = RunningThread(false);

   if(thr != nullptr)
   {
      //  Turn the signal into a standard C++ exception so that it can
      //  be caught and recovery action initiated.
      //
      throw SignalException(sig, code);
   }

   //  The running thread could not be identified. A break signal (e.g.
   //  on ctrl-C) is sometimes delivered on an unregistered thread. If
   //  the RTC timeout is not being enforced and the locked thread has
   //  run too long, trap it; otherwise, assume that the purpose of the
   //  ctrl-C is to trap the CLI thread so that it will abort its work.
   //
   auto reg = Singleton< PosixSignalRegistry >::Instance();

   if(reg->Attrs(sig).test(PosixSignal::Break))
   {
      if(!ThreadAdmin::TrapOnRtcTimeout())
      {
         thr = LockedThread();

         if((thr != nullptr) && (Clock::TicksUntil(thr->priv_->currEnd_) > 0))
         {
            thr = nullptr;
         }
      }

      if(thr == nullptr) thr = Singleton< CliThread >::Extant();
      if(thr == nullptr) return false;
      thr->Raise(sig);
      return true;
   }

   return false;
}

The code after the throw requires some explanation. Break signals (SIGINT, SIGBREAK), which are generated when the user enters Ctrl-C or Ctrl-Break, often arrive on an unknown thread. It is reasonable to assume that the user wants to abort work that is taking too long or, worse, stuck in an infinite loop.

But what work should be aborted? Here, it must be pointed out that RSC strongly encourages the use of cooperative scheduling, where a thread runs unpreemptably and yields after completing a logical unit of work. Thread implements this by acquiring a global lock before allowing a thread to resume execution. A timeout on this lock is enforced. If a thread does not yield before the timeout, it receives the internal signal SIGYIELD, causing a SignalException to be thrown. During development, it is sometimes useful to disable this timeout. So in trying to identify which thread is performing the work that the user wants to abort, the first candidate is the thread that currently owns the global run-to-completion lock. However, this thread will only be interrupted if the use of SIGYIELD has been disabled and the thread has already run for longer than the timeout.

If interrupting the locked thread doesn't seem appropriate, the assumption is that CliThread should be interrupted. This thread is the one that parses and executes user commands entered through the console. So unless CliThread doesn't exist for some obscure reason, it will receive the break signal.

If a thread to interrupt has now been identified, Thread::Raise is invoked to deliver the signal to that thread.

Signaling Another Thread

Sending a signal to another thread is problematic. The raise function in <csignal> only signals the running thread. Nor does Windows appear to expose any function that could be used for the purpose. So what to do?

In RSC, the first thing that most functions do is call Debug::ft to identify the function that is now executing. These calls were removed from the code in this article, but now it is necessary to mention them. The original (and still extant) purpose of Debug::ft is to support a function trace tool, which is why most non-trivial functions invoke it. What this trace tool produces will be seen later. The pervasiveness of Debug::ft also allows it to be co-opted for other purposes. Because a thread is likely to invoke it frequently, it can check if the thread has a signal waiting. If so, boom! It can also check if the thread is at risk of overrunning its stack, in which case boom! (This is better than allowing an overrun to occur. As noted in SE_Handler, Windows no longer even allows a stack overflow exception to be intercepted.)

Here is the code that delivers a signal to another thread:

void Thread::Raise(signal_t sig)
{
   Debug::ft(Thread_Raise);

   auto reg = Singleton< PosixSignalRegistry >::Instance();
   auto ps1 = reg->Find(sig);

   //  If this is the running thread, throw the signal immediately. If the
   //  running thread can't be found, don't assert: the signal handler can
   //  invoke this when a signal occurs on an unknown thread.
   //
   auto thr = RunningThread(false);

   if(thr == this)
   {
      throw SignalException(sig, 0);
   }

   //  If the signal will force the thread to exit, try to unblock it.
   //  Unblocking usually involves deallocating resources, so force the
   //  thread to sleep if it wakes up during Unblock().
   //
   if(ps1->Attrs().test(PosixSignal::Exit))
   {
      if(priv_->action_ == RunThread)
      {
         priv_->action_ = SleepThread;
         Unblock();
         priv_->action_ = ExitThread;
      }
   }

   SetSignal(sig);
   if(!ps1->Attrs().test(PosixSignal::Delayed)) SetTrap(true);
   if(ps1->Attrs().test(PosixSignal::Interrupt)) Interrupt();
}

Given that the target thread can throw a SignalException for itself, via a check supported by Debug::ft, Raise does the following:

  • invokes SetSignal to record the signal against the thread
  • invokes Unblock (a virtual function) to unblock the thread if the signal will force it to exit
  • invokes SetTrap if the signal should be delivered as soon as possible instead of waiting until the next time the thread yields (this sets the flag that is checked via Debug::ft)
  • invokes Interrupt to wake up the thread if the signal should be delivered now instead of waiting until the thread resumes execution

In the above list, whether to invoke each of the last three functions is determined by various attributes that can be set in the signal's instance of PosixSignal.
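
Stripped of RSC's details, the underlying technique is that one thread records a pending signal and the target thread throws it for itself the next time it polls. The sketch below illustrates only that idea. ThreadState, SignalError, CheckForSignal, and Work are hypothetical names, and CheckForSignal stands in for the check that Debug::ft supports.

```cpp
#include <atomic>
#include <exception>

//  Thrown by a thread when it finds that a signal is pending.
class SignalError : public std::exception
{
public:
   explicit SignalError(int sig) : sig_(sig) { }
   const char* what() const noexcept override { return "signal"; }
   int Signal() const { return sig_; }
private:
   int sig_;
};

//  Per-thread state. Another thread writes PENDING; its owner polls it.
struct ThreadState
{
   std::atomic<int> pending{0};
};

//  Invoked by any thread to deliver SIG to TARGET.
void Raise(ThreadState& target, int sig)
{
   target.pending.store(sig);
}

//  Invoked frequently by the thread that owns SELF. Clears and throws
//  the pending signal, if any.
void CheckForSignal(ThreadState& self)
{
   int sig = self.pending.exchange(0);
   if(sig != 0) throw SignalError(sig);
}

//  Performs STEPS units of work, polling before each one. Returns the
//  signal that interrupted the work, or 0 if the work was completed.
int Work(ThreadState& self, int steps)
{
   try
   {
      for(int i = 0; i < steps; ++i) CheckForSignal(self);
   }
   catch(const SignalError& ex)
   {
      return ex.Signal();
   }

   return 0;
}
```

Because the exception is thrown on the target thread's own stack, at a point where it is executing ordinary code, it unwinds that stack in the usual way. The cost is latency: the signal is only noticed at the next poll.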

Capturing a Thread's Stack When an Exception Occurs

SignalException derives from Exception (which derives from std::exception). Although Exception is a virtual class, all RSC exceptions derive from it because its constructor captures the running thread's stack by invoking SysThreadStack::Display:

Exception::Exception(bool stack, fn_depth depth) : stack_(nullptr)
{
   //  When capturing the stack, exclude this constructor and those of
   //  our subclasses.
   //
   if(stack)
   {
      stack_.reset(new std::ostringstream);
      if(stack_ == nullptr) return;
      *stack_ << std::boolalpha << std::nouppercase;
      SysThreadStack::Display(*stack_, depth + 1);
   }
}

SignalException simply records the signal and a debug code after telling Exception to capture the stack:

SignalException::SignalException(signal_t sig, debug32_t errval) :
   Exception(true, 1),
   signal_(sig),
   errval_(errval)
{
}

Capturing a thread stack is platform-specific. See SysThreadStack.win.cpp for the Windows targets. Here is an example of its output within an RSC log for a Windows structured exception that got mapped to SIGSEGV. The stack trace is the portion after "Function Traceback":

THR902 2-Aug-2019 09:45:43.183 on Reigi {5}
in NodeTools.RecoveryTestThread (tid=15, nid=0x0000bc60): trap number 2
type=Signal
signal : 11 (SIGSEGV: Invalid Memory Reference)
errval : 0xc0000005
Function Traceback:
  NodeBase.SignalException.SignalException @ signalexception.cpp + 40[12]
  NodeBase.Thread.HandleSignal @ thread.cpp + 1801[16]
  NodeBase.SE_Handler @ systhread.win.cpp + 126[13]
  _unDNameEx @ <unknown file> (err=487)
  is_exception_typeof @ <unknown file> (err=487)
  is_exception_typeof @ <unknown file> (err=487)
  is_exception_typeof @ <unknown file> (err=487)
  _CxxFrameHandler3 @ <unknown file> (err=487)
  RtlInterlockedCompareExchange64 @ <unknown file> (err=487)
  RtlInterlockedCompareExchange64 @ <unknown file> (err=487)
  KiUserExceptionDispatcher @ <unknown file> (err=487)
  NodeTools.RecoveryTestThread.Enter @ ntincrement.cpp + 3584[0]
  NodeBase.Thread.Start @ thread.cpp + 2799[15]
  NodeBase.Thread.EnterThread @ thread.cpp + 1564[0]
  BaseThreadInitThunk @ <unknown file> (err=487)
  RtlGetAppContainerNamedObjectPath @ <unknown file> (err=487)
  RtlGetAppContainerNamedObjectPath @ <unknown file> (err=487)

In released software, users can collect these logs and send them to you. Better still, your software can include code to automatically send them to you over the internet. Each of these logs highlights a bug that needs to be fixed.

Recovering from an Exception

The above log was produced by TrapHandler, which was mentioned a long time ago as the function that Thread::Start invokes when it catches an exception:

Thread::TrapAction Thread::TrapHandler(const Exception* ex,
   const std::exception* e, signal_t sig, const std::ostringstream* stack)
{
   //  If this thread object was deleted, exit immediately.
   //
   if(sig == SIGDELETED)
   {
      return Return;
   }

   //  Record the signal against the thread. Record a stack overflow
   //  against the native thread wrapper for use by SysThread::Start.
   //
   SetSignal(sig);

   if((sig == SIGSTACK1) && (systhrd_ != nullptr))
   {
      systhrd_->status_.set(SysThread::StackOverflowed);
   }

   //  Generate a log if the signal occurred as the result of
   //  a software error.
   //
   auto reg = Singleton< PosixSignalRegistry >::Instance();

   if(!reg->Attrs(sig).test(PosixSignal::NoError))
   {
      auto log = Log::Create(ThreadLogGroup, ThreadException);

      if(log != nullptr)
      {
         auto trapcount = ThreadAdmin::TrapCount();
         *log << Log::Tab << "in " << to_str();
         *log << ": trap number " << trapcount << CRLF;

         if(e != nullptr)
         {
            *log << Log::Tab << "type=" << e->what() << CRLF;
            if(ex != nullptr) ex->Display(*log, spaces(4));
         }
         else
         {
            if(sig != SIGNIL)
            {
               *log << Log::Tab << "signal=" << reg->strSignal(sig) << CRLF;
            }
            else
            {
               *log << Log::Tab << UnknownExceptionStr << CRLF;
               if(Element::RunningInLab()) return Rethrow;
            }
         }

         if(stack != nullptr) *log << stack->str();

         //  Log a thread's data if it will be forced to exit.
         //
         if(reg->Attrs(priv_->signal_).test(PosixSignal::Exit))
         {
            *log << Log::Tab << ThreadDataStr << CRLF;
            Display(*log, Log::Tab + spaces(2), NoFlags);
         }

         Log::Submit(log);
      }
   }

   //  If this is a final signal, force the thread to exit.
   //
   if(reg->Attrs(sig).test(PosixSignal::Final))
   {
      return Release;
   }

   //  Resume execution at the top of Start.
   //
   return Continue;
}

Traces of the Code in Action

RSC has some 20 tests that focus on exercising this software. Each of them does something nasty to see if the software can handle it without exiting. During these tests, the function trace tool is enabled so that Debug::ft will record all function calls. For the SIGSEGV test, which is associated with the log shown above, the output of the trace tool looks like this. When the tool is on, code slows down by a factor of about 5. When the tool is off, calls to Debug::ft incur very little overhead.

Points of Interest

It is only forthright to mention that the C++ standard does not support throwing an exception in response to a POSIX signal. In fact, it is undefined behavior for a signal handler to do almost anything in a C++ environment! A list of undefined behaviors appears here; those pertaining to signal handling are numbered 128 through 135. The detailed coding standard available on the same website makes these recommendations about signals:

  • SIG31-C. Do not access shared objects in signal handlers
  • SIG34-C. Do not call signal() from within interruptible signal handlers
  • SIG35-C. Do not return from a computational exception signal handler

Fortunately, much of this is theoretical rather than practical. The main reason that most things related to signal handling are undefined behavior is that different platforms support signals in different ways. Many of the risks that lead to undefined behavior result from race conditions that will rarely occur (see Note 1). Regardless, what can you do if your software has to be robust? It's far better to risk undefined behavior than to let your program exit.

The same rationale, of not being able to depend on how the underlying platform does something, does not excuse the standard's adoption of noexcept. If it were possible to throw an exception in response to a signal, any noexcept function would be unable to do so. Even a non-virtual "getter" that simply returns a member's value is now at risk. If such a function is invoked with a bad this pointer, it will add an offset to that pointer and try to read memory. Boom! An ostensibly trivial noexcept function, through no fault of its own, has now caused the invocation of abort when the signal handler throws an exception to recover from the SIGSEGV.

The invocation of abort isn't the end of the world, let alone your program, because your signal handler can turn the SIGABRT into an exception. But now what are we dealing with, abort or an exception? What if the exception isn't "allowed", either because it occurred in a destructor or noexcept function? (Hands up, those of you who have never seen anything nasty happen in a destructor.)

When abort is invoked, the C++ standard says it is implementation dependent whether the stack is unwound in the same way as when an exception is thrown. That is, local objects may not get deleted. So if a function on the stack owns something in a unique_ptr local, it will leak. And if it has wrapped a mutex in a local object whose destructor releases the mutex whenever the function returns, the outcome could be far worse. This is assuming, of course, that your program will be allowed to survive. If it won't, it doesn't really matter.
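
What is at stake can be seen by looking at the case where unwinding does occur. In the sketch below, a function holds a mutex through a std::lock_guard and a heap object through a std::unique_ptr, and then throws. The names resourceLock, Tracked, FailWhileHolding, and UnwindingCleanedUp are hypothetical and exist only for this illustration.

```cpp
#include <memory>
#include <mutex>
#include <stdexcept>

std::mutex resourceLock;
int liveObjects = 0;   // number of Tracked objects that currently exist

struct Tracked
{
   Tracked() { ++liveObjects; }
   ~Tracked() { --liveObjects; }
};

//  Acquires a lock and a heap object through locals, and then fails.
void FailWhileHolding()
{
   std::lock_guard<std::mutex> guard(resourceLock);
   std::unique_ptr<Tracked> owned(new Tracked);
   throw std::runtime_error("something nasty");
}

//  Returns true if, once the exception has been caught, the mutex has
//  been released and the heap object has been freed.
bool UnwindingCleanedUp()
{
   try { FailWhileHolding(); }
   catch(const std::runtime_error&) { }

   bool unlocked = resourceLock.try_lock();
   if(unlocked) resourceLock.unlock();
   return unlocked && (liveObjects == 0);
}
```

Here the exception is caught, so both destructors run and UnwindingCleanedUp is expected to return true. If the stack were not unwound, neither destructor would run: the object would leak and the mutex would stay locked, which is the worse outcome that the paragraph above describes.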

Unless your software is shockingly infallible, it will occasionally cause an abort, and your C++ compiler had better allow this to turn into an exception that unwinds the stack in all circumstances. In the end, both your platform and compiler will make it either possible or virtually impossible to deliver robust C++ software.

To summarize, here are some things that the C++ standard should mandate to get serious about robustness:

  • A signal handler must be able to throw an exception when it receives a signal.
  • The stack must be unwound if the signal handler throws an exception in response to a SIGABRT.
  • std::exception's constructor must provide a way to capture debug information, such as a thread's stack, before the stack is unwound.

The good news is that platform and compiler vendors often make it possible to deliver robust software, despite what the standard fails to mandate.

Notes

1 In UNIX-like environments, signals other than those discussed in this article have sometimes been used as a primitive form of inter-thread communication. This greatly increases the risk of these race conditions and is not recommended here.

2 Oops. Although part of POSIX, SIGBREAK is defined in Windows but not in <csignal>.

History

  • 3rd September, 2019: There have been tweaks since the original was published, but nothing significant
  • 28th August, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

Greg Utas
Architect
Canada
Author of Robust Services Core (GitHub) and Robust Communications Software (Wiley, 2005). Formerly Chief Software Architect of the servers (GSM MSCs) that handle the calls in AT&T's wireless network.


Article
Posted 28 Aug 2019