Download source code - 1.76 KB

Introduction

This article shows a way to design a logger that does what the description of the article suggests:

Work as fast as possible when nothing happens.
Log as much info as possible when necessary.

It's the question of the design, not the implementation. I'd like to point this out explicitly. The implementation provided with the article is pretty simple (although it contains a lot of C++ voodoo), it's just about 200 lines of code. Also, this implementation should not be taken as-is in serious projects, it should be worked out and adapted for the needs. For instance, you may add buffering, automatic redirection into a new file when the file becomes too large, auto-deletion of old files, and etc.

But the design of this log is very valuable in my opinion. I use a similar type of logging in all my applications, and it has proved to be great.

Logging

Logging is a surprisingly important part of the program. And, the more complex and big the program becomes - the more important to have proper logging. In many situations (such as debugging and crash investigations), logs are almost the only pieces of information you have. It's very hard to imagine servers designed to run 24/7 without logging.

There're many different logging systems today. There're simple wrapper classes to write into files (that everyone can create in minutes without any problem), there're also more serious loggers with options for buffering, formatting, filtering, output selection, and etc. There's also a library called "Apache log" (a.k.a. log4cxx), which is a super-mega-bombastic logging library loaded with tons of functionality.

Regardless of the library chosen, logging has the following unavoidable drawbacks:

Performance hit

Logging is heavy. When logging in some critical code blocks (such as inside inner loops), the logging may kill the performance completely. Many logging libraries do caching (before actually sending the output to file/console/etc.), which reduces the load significantly. However:

Still they're heavy. Even without writing the information, the formatting of the output string is a complex operation. Plus, in a multi-threaded environment, appending things to the cache requires synchronization (i.e., interlocked operations), which also has a significant impact on the performance.
Caching may lead to many bad surprises. For instance, when a program crashes, all the cached information is lost.

Bad information/junk ratio

Usually, it's hard to predict where the problem would arise and what will be useful for problem investigation. Logging too much leads to performance degradation, plus the logs become loaded with tons of useless crap. On the other side, disabling logging in some places is dangerous - you might remove something that would be very valuable if something bad happens.

Many logging libraries support filtering. That is, you may put a lot of logging statements in the code; however, they may not be actually logged during runtime. Some libraries require recompilation for this to work, others need program restart, and some can even be reconfigured at run-time. In such a way, you may usually work at very low log levels (filter out almost everything), and during debugging, put the log level higher. Some libraries even allow different filtering options for different libraries/classes.

So far filtering is good. But still not good enough. Actually, this means that if the crash happens for the first time (when the log level is low) - you have almost no information. Then you start debugging: raise the log level and try to reproduce the problem. But some problems are very hard to reproduce, especially those related to timings, threading, etc. Also, after you raise the log level, the program behaves differently - due to the performance impact of the logging, timings may change. Actually, you may not be able to reproduce the problem at all when the logs are on.

How to kill two hares in a single shot

Now, let's look at another log design. In most loggers, during runtime, you use one type of functionality: some function/macro/operator that is supposed to do the logging. In our design, we have three operations: logging, putting the so-called Checkpoints, and dumping all the checkpoints. Below is the syntax of these operations:

// Actual logging.
PUTLOG("This is logged immediately. SomeStr=%s, SomeNum=%d" 
       << szSomeStr << nSomeNum);

// Declaring a checkpoint
CHECKPOINT("This is a checkpoint. SomeNum=%d" << nSomeNum)

// This sends all the checkpoints into the log
Log::PrintCheckpoints();

Logging and checkpoint statements support printf-like format strings with variable number of arguments. The only difference is that parameters are separated by << instead of comma.

Checkpoint is a temporary object that holds an expression that is ready to be logged but not logged yet. It lives inside the scope of the declaration (curly braces). If nothing bad happens - it is silently removed when the scope is exited, and nothing is logged. However, if something bad happens - you call Log::PrintCheckpoints and all the checkpoints are appended to the log at once.

To make things clear, let's look at the following example:

{
    CHECKPOINT(_T("One"))
    {
        CHECKPOINT(_T("Two"))
        {
            CHECKPOINT(_T("Three"))
            Log::PrintCheckpoints();
        }
        Log::PrintCheckpoints();
    }
    Log::PrintCheckpoints();
}

When Log::PrintCheckpoints is called for the first time (in the most inner scope) - it will see all three checkpoints. When it's called the next time - the most inner checkpoint will be already deleted and it'll only see "One" and "Two". And, when Log::PrintCheckpoints is called for the third time - only the first checkpoint will be printed.

In the example above, all the checkpoints reside in one function, but this is not mandatory. Checkpoints may be declared in different functions, still they are all visible. Also, if a function is called recursively - its checkpoints will be seen recursively as well. In other words, checkpoints are visible until their lifetime expires.

Let's look at another example, which is a more-or-less realistic demonstration of our logging:

void SomeWorkerFunc()
{
    CHECKPOINT(_T("WorkerThread=%d") << GetCurrentThreadId())

    // ...
    // eventually process some I/O for client
    pClient->ProcessIo();
    // ...
}

void Client::ProcessIo()
{
    CHECKPOINT(_T("Request from client ID=%u") << m_ID)
    
    // ...
    // eventually decide to send something to the client
    CHECKPOINT(_T("Sending file path=%s") << szSomeFilePath)

    HANDLE hFile = CreateFile(szSomeFilePath, /* ... */);
    ReadFile(/* ... */);
    // ...
}

void ReadFile(/* ... */)
{
    CHECKPOINT(_T("Attempt to read %u bytes", dwBytes)
    TestSysCall(ReadFile(hFile, pBuf, dwBytes, /* ... */));
}

void TestSysCall(BOOL bVal)
{
    if (!bVal)
    {
        PUTLOG(_T("Error=%u") << GetLastError());
        Log::PrintCheckpoints();
        
        throw SomeException;
    }
}

In this example, if nothing bad happens, nothing will be logged. However, if ReadFile fails, the following will be emitted to the log:

Error=187
Attempt to read 2048 bytes
Sending file path path=C:\Data\SomeFile.bin
Request from client ID=22487
WorkerThread=3047

Note: The only function here that reports an error is TestSysCall. It doesn't have any information about the context in which it occurred, it doesn't know what originated the error. It only has (and logs) the error code.

Is it possible to implement such a behavior with traditional logging (without checkpoints)? Theoretically - yes. But, this demands an enormous number of logging code, looks awkward, and makes the program totally unreadable. The code would look like this:

// ...
HANDLE hFile = CreateFile(szSomeFilePath, /* ... */);
if (INVALID_HANDLE_VALUE == hFile)
{
    Log(_T("Couldn't open file=%s when attempted to send" 
           " it to the cleint ID=%u, in thread=%s. Error=%u")
        << szSomeFilePath << m_ID 
        << GetCurrentThreadId() << GetLastError());
    throw SomeExc;
}

// ...
if (!ReadFile(hFile, /* ... */))
{
    Log(_T("Couldn't read %u bytes from file=%s when attempted" 
           " to send it to the cleint ID=%u, in thread=%s. Error=%u")
        << szSomeFilePath << dwBytes << m_ID 
        << GetCurrentThreadId() << GetLastError());
    throw SomeExc;
}

// ...

That is, in this case, you have to pass all the context information to all the functions and print them in all the log statements. You'll not be able to use uniform (context-free) functions for common operations such as I/O, error checking, and etc.

Indeed, no one really does this. Instead, people either don't log most of the context information, or log them before there's a real need for this (the drawbacks of this we've already discussed).

Things change when you use checkpoints. Now, you shouldn't hesitate to provide as much information as you have, which will be actually logged only when needed.

Implementation details

PUTLOG

Let's start with the PUTLOG macro. It takes an expression as a parameter, and immediately logs this expression.

The first thing in the expression must be a formatting string. Depending on the multi-byte/Unicode compiler option, the string should be either multi-byte or Unicode. It's worth to use the _T() macro to automatically use the correct string.
Consequent parameters must by specified according to the contents of the formatting string. If no parameters are needed - none should be specified.
Parameters must be separated by the << operator (as you might already have noticed).
The expression is evaluated exactly once. This is important if the evaluation of the expression has side effects (for example: it may contain function calls that have some side effects).

As I've already said at the beginning, the implementation of logging is very raw and should not be used as-is. It doesn't support filtering. That is, the expression is always evaluated and logged. There's also no possibility to choose which log to write to (in case you might want to have several log outputs for different things). You might notice that so far we didn't see any initialization method for logging, and no place where we can specify which file to log to. This is because this implementation doesn't write anything to files! It just sends the formatted string into the debug output.

You should treat this implementation only as a skeleton. If you like the idea - go ahead, take it and adapt it for your needs. For instance, if you want to add support for multiple logger objects, you should probably redefine the PUTLOG macro to accept one more parameter - the logger object. In addition, you may support filtering: either run-time filter options (stored in the specified logger object) or compile-time definition of the logging macro. However, you must be careful with this: when not logging, you have an option to either evaluate or not the given expression. Of course, it's better not to evaluate it (performance gain); however, be careful about the side effects.

CHECKPOINT

Now comes the main course: the CHECKPOINT macro. When deciding about how to design and implement checkpoints, there is some degree of freedom. The key goals of my implementation are:

It must be as light and as fast as possible.
Convenient usage.

That is, if it won't be convenient, you'll not want to use it. And, if it will be heavy and slow - you may want, but should not use it.

In order to understand what exactly CHECKPOINT does, we first need to realize which steps are performed during traditional logging. Let's say we log the following:

PUTLOG(_T("My name is %s, Elapsed time=%d") 
          << GetMyName() << TimeNow() - nTimeStart);

In this example, the following steps are performed:

The logging expression is evaluated. GetName and TimeNow are called. The value of time minus nTimeStart is calculated.
The logging string is formatted. This means some buffer is allocated, and a variant of a printf-like function is called to build the final string.
Optionally, some decorations may be done to the final string (timestamp at the beginning, line end at the ending).
The string is sent to the output (file, console, etc.)

The CHECKPOINT macro takes the logging expression as a parameter. This expression is immediately evaluated, but the final logging string is not formatted and nothing is emitted. This means, only the first step of the above chain is performed.

CHECKPOINT(_T("My name is %s, Elapsed time=%d") 
              << GetMyName() << TimeNow() - nTimeStart);

In this example, GetName and TimeNow are immediately called, and the value of time minus nTimeStart is calculated. This is all. All the remaining steps are deferred until the checkpoint will be actually logged. This has the following impacts:

Great performance. If the log expression doesn't contain heavy things (such as function calls) - the evaluation of the expression is very fast.
Since the formatting of the actual log string is deferred - all the parameters in the expression must remain valid.

What does the last sentence mean? Simply speaking - the return value of GetMyName and TimeNow must remain valid until the lifetime of the checkpoint expires. For instance, GetMyName may return a pointer to some string, then this string is eventually deleted/modified whereas the checkpoint is still alive. This must be avoided.

Another variant: GetMyName instead of a raw pointer to the string returns a C++ string object by value (either CString, std::string, or another creature). The returned object is temporary, its lifetime is limited to the statement. There's no problem to using such a thing in PUTLOG since the logging string is formatted immediately, but using this in CHECKPOINT will have catastrophic consequences.

Is it possible to implement CHECKPOINT without those drawbacks? Yes, but you'll have to pay for this. One of the variants is to immediately format the log string. Another way - define CHECKPOINT so that it'll immediately make a copy of all the specified strings.

Is it worth to pay the price? Absolutely not, too expensive. The current implementation uses just a few CPU cycles during initialization, and a few cycles at cleanup, which is negligible in 99% of the cases I can imagine, even in the most critical code blocks. And, duplicating strings on the other hand, which involves heap operations, will take hundreds to thousands of cycles. This will make using checkpoints worthless. After all, in most of the scenarios, this consideration is irrelevant. Mostly checkpoints are used (at least by me) either without parameters at all, or with just a few primitive ones. And, if it happens that you really want to use some string which might expire - be so kind to duplicate them explicitly.

In-depth technical details

The log expression is parsed (by both PUTLOG and CHECKPOINT) and converted into a Record structure, which is defined by the following:

template <size_t nSize>
struct Record {
    PCTSTR m_szFmt;
    int m_pParams[nSize / sizeof(int)];
};

m_szFmt is the format string, and m_pParams - all the needed parameters to format the final log string, packed in a manner recognized by the printf-like functions.

Note: Record is a template structure, its template parameter is the size of the packed parameters needed for formatting. The actual formatting parameters are put there at run-time, but the size required to hold them is determined at compile-time. How is this achieved? Well, here comes the fierce C++ templates voodoo. I let you, dear reader, to puzzle this out from the code. If I tell you now how this compiles and works - I may spoil the most remarkable joy that you can get from the article.

The PUTLOG macro builds such a record and immediately logs it. And, the CHECKPOINT macro does the following:

Builds the record.
Declares a temporary object of a Checkpoint class, which wraps the above record. This object "registers" itself in the c'tor (constructor) and "unregisters" itself in the d'tor (destructor).

What does "register" and "unregister" mean? Let's look at the code. Checkpoint is defined this way:

struct Checkpoint {

    __declspec(thread) static Checkpoint* s_pTop;
    Checkpoint* m_pPrev;
    // ...

    template <size_t>
    Checkpoint(/* ... */); // register
    ~Checkpoint(); // unregister
};

s_pTop points to the most recently "registered" checkpoint, and the m_pPrev member of every checkpoint points to the one that it replaced. In such a way, checkpoints form a sort of a stack.

Note:The s_pTop variable is declared with __declspec(thread) semantics. This means that it's accessed via TLS. (Those who are not familiar: TLS - is "thread local storage". It's like a global variable; however, it's unique for every thread.) Accessing a variable via TLS is very fast. From the performance point of view, this is comparable with accessing a regular variable.

__declspec(thread) can't be used if part of your code resides in a DLL. If this is the case - you have to access the TLS explicitly (by TlsGetValue/TlsSetValue). You'll also have to ensure that you use the same TLS index in the EXE and all the DLLs; otherwise, you'll receive several checkpoint chains, whereas only one will be visible in every module.

Exception handling and checkpoints

In the examples above, there was an explicit call to Log::PrintCheckpoints when the program detects a problem. In particular, in one of the cases, we called it exactly before throwing an exception. But, we can't guarantee this function will be called that before every thrown exception. Sometimes, we'll have to call some third-party code (such as STL, MFC, boost, etc.) which is not aware of our logging system, and it will just throw an exception. In addition, an exception may be raised implicitly by the OS/hardware (such as access violation). By the way, in this case, it's most important to dump all the information, but unfortunately, we don't have the opportunity to dump the checkpoints.

After the exception is thrown and caught - dumping checkpoints is impossible. Because they don't exist anymore. Recall that checkpoints exist until their lifetime expires, and after the exception is caught, the stack is already unwound and all the automatic (on-stack) variables are destroyed.

Fortunately, there's a solution. There's a possibility to do something just after the exception is thrown, but before it's caught and the stack unwinding takes place. This is possible with using SEH (structured exception handling). Those who are not familiar with SEH - there's plenty of online documentation about this, I recommend reading them. I also recommend reading the article about exception handling models.

To do the trick, we should either replace or wrap the traditional try/catch blocks with __try/__except ones. This is what you should do:

/////////////////////////////////
// traditional exc handling block
try {
    DoSomething();
catch (/* ... */) {
    Log::PrintCheckpoints(); // Oops! There's nothing there
}

/////////////////////////////////
// Now putting the __try/__except in the middle:
try {
    DoSomethingGuarded();
catch (/* ... */) {
    // Checkpoints already dumped
    // Now - handle the exception
}

void DoSomethingGuarded()
{
    __try {
        DoSomething();
    } __except (Log::PrintCheckpoints(), EXCEPTION_CONTINUE_SEARCH) {
        // never get there
    }
}

/////////////////////////////////
// Use only __try/__except, omit the traditional try/catch
__try {
    DoSomething();
} __except (Log::PrintCheckpoints(), EXCEPTION_EXECUTE_HANDLER) {
    // Checkpoints are already dumped.
    // Now - handle the exception
}

For those not familiar with SEH, the above may look complicated. But in fact this is not complex at all. The traditional C++ exception handling mechanism is implemented (at least by MSVC) via SEH. However, unlike SEH, the C++ exception handling mechanism is concealed. In order to "reveal" it, pure SEH must be used.

Conclusion

I have been using this type of logging for a long time now (many years), and it has proved to be great. I hate to bloat programs with slow and heavy things, I'm very much a minimalist. Especially when it comes to high-performance programs that must squeeze the maximum from the hardware. And, I personally feel that this logging hits the sweat spot.

Actually, I "invented" such a logging when I learned about SEH and started using it. It's clearly superior to C++ exception handling, plus, as discovered, C++ exceptions are implemented in terms of SEH, so that SEH actually covers all the exception and crash handling. One of the greatest advantages of SEH is that upon exception, you get an opportunity to do things before the stack unwinding takes place. This is very valuable for debugging, because the whole stack, including the exception raising code, is still valid at this point, and it contains very valuable information.

At some point, I started to put some "decorations" on stack, because raw stack is not-so-much readable. Those decorations consisted just of a single string. Eventually, I wanted to add some run-time parameters into those strings, but the string formatting is heavy. Plus, the length of the formatted string is not known at compilation time, hence at run-time, it should be allocated on heap (otherwise, the log string may be truncated), which is very performance-killing. This way, I started to use "deferred" formatting. And one day, I realized that checkpoints are good not only for crash investigation, they can actually replace traditional logging completely. Just convert all the logging statements into checkpoints. This will throw away most of the junk since there's no need to pass all the context information everywhere. And, log the checkpoints only when necessary: upon exceptions (in cases where they're considered abnormal), and in several places on-demand.

By the way, recently, I had some experience with Apache log (a.k.a. log4cxx), which is a huge and very popular logging library with enormous options and functionality. I was surprised: it comes out that in addition to traditional logging, they have also created something similar to what I did! It's called NDC - "nested diagnostic contexts", which is a per-thread LIFO queue (stack) of strings that you can push/pop, and they're automatically appended to every log message.

However, they did only a part of the job, which makes the use of their NDCs very limited, in my opinion. It's extremely important for this mechanism to be fast and convenient, and the NDC of log4cxx loses in both criteria. From the performance point of view, it's awful - it uses a lot of heap operations, string duplication, and etc. And, from the convenience of usage, it's too awkward - it just takes strings. Plus, it allows uncontrollable push/pop of strings: in particular, you may push a string somewhere and not pop it back upon scope exit. This will clearly lead to a mess. The RAII wrapper must be used to prevent this. Otherwise, your "nested diagnostic context" may contain information that is no more actual. In addition, all the NDC strings are automatically appended to every log message, which is not what you want.

I talked with some people who really admired log4cxx. I asked what should be the main usage of NDC. They told me something about threads. That is, in multithreaded applications, different threads log into the same file, and everything gets mixed. NDC is good to distinguish what comes from where.

But this is ridiculous. People just don't realize the huge potential of this idea. What I did, more-or-less, should be the most obvious use of NDC. It's just that NDC must be fast and lightweight, which it is not.

As I've already said - the implementation provided is a skeleton. It does not do the actual logging. You may use it as a wrapper of some other log system that supports filtering, buffering, and etc., if this is applicable for your case.

I'll appreciate comments, both positive and negative. New ideas and criticism are welcome.