
Building systems for automatic C/C++ code logging

Karpov Andrey, 1 Jun 2008
The article touches upon a method allowing you to build a system for automatic logging of C/C++ code.

Abstract

Sometimes logging an application's events is the only debugging method. The logging method's disadvantage is the large size of the code which you have to write manually to save all the necessary information. The article touches upon a method allowing you to build a system for automatic logging of your C/C++ code.

Introduction

Even today, C and C++ remain leaders in many areas of programming, and this is unlikely to change in the next 10 years. Many nice and interesting languages have appeared, but C and C++ are still preferred in practice. They are among the best existing languages: universal, high-performance, and widely supported.

Whatever the language, developers make errors, and C/C++ is no exception. Moreover, even professional tools do not remove the need for attention and accuracy. As a result, one of the most important tasks in application development is debugging, i.e., finding and correcting errors.

Debugging methods can be divided into the following main groups:

  • Interactive debugging tools;
  • Runtime-diagnosis;
  • Visual (graphical) debugging tools;
  • Unit testing;
  • Functional tests;
  • Logging;
  • Crash-dumps debugging;
  • Code review;
  • Static code analysis.

Each of these methods has its advantages and disadvantages, which are discussed in the article "Ways of debugging applications." In this article, we'll speak about logging and about ways to automate it.

1. Why logging?

At first, logging an application's work may seem outdated. Perhaps it is just a relic of the times when a program's results were immediately printed? No, it is a very effective and often essential method that allows you to debug complex, parallel, or otherwise specific applications.

Let's consider the spheres where logging is irreplaceable because of its convenience and efficiency:

  1. Debugging release-versions of an application. Sometimes a release-version behaves differently from a debug-version, which may be caused by uninitialized memory errors and the like. But it is often inconvenient to work with the release-version in a debugger. Besides, although it happens rarely, there are compiler errors which appear only in release-versions. Logging in these cases is a good substitute for using the debugger.
  2. Debugging security mechanisms. Development of applications with hardware security (for example, on the basis of Hasp keys) is often difficult because debugging is impossible here. Logging, in this case, seems to be the only way of searching for errors.
  3. Logging is the only possible debugging method for an application, launched on the end user's computer. The accurate use of the log file will allow developers to get the full information necessary for diagnosis of problems.
  4. Logging allows you to debug device drivers and programs for embedded systems.
  5. Logging allows you to quickly detect errors after a batch launch of functional or load tests.
  6. One more interesting case of using logs is to view differences between two different versions (diff). You can try and think about where it can be useful in your projects.
  7. Logging enables remote debugging when interactive means are impossible or inaccessible. This is convenient for high-performance multi-user systems where the user puts his tasks into a queue and waits for them to be fulfilled. This approach is used nowadays in institutions and other organizations working with computing clusters.
  8. Debugging parallel applications. In such applications, errors often occur when many threads are created or when there are problems with synchronization. Errors in parallel programs are rather difficult to correct. A good way to detect them is to periodically log the state of the subsystems related to the error and to examine the log after a program crash.

This list is too long to dismiss logging as a method. I hope that this article will help you find other successful uses for the logging approach described here.

2. Creation of a logging system

Let's begin with the requirements that a modern logging system should meet:

  • The code that performs logging shouldn't be present in the final version of a program. This speeds up the program and decreases its size. It also makes it impossible to use the logging information for cracking the application or for other illegal actions. Note that it is the final version shipped to users that is meant here, since the log itself may be created by both debug- and release-versions.
  • The logging system's interfaces should be compact enough not to overload the main program code.
  • Saving data should be as quick as possible in order to change the timing characteristics of parallel algorithms as little as possible.
  • The log should be understandable and easy to analyze. It should be possible to separate the information received from different threads and to vary the level of detail.
  • Besides logging the application's events, it is useful to collect data about the computer too.
  • It is desirable that the system saves the module name, the file name, and the line number where the record was made. Sometimes, it is also useful to save the time when an event took place.

A logging system meeting these demands allows you to fulfill various tasks from developing security mechanisms to searching for errors in parallel algorithms.
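The requirement to save the file name and line number can be sketched with the standard `__FILE__`, `__LINE__`, and `__func__` (C++11) facilities. This is only an illustration: the helper `FormatLogRecord` and the macro `LOG_HERE` are hypothetical names, not part of the article's system, and a real implementation would pass the record on to the log instead of returning it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the article): builds one log record of the
// form "file(line) function: message".
inline std::string FormatLogRecord(const char *file, int line,
                                   const char *func, const char *msg) {
  char buf[512];
  std::snprintf(buf, sizeof buf, "%s(%d) %s: %s", file, line, func, msg);
  return std::string(buf);
}

// A macro is needed so that __FILE__, __LINE__, and __func__ are expanded at
// the call site rather than inside the helper.
#define LOG_HERE(msg) FormatLogRecord(__FILE__, __LINE__, __func__, (msg))
```

A call such as `LOG_HERE("start")` placed inside a function then yields a record tagged with that function's file, line, and name.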

Although this article is devoted to a system for data logging, it won't touch upon a complete version of such a system. A universal version is impossible as it will depend upon the development environment, the project's peculiarities, the developer's preferences etc.

Now, let's speak about some technical solutions which will help you to create a convenient and efficient logging system when you need it.

The simplest way to carry out logging is to use a function similar to printf, as in the following example:

  int x = 5, y = 10;
  ...
  printf("Coordinate = (%d, %d)\n", x, y);

The natural disadvantage here is that information will be shown both in the debug-mode and in the release-version. That's why we should modify the code in the following way:

#ifdef DEBUG_MODE
  #define WriteLog printf
#else
  #define WriteLog(a)
#endif
  WriteLog("Coordinate = (%d, %d)\n", x, y);

This is better. Note that to implement WriteLog, we use our own macro DEBUG_MODE instead of the standard _DEBUG. This lets you enable logging in release-versions too, which is important when debugging with large amounts of data.

Unfortunately, when you now compile a non-debug version, for example in the Visual C++ environment, a warning appears: "warning C4002: too many actual parameters for macro 'WriteLog'". You could disable this warning, but that would be bad style. Instead, you can rewrite the code as follows:

#ifdef DEBUG_MODE
  #define WriteLog(a) printf a
#else
  #define WriteLog(a)
#endif
  WriteLog(("Coordinate = (%d, %d)\n", x, y));

This code is inelegant: you have to use double parentheses, and it is easy to forget them. That's why we'll improve it a bit:

#ifdef DEBUG_MODE
  #define WriteLog printf
#else
  inline int StubElepsisFunctionForLog(...) { return 0; }
  static class StubClassForLog {
  public:
    inline void operator =(size_t) {}
  private:
    inline StubClassForLog &operator =(const StubClassForLog &)
      { return *this; }
  } StubForLogObject;
  
  #define WriteLog \
    StubForLogObject = sizeof StubElepsisFunctionForLog
#endif
  WriteLog("Coordinate = (%d, %d)\n", x, y);

This code looks complicated, but it allows you to write single parentheses. When DEBUG_MODE is disabled, the whole call becomes the operand of sizeof, which is never evaluated, so the code turns into nothing and you can safely use it in critical code sections.

The next improvement is to add parameters to the logging function, such as the level of detail and the type of the printed information. The level of detail can be passed as a parameter, for example:

enum E_LogVerbose {
  Main,
  Full
};
#ifdef DEBUG_MODE
  void WriteLog(E_LogVerbose,
                const char *strFormat, ...)
  {
    ...
  }
#else
  ...
#endif
WriteLog (Full, "Coordinate = (%d, %d)\n", x, y);

This method is convenient because the decision whether to filter out unimportant messages can be made after the program has finished, by using a special utility. Its disadvantage is that all information, both important and unimportant, is always written, which may decrease performance. That's why you should create several functions, such as WriteLogMain and WriteLogFull, whose implementation depends upon the program build mode.
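A minimal sketch of such a pair of functions is given below. It assumes two hypothetical build switches, DEBUG_MODE and DEBUG_MODE_FULL, normally set by the build system, and it uses C++11 variadic macros instead of the sizeof-based stub shown earlier, so it is a swapped-in technique rather than the article's own.

```cpp
#include <cstdio>

// DEBUG_MODE_FULL: main and detailed messages are written.
// DEBUG_MODE:      only main messages are written.
// neither:         both calls disappear, including their arguments.
#if defined(DEBUG_MODE_FULL)
  #define WriteLogMain std::printf
  #define WriteLogFull std::printf
#elif defined(DEBUG_MODE)
  #define WriteLogMain std::printf
  #define WriteLogFull(...) ((void)0)
#else
  #define WriteLogMain(...) ((void)0)
  #define WriteLogFull(...) ((void)0)
#endif
```

Because a disabled call expands to `((void)0)`, its arguments are not evaluated at all, so logging calls should not contain side effects that the program relies on.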

We know that logging should influence the speed of the algorithm as little as possible. You can achieve this by collecting messages in memory and writing them out in a parallel thread. This approach is even more attractive given the widespread use of multi-core (multi-processor) systems. The scheme of this mechanism is shown in Picture 1.

01000000.png

Picture 1. Logging system with delayed data record

As you can see in the picture, each new portion of data is written into an intermediate array of fixed-length strings. The fixed size of the array and of its strings lets you avoid expensive memory allocation. It doesn't limit the system's possibilities at all: you just select the string length and the array size with some reserve. For example, 5000 strings of 4000 characters each will be enough for debugging nearly any system. I think you will agree that the 20 MB of memory this requires is not critical for modern systems. And if the array does overflow, you can easily add a mechanism that flushes the data to the file ahead of time.

This mechanism provides nearly instant execution of the WriteLog function. If there are unloaded processor cores in the system, the file record will be transparent for the main threads of the program being logged.
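The scheme can be sketched with C++11 threads as follows. The class name `DelayedLog`, the small buffer sizes, and the choice to make `Write` wait when the buffer is full are assumptions made for this illustration; the article suggests flushing early on overflow instead.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

class DelayedLog {
public:
  static constexpr std::size_t kSlots = 64;      // the article suggests e.g. 5000
  static constexpr std::size_t kSlotSize = 256;  // the article suggests e.g. 4000

  explicit DelayedLog(std::ostream &out)
      : out_(out), buf_(kSlots * kSlotSize), head_(0), tail_(0), count_(0),
        stop_(false), writer_(&DelayedLog::Run, this) {}

  ~DelayedLog() {
    { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
    notEmpty_.notify_one();
    writer_.join();  // the writer drains remaining messages before exiting
  }

  // Copies the message into the next free slot (truncating long messages).
  // No allocation and no file I/O happen on the calling thread.
  void Write(const char *msg) {
    std::unique_lock<std::mutex> lock(m_);
    notFull_.wait(lock, [this] { return count_ < kSlots; });
    char *slot = &buf_[head_ * kSlotSize];
    std::strncpy(slot, msg, kSlotSize - 1);
    slot[kSlotSize - 1] = '\0';
    head_ = (head_ + 1) % kSlots;
    ++count_;
    notEmpty_.notify_one();
  }

private:
  void Run() {
    std::unique_lock<std::mutex> lock(m_);
    for (;;) {
      notEmpty_.wait(lock, [this] { return count_ > 0 || stop_; });
      if (count_ == 0) return;  // stop requested and nothing left to write
      char local[kSlotSize];
      std::memcpy(local, &buf_[tail_ * kSlotSize], kSlotSize);
      tail_ = (tail_ + 1) % kSlots;
      --count_;
      lock.unlock();
      notFull_.notify_one();
      out_ << local;  // slow output happens outside the lock
      lock.lock();
    }
  }

  std::ostream &out_;
  std::vector<char> buf_;  // kSlots fixed-length strings, allocated once
  std::size_t head_, tail_, count_;
  bool stop_;
  std::mutex m_;
  std::condition_variable notEmpty_, notFull_;
  std::thread writer_;  // declared last so it starts after the other members
};

// Usage sketch: log two messages into a string stream.
inline std::string DemoDelayedLog() {
  std::ostringstream out;
  {
    DelayedLog log(out);
    log.Write("first\n");
    log.Write("second\n");
  }  // the destructor waits until everything has been written
  return out.str();
}
```

In a real system the output stream would be a file, and the buffer sizes would be chosen with the reserve discussed above.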

The advantage of the described system is that it works practically without changes when debugging a parallel program, where several threads write to the log simultaneously. You only need to save a thread indicator with each message so that you can later see which thread it came from (see Picture 2).

02000000.png

Picture 2. Logging system when debugging multithreaded applications
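A minimal sketch of a shared log that tags every record with a thread indicator is shown below. The class name `SharedLog` is hypothetical, and records are kept in a vector here purely for brevity; the standard `std::this_thread::get_id()` serves as the indicator.

```cpp
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

class SharedLog {
public:
  // Formats the record outside the lock, then appends it under the lock.
  void Write(const std::string &msg) {
    std::ostringstream rec;
    rec << "[" << std::this_thread::get_id() << "] " << msg;
    std::lock_guard<std::mutex> lock(m_);
    records_.push_back(rec.str());
  }

  // Returns a copy of everything logged so far.
  std::vector<std::string> Records() const {
    std::lock_guard<std::mutex> lock(m_);
    return records_;
  }

private:
  mutable std::mutex m_;
  std::vector<std::string> records_;
};
```

The mutex is exactly the point where threads have to wait for each other, which is why only the cheap append is done while holding it.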

This scheme can change the timing characteristics of the program, because the threads being logged have to wait for each other while recording information. If this is critical, you may create a separate message storage for each thread, as shown in Picture 3. In this case, you should also record the time of each event so that you can later merge the per-thread logs into one.

03000000.png

Picture 3. Improved logging system when debugging multithreaded applications
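The per-thread variant can be sketched as follows, assuming each thread owns one `ThreadLog` and the logs are merged only after the threads have finished. All names here (`TimedRecord`, `ThreadLog`, `WriteThreadLog`, `MergeLogs`) are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <vector>

struct TimedRecord {
  long long timeNs;  // time of the event, in nanoseconds
  int thread;        // thread indicator
  std::string text;
};

// One such log per thread: no locking is needed while logging.
typedef std::vector<TimedRecord> ThreadLog;

inline long long NowNs() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
             steady_clock::now().time_since_epoch()).count();
}

inline void WriteThreadLog(ThreadLog &log, int thread, const std::string &text) {
  log.push_back(TimedRecord{NowNs(), thread, text});
}

// Merges the per-thread logs into one log ordered by event time.
inline ThreadLog MergeLogs(const std::vector<ThreadLog> &logs) {
  ThreadLog all;
  for (const ThreadLog &l : logs) all.insert(all.end(), l.begin(), l.end());
  std::stable_sort(all.begin(), all.end(),
                   [](const TimedRecord &a, const TimedRecord &b) {
                     return a.timeNs < b.timeNs;
                   });
  return all;
}
```

A monotonic clock is used so that the merged order does not depend on adjustments of the system time.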

The last improvement I would like to offer shows the nesting level of messages when functions are called or a logical block begins. You can easily implement it with a special class which writes a block-begin identifier into the log in its constructor and a block-end identifier in its destructor. A small utility can then transform the log, relying on these identifiers. Let's show it by example.

Program code:

// Writes a level marker on construction and on destruction.
class NewLevel {
public:
  NewLevel() { WriteLog("__BEGIN_LEVEL__\n"); }
  ~NewLevel() { WriteLog("__END_LEVEL__\n"); }
};
#define NEW_LEVEL NewLevel tempLevelObject
void MyFoo() {
  WriteLog("Begin MyFoo()\n");
  NEW_LEVEL;
  int x = 5, y = 10;
  WriteLog("Coordinate = (%d, %d)\n", x, y);
  WriteLog("Begin Loop:\n");
  for (unsigned i = 0; i != 3; ++i)
  {
    NEW_LEVEL;
    WriteLog("i=%u\n", i);
  }
  // Logged again after the loop, as the log below shows.
  WriteLog("Coordinate = (%d, %d)\n", x, y);
}

The log's content:

Begin MyFoo()
__BEGIN_LEVEL__
Coordinate = (5, 10)
Begin Loop:
__BEGIN_LEVEL__
i=0
__END_LEVEL__
__BEGIN_LEVEL__
i=1
__END_LEVEL__
__BEGIN_LEVEL__
i=2
__END_LEVEL__
Coordinate = (5, 10)
__END_LEVEL__

The log after transformation:

Begin MyFoo()
    Coordinate = (5, 10)
    Begin Loop:
        i=0
        i=1
        i=2
    Coordinate = (5, 10)
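The small transformation utility mentioned above could look like the following sketch. The function name `IndentLog` and the four-space indentation step are assumptions chosen to match the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Converts __BEGIN_LEVEL__ / __END_LEVEL__ markers into indentation:
// marker lines are dropped, other lines are indented by 4 spaces per level.
inline std::string IndentLog(const std::string &raw) {
  std::istringstream in(raw);
  std::string line, out;
  std::size_t depth = 0;
  while (std::getline(in, line)) {
    if (line == "__BEGIN_LEVEL__") { ++depth; continue; }
    if (line == "__END_LEVEL__") { if (depth > 0) --depth; continue; }
    out.append(depth * 4, ' ');
    out += line;
    out += '\n';
  }
  return out;
}
```

An unmatched end marker is ignored rather than treated as an error, so a log truncated by a crash can still be transformed.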

3. Automation of logging systems

We have considered the principles on which a logging system can be built. All the requirements listed in the previous section can be met by such a system. But its serious disadvantage is the need to write a great amount of code to record all the necessary data. There are other disadvantages too:

  1. Elegant logging functions cannot be implemented in C, as it lacks classes and templates. As a result, the logging code is different for C and C++ files.
  2. You have to remember to write NEW_LEVEL in every block or function if you want the log to be well structured.
  3. The names of all called functions cannot be saved automatically, and function arguments have to be logged by hand.
  4. The source text is cluttered with additional constructions such as NEW_LEVEL.
  5. You have to make an effort to ensure that all logging constructions are excluded from the final version of the program.
  6. You have to write functions that initialize the logging system, add all the necessary "#include"s, and perform other auxiliary actions.

All these and other inconveniences can be avoided if you build a system for automatic logging for testing applications.

This system can be implemented on the basis of the meta-programming method, introducing new constructions of data recording into the C/C++ language.

Meta-programming is the creation of programs which, in their turn, create other programs [7]. There are two main trends in meta-programming: code generation and self-modifying code. We are interested in the first one. In this case, the program code with embedded logging mechanisms is not written manually, but created automatically by a generator program on the basis of another, simpler program. This lets you get the program with less time, effort, and cost than if a programmer wrote all the code by hand.

There are languages for which meta-programming is a constituent part. Such an example is the Nemerle language [1]. But it is more difficult for C/C++, and meta-programming in them is implemented in the following two ways:

  1. Templates in C++ and the preprocessor in C. Unfortunately, as was shown above, this is not enough.
  2. External language tools. The generator's language is designed so that the rules of the paradigm or the necessary special functions are implemented automatically, or with minimal effort from the programmer. In fact, a higher-level programming language is created. It is this approach that can be used to create a system for automated program logging.

By introducing new keywords into C/C++, you can get a flexible and convenient programming system with very powerful logging possibilities. The use of a meta-language provides great possibilities for choosing the form in which the necessary information will be recorded. You may log the data by introducing a common function format:

EnableLogFunctionCallForThisFile(true);
...
ObjectsArray &objects = getObjects();
WriteLog("objects.size = %u", objects.size());
for (size_t i = 0; i != objects.size(); ++i) {
    WriteLog("object type = %i", int(objects[i]->getType()));
    if (objects[i]->getType() != TYPE_1)
        ...

Since the code generator can recognize the data types being used, you may simplify the logging of variables and shorten the record. For example:

ObjectsArray &objects = getObjects();
MyLog "objects.size = ", objects.size()
for (size_t i = 0; i != objects.size(); ++i) {
    MyLog "object type = ", int(objects[i]->getType())
    if (objects[i]->getType() != TYPE_1)
        ...

You may go further, as everything here depends upon the author's imagination:

thisfile.log.FunctionCall = ON
...
ObjectsArray &objects = getObjects();
for (size_t i = 0;
    i != <log>"objects.size = "objects.size()<log>;
    ++i) {
    LOG: "object type = " {as int} objects[i]->getType();
    if (objects[i]->getType() != TYPE_1)
        ...

The new meta-language can be implemented as an intermediate stage between the preprocessor and the compiler. How this intermediate component for translating a meta-program should look depends upon the development environment. In general, the scheme looks as in Picture 4.

04en0000.png

Picture 4. Role of a meta-language translator in the compilation process

Now, let's consider the translator itself. Its functional scheme is shown in Picture 5.

05en0000.png

Picture 5. Functional scheme of language translator for generating code with logging support

Let's dwell upon the code generator. It is an algorithm that traverses the syntax tree and carries out three operations over it. First, it collects the types of all the functions and objects and determines their visibility scope. This allows it to automatically generate code that correctly records the arguments of functions and other objects. Secondly, meta-language constructions are expanded into new tree branches. And thirdly, the processed code branches are written back into the file.
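The three operations can be illustrated on a toy tree. This is a sketch under strong assumptions: the `Node` structure, the `Expand` and `Emit` functions, and the rule that picks `%u` or `%d` are all invented for the example, the first two operations are combined into one walk, and a real generator would work on a full syntax tree produced by a parser.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy tree node: a block, a variable declaration, a meta-language Log
// statement, or a verbatim line of code.
struct Node {
  enum Kind { Block, Decl, Log, Code } kind;
  std::string type;  // Decl: variable type
  std::string name;  // Decl and Log: variable name
  std::string text;  // Code: source line
  std::vector<Node> children;
};

typedef std::map<std::string, std::string> Scope;  // variable name -> type

// Operations 1 and 2 in one walk: remember the declarations visible so far
// and expand each Log node into a generated WriteLog call. The scope is
// passed by value, so declarations made in a nested block stay local to it.
inline void Expand(Node &n, Scope scope) {
  for (Node &c : n.children) {
    if (c.kind == Node::Decl) {
      scope[c.name] = c.type;
    } else if (c.kind == Node::Log) {
      const char *fmt = (scope[c.name] == "unsigned") ? "%u" : "%d";
      c.text = "WriteLog(\"" + c.name + " = " + fmt + "\\n\", " + c.name + ");";
      c.kind = Node::Code;
    } else if (c.kind == Node::Block) {
      Expand(c, scope);
    }
  }
}

// Operation 3: write the processed tree back as program text.
inline std::string Emit(const Node &n) {
  std::string out;
  for (const Node &c : n.children) {
    if (c.kind == Node::Decl) out += c.type + " " + c.name + ";\n";
    else if (c.kind == Node::Code) out += c.text + "\n";
    else if (c.kind == Node::Block) out += "{\n" + Emit(c) + "}\n";
  }
  return out;
}
```

The point of the type table is that the user writes only the variable name, and the generator chooses the correct way to print it.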

Such a generator, like a syntactic analyzer, is not simple to implement. But there are suitable libraries, which are described below.

Let's see how code generation removes the above-mentioned disadvantages of a system built only on macros and templates.

  1. Logging constructions in C may look as elegant as in C++. The generator takes care of turning reserved words into code that writes messages. The fact that this code differs between C and C++ files is hidden from the user.
  2. The generator can automatically insert code that marks the beginning and the end of functions or blocks. There is no need to use the NEW_LEVEL macro.
  3. You may automatically save the names of all or some of the functions being called. You may also automatically save the values of input parameters of the basic data types.
  4. The text is no longer cluttered with auxiliary constructions.
  5. It is guaranteed that no logging functions, resources, or objects will be present in the final version of the product. The generator can simply skip all the logging-related elements in the program text.
  6. There is no need to write initialization functions, insert "#include"s, or perform other auxiliary actions, as all this can be done at the code translation step.

But this scheme has a problem when such modified programs are debugged with a debugger. On the one hand, if there is a logging system, the debugger is not such an important component of application development. On the other hand, it is often a very useful tool, and you don't want to give it up.

The trouble is that after code translation (expansion of the logging operators), line numbers no longer match the original source, which hinders navigation. This can be solved by means specific to each development environment. But there is a simpler way. You may use an approach similar to the one in OpenMP, i.e., use "#pragma". In this case, the logging code will look like this:

ObjectsArray &objects = getObjects();
#pragma Log("objects.size = ", objects.size()) 
for (size_t i = 0; i != objects.size(); ++i) {
    #pragma Log("object type = ", objects [i]->getType())
    if (objects[i]->getType() != TYPE_1)
        ...

It's not as elegant because of the word "#pragma", but this program text has great advantages. The code can be safely compiled on another system where automatic logging is not used. It can be ported to another system or given to third-party developers. And, of course, nothing prevents you from working with this code in the debugger.
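A toy translator for this form can work line by line, which keeps every statement on its original line. In the sketch below, `TranslatePragmaLog` is a hypothetical function and `LogValues` is a hypothetical runtime function that the generated code would call; neither comes from the article. Where an expansion cannot fit on one line, the standard `#line` directive is another way to keep line numbers aligned.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Replaces each line of the form "#pragma Log(args)" with "LogValues(args);"
// on the same line, so the number of lines in the file does not change.
// All other lines are copied unchanged.
inline std::string TranslatePragmaLog(const std::string &source) {
  const std::string key = "#pragma Log(";
  std::istringstream in(source);
  std::string line, out;
  while (std::getline(in, line)) {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos &&
        line.compare(first, key.size(), key) == 0) {
      const std::size_t last = line.find_last_not_of(" \t");
      const std::size_t argsBegin = first + key.size();
      // Keep the indentation, then emit the arguments with their closing ')'.
      line = line.substr(0, first) + "LogValues(" +
             line.substr(argsBegin, last - argsBegin + 1) + ";";
    }
    out += line + "\n";
  }
  return out;
}
```

When the translator is not run at all, the compiler simply sees an unknown pragma, which is what makes the source portable.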

4. Toolkit

Unfortunately, I don't know of any metalanguage-based tools for automated logging similar in functionality to those described in this article. If further research confirms that none exist, perhaps I will take part in a new project which is being designed now. In a sense, this article is a study of whether a universal logging system for the C/C++ languages is in demand.

For the present, I encourage interested readers to create a logging system on their own. Whether it is just a set of templates and macros or a full code generation system depends upon how much you need such a system and how many resources you can spend on it.

If you decide to create a metalanguage, I'll advise you on what basis it can be created.

There are two most suitable free libraries allowing you to represent a program as a tree, translate it, and save it as a program again.

One of them is OpenC++ (OpenCxx), an open, free "source-to-source" translator library. It supports meta-programming and lets you build extensions of the C++ language on top of it. Solutions built on this library include the OpenTS execution environment for the T++ programming language (a product of the Program Systems Institute of the RAS) and the Synopsis tool for generating documentation from source code.

If you would like to use these libraries to create your own extensions, you will need a preprocessor that is run before they start working. Most likely, the preprocessor built into your compiler will be sufficient. Otherwise, you could try the Wave C++ preprocessor library.

Conclusion

This article is theoretical in character, but I hope that developers will find in it many interesting and useful ideas about logging in their programs. Good luck!

Sources

  1. Nemerle.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Karpov Andrey
Architect Program Verification Systems, Co Ltd
Russian Federation

Andrey Karpov is technical manager of the OOO "Program Verification Systems" (Co Ltd) company developing the PVS-Studio tool which is a package of static code analyzers integrating into the Visual Studio development environment.

PVS-Studio is a static analyzer that detects errors in source code of C/C++ applications. There are 3 sets of rules included into PVS-Studio:

  1. Diagnosis of 64-bit errors (Viva64)
  2. Diagnosis of parallel errors (VivaMP)
  3. General-purpose diagnosis

Awards: MVP, Intel Black Belt

Andrey Karpov is also the author of many articles on the topic of 64-bit and parallel software development. To learn more about the PVS-Studio tool and sources concerning 64-bit and parallel software development, please visit the www.viva64.com site.


My page on LinkedIn site: http://www.linkedin.com/pub/4/585/6a3

E-mail: karpov@viva64(dot)com

Article Copyright 2008 by Karpov Andrey