Wave: a Standard conformant C++ preprocessor library
By Hartmut Kaiser
Describes a free and fully Standard conformant C++ preprocessor library

Ever wanted to have your own C/C++ preprocessor? Or maybe you are curious about how this invisible everyday helper in your toolbox works? If yes, you may want to read on. If not, then before hitting the 'back' button of your browser, consider learning something new and read on anyway :-).
The C++ preprocessor is a macro processor that under normal circumstances is used automatically by your C++ compiler to transform your program before the actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs. The C++ preprocessor provides four separate facilities that you can use as you see fit:
These features are greatly underestimated today. Worse, the preprocessor has been frowned upon for so long that its usage simply was not pushed effectively until the Boost preprocessor library [1] came into being a few years ago. Only now are we beginning to understand that preprocessor generative metaprogramming, combined with template metaprogramming in C++, is one of the most powerful compile-time reflection/metaprogramming facilities that any language has ever supported.
The C++ Standard [2] was adopted back in 1998, but as far as I know there is still no C++ compiler with a bug-free implementation of the rather simple preprocessor requirements mandated therein. This may be a result of the underestimation mentioned above, or even of the banning of the preprocessor from good programming style during the last few years, or it may stem from the somewhat awkward standardized dialect of English used to describe it.
So the Wave preprocessor library is an attempt to:
To simplify parsing of the input stream (which is most of the time, but not necessarily, a file), the Spirit parser construction library [4] is used.
The Wave C++ preprocessor is not a monolithic application; it is rather a modular library, which mainly exposes a context object and an iterator interface. The context object helps to configure the actual preprocessing process (search paths, predefined macros, etc.). The exposed iterators are generated by this context object too. Iterating over the sequence defined by these two iterators returns the preprocessed tokens, which are built on the fly from the given input stream.
The C++ preprocessor iterator itself is fed by a C++ lexer iterator, which implements a unified interface. By the way, the C++ lexers contained within the Wave library may be used standalone too and are not tied to the C++ preprocessor iterator at all. By a lexer I mean a piece of code that combines several consecutive characters in the input stream into a stream of objects (called tokens) more suitable for subsequent parsing. These tokens carry not only the information about the matched character sequence, but additionally the position in the input stream where a particular token was found. In other words, the lexer removes all the clutter that only humans need, such as spaces and newlines (i.e. it performs some lexical transformation), leaving the structural transformation to the parser.
To make the Wave C++ preprocessing library modular, the C++ lexer is held completely separate and independent from the preprocessor. As a proof of this concept, the library currently contains two different C++ lexer implementations, which are functionally identical. Both expose the unified interface mentioned above, so the C++ preprocessor iterator may be used with either of them. The C++ lexer was abstracted from the C++ preprocessor iterator to allow other C++ lexers to be plugged in too, without the need to re-implement the preprocessor. This allows for benchmarking and specific fine-tuning of the preprocessing process itself.
During the last weeks Wave gained another field of application: testing the usability and applicability of different Standards proposals. A new C++0x mode was implemented, which makes it possible to try out and help establish some ideas designed to overcome some of the known limitations of the C++ preprocessor.
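To make the idea of a lexer more concrete, here is a minimal sketch of a toy tokenizer that groups characters into tokens and records where each token starts. The names Token and tokenize are hypothetical and chosen for illustration only; this is not the Wave lexer, and it recognizes far less than a real C++ lexer would.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for illustration; this is not Wave's lex_token<>.
struct Token {
    std::string value;   // the matched character sequence
    std::size_t line;    // 1-based line on which the token starts
    std::size_t column;  // 1-based column at which the token starts
};

inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

inline bool is_digit_char(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Groups consecutive characters into identifiers, numbers, or single
// punctuation characters. Whitespace is dropped, positions are kept.
inline std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t line = 1, column = 1, i = 0;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '\n') { ++line; column = 1; ++i; continue; }
        if (std::isspace(static_cast<unsigned char>(c))) { ++column; ++i; continue; }

        const std::size_t start = i;
        if (is_digit_char(c)) {
            while (i < input.size() && is_digit_char(input[i])) ++i;
        } else if (is_ident_char(c)) {
            while (i < input.size() && is_ident_char(input[i])) ++i;
        } else {
            ++i;  // a single punctuation character
        }
        tokens.push_back(Token{input.substr(start, i - start), line, column});
        column += i - start;
    }
    return tokens;
}
```

Calling tokenize("int x = 42;") yields five tokens, each carrying its own line and column, which is the kind of information a parser or an error message can use later on.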
The actual preprocessing is a highly configurable process, so obviously you have to define a couple of parameters to control this process, such as:
#include <...> and #include "..."
directives
You can access all these processing parameters through the
wave::context object. So you have to instantiate at least one
object of this type to use the Wave library. For more information about
the context template please refer to the class reference as included in the
downloadable file or as may be found here.
The context object is a template class, for which you have to supply at least
two template parameters: the iterator type of the underlying input stream to use
and the type of the token to be returned from the preprocessing engine. The type
of the input stream is defined by you, and so may be the token type, but as a
starting point I would recommend using the token type predefined as the default
inside the Wave library - the wave::cpplexer::lex_token<>
template class. You can find a full reference of this class inside the
downloadable file or here.
The main preprocessing iterators are not to be instantiated directly, but
should be generated through this context object too. The following code snippet
preprocesses a given input file and outputs the generated text into
std::cout.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
// (plus the Wave headers that declare wave::context<> and
// wave::cpplexer::lex_token<>)

// Open the file and read it into a string variable
std::ifstream instream("input.cpp");
std::string input(
    std::istreambuf_iterator<char>(instream.rdbuf()),
    std::istreambuf_iterator<char>());

// The template wave::cpplexer::lex_token<> is the default
// token type to be used by the Wave library.
// This token type is one of the central types throughout
// the library, because it is a template parameter to many
// of the public classes and templates and it is returned
// from the iterators themselves.
typedef wave::context<std::string::iterator,
    wave::cpplexer::lex_token<> > context_t;

// The C++ preprocessor iterators shouldn't be constructed
// directly. These are to be generated through a
// wave::context<> object. Additionally this wave::context<>
// object is to be used to initialize and define different
// parameters of the actual preprocessing.
context_t ctx(input.begin(), input.end(), "input.cpp");
context_t::iterator_t first = ctx.begin();
context_t::iterator_t last = ctx.end();

// The preprocessing of the input stream is done on the fly
// behind the scenes during the iteration over the
// context_t::iterator_t based stream.
while (first != last) {
    std::cout << (*first).get_value();
    ++first;
}
This sample shows how the input may be read into a string variable, from
where it is fed into the preprocessor. But the parameters to the constructor
of the wave::context<> object are not restricted to this type
of input stream. It can take a pair of arbitrary iterators (conceptually at
least forward_iterator type iterators) to the input stream from
which the data to be preprocessed should be read. The third parameter supplies a
filename, which is subsequently accessible from inside the preprocessed tokens
returned from the preprocessing to indicate the token position inside the
underlying input stream. Note though that this filename is used only as long as no
#include or #line directives are encountered, which in
turn will alter the current filename.
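The effect of a #line directive on the reported position can be observed with any C++ compiler, without Wave being involved. The following sketch uses the compiler's built-in preprocessor and the predefined __FILE__ and __LINE__ macros; the file name "renamed.cpp" and the function names are arbitrary choices for illustration.

```cpp
#include <string>

// Before the directive, __FILE__ names the file actually being compiled.
inline std::string file_before() { return __FILE__; }

// From here on the preprocessor reports the given name and line number.
#line 100 "renamed.cpp"
inline std::string file_after() { return __FILE__; }  // this is line 100
inline int line_after() { return __LINE__; }          // this is line 101
```

Here file_after() returns "renamed.cpp" and line_after() returns 101, regardless of the real name of the source file. A token position reported by a preprocessor library is subject to the same kind of change.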
The iteration over the preprocessed tokens is relatively straightforward. Just get the starting and the ending iterators from the context object (maybe after initializing some include search paths) and you are done! Dereferencing the iterator returns the preprocessed tokens, which are generated on the fly from the input stream.
As you may have seen, the complete library resides in the C++ namespace
wave, so you have to specify this explicitly when using the different
classes. Alternatively, you can place a using namespace
wave; somewhere at the beginning of your source files.
If you have ever needed to debug a macro expansion, you will have discovered that your tools provide little or no support for this task. For this reason the Wave library got a tracing facility, which lets you selectively obtain information about the expansion of a certain macro or of several macros.
The tracing of macro expansions generates a possibly huge amount of
information, so it is recommended that you explicitly enable/disable the
tracing for the macro in question only. This may be done with the help of a
special #pragma:
#pragma wave trace(enable)    // enable the tracing
// the macro expansions here will be traced
// ...
#pragma wave trace(disable)   // disable the tracing
To see what the Wave driver generates while expanding a simple macro, I suggest that you run the following through the driver with 'wave -t test.trace test.cpp':
// test.cpp
#define X(x)          x
#define Y()           2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y)  CONCAT_(x, y)
#pragma wave trace(enable)
// this macro expansion is to be traced
CONCAT(X(1), Y())     // should expand to 12
#pragma wave trace(disable)
After executing this command the file test.trace will contain the generated trace output. The generated output is relatively straightforward to understand, but you can find a thorough description of the trace output format in the documentation included with the downloadable file.
In order to prepare and support a proposal for the C++ Standards committee, which will describe certain new and enhanced preprocessor facilities, the Wave preprocessor library has implemented experimental support for the following features:
Variadic macros and placemarker tokens are already known from the C99 Standard. Their addition to the C++ Standard would help to make C99 and C++ less different.
Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple preprocessing tokens) is currently undefined behaviour for no substantial reason. It is not dependent on architecture nor is it difficult for an implementation to diagnose. Furthermore, retokenization is what most, if not all, preprocessors already do and what most programmers already expect the preprocessor to do. Well-defined behavior is simply standardizing existing practice and removing an arbitrary and unnecessary undefined behavior from the Standard.
One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language. As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit. The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of macro definitions without limiting their accessibility.
The proposed scoping mechanism is implemented with the help of three new
preprocessor directives: #region, #endregion and
#import (note that the actual names for the directives may change
during the standardization process). Additionally it changes minor details of
some of the existing preprocessor directives: #ifdef,
#ifndef and the operator defined().
To avoid overly detailed descriptions of the new features in this article, a simple example is provided here (taken from the experimental version of the preprocessor library written by Paul Mensonides), which demonstrates the proposed extensions:
# ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
# region ::CHAOS_PREPROCESSOR::chaos
#
# define WSTRINGIZE_HPP
#
# include <chaos/experimental/cat.hpp>
#
# // wstringize
#
# define wstringize(...) \
chaos::primitive_wstringize(__VA_ARGS__) \
/**/
#
# // primitive_wstringize
#
# define primitive_wstringize(...) \
chaos::primitive_cat(L, #__VA_ARGS__) \
/**/
#
# endregion
# endif
# import ::CHAOS_PREPROCESSOR
chaos::wstringize(a,b,c) // expands to: L"a,b,c"
The macro scope syntax is modeled on the namespace scoping already known
from the core C++ language. There is a significant difference though. The
#region and #endregion directives are opaque to any
macro definition from outside or inside the spanned region, respectively. This way,
macros defined inside a specific region are visible from outside this region
only if they are imported (by the #import directive) or if they
are qualified (as for instance the argument to the #ifndef
directive above).
For more details about the new experimental features please refer to the documentation included with the downloadable file.
The described features are enabled by the --c++0x command line
option of the Wave driver. Alternatively you can enable these features by
calling the wave::context<>::set_language() function with the
wave::support_cpp0x value.
To see, how you may write a full blown preprocessor, you may refer to the Wave driver sample, included in the downloadable file. This Wave driver program fully utilizes the capabilities of the library. It is usable as a preprocessor executable on top of any other C++ compiler. It outputs the textual representation of the preprocessed tokens generated from a given input file. This driver program has the following command line syntax:
Usage: wave [options] [@config-file(s)] file:
Options allowed on the command line only:
-h [--help]: print out program usage (this message)
-v [--version]: print the version number
-c [--copyright]: print out the copyright statement
--config-file filepath: specify a config file (alternatively: @filepath)
Options allowed additionally in a config file:
-o [--output] path: specify a file to use for output instead of stdout
-I [--include] path: specify an additional include directory
-S [--sysinclude] syspath: specify an additional system include directory
-F [--forceinclude] file: force inclusion of the given file
-D [--define] macro[=[value]]: specify a macro to define
-P [--predefine] macro[=[value]]: specify a macro to predefine
-U [--undefine] macro: specify a macro to undefine
-n [--nesting] depth: specify a new maximal include nesting depth
Extended options (allowed everywhere)
-t [--traceto] path: output trace info to a file [path] or to stderr [-]
--timer: output overall elapsed computing time to stderr
--variadics: enable variadics and placemarkers in C++ mode
--c99: enable C99 mode (implies variadics)
--c++0x: enable experimental C++0x support (implies variadics)
To enable the tracing output, the Wave driver has a special command
line option -t (--traceto), which specifies the file to which the
generated trace information is written. If you use a single dash ('-') as the
file name, the output goes to the std::cerr stream.
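How a driver might map the argument of such an option to an output stream can be sketched as follows. The helper select_trace_stream is hypothetical and not taken from the Wave driver sources; it only illustrates the convention that a single dash selects std::cerr.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper, not part of the Wave driver.
// Returns std::cerr for "-"; otherwise opens `path` using `file`
// and returns that stream. The caller owns `file` and should check
// file.is_open() before writing.
inline std::ostream& select_trace_stream(const std::string& path,
                                         std::ofstream& file) {
    if (path == "-") {
        return std::cerr;
    }
    file.open(path.c_str());
    return file;
}
```

Passing the std::ofstream in by reference keeps its lifetime under the caller's control, so the returned reference stays valid for as long as the trace output is needed.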
One caveat remains. To use the Wave library or to compile the Wave driver yourself you will need at least the VC7.1 compiler (the C++ compiler included in the VS.NET 2003 release). Alternatively you may compile it with a recent version of the gcc compiler (GNU Compiler Collection) or the Intel V7.0 C++ compiler. Sorry, for now no VC6 and no VC7 - these are too far away from C++ Standard conformance. But I will eventually try to alter parts of the Wave library to make it compilable with these compilers too - it depends on your response.
Wave depends on the Boost library (at least V1.30.2) and the Program Options library from Vladimir Prus (at least rev. 160, recently accepted into Boost, but not included yet), so please be sure to install these libraries before trying to recompile Wave.
Despite the fact that the Wave library is quite complex and heavily uses advanced C++ idioms such as templates and template-based metaprogramming, it is fairly simple to use in a broad spectrum of applications. It fits nicely into the well-known paradigms used for years by the C++ Standard Template Library (STL).
The Wave driver program is the only known to me C++ preprocessor, which
therefore it may be an invaluable tool for the development of modern C++ programs.
As recent developments like the Boost Preprocessor Library show [1], we will see a lot of applications for advanced preprocessor techniques in the future. But these need a solid base - a Standard conformant preprocessor. As long as the widely available compilers do not meet these needs, the Wave library may fill this gap.
__INCLUDE_LEVEL__
operator _Pragma() (C99 and --variadics
mode only)
_Pragma wave system()
true and false), but only identifiers.
Last Updated: 10 Jan 2004. Editor: Nishant Sivakumar.
Copyright 2003 by Hartmut Kaiser. Everything else Copyright © CodeProject, 1999-2010.