Introduction

Ever wanted to have your own C/C++ preprocessor? Or maybe you are curious about how this invisible everyday helper in your toolbox works? If yes, you may want to read on. If not, consider learning something new and reading on anyway before you hit your browser's 'back' button :-). The C++ preprocessor is a macro processor. Under normal circumstances your C++ compiler runs it automatically to transform your program before the actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs. The C++ preprocessor provides four separate facilities that you can use as you see fit:

- Inclusion of header files
- Macro expansion
- Conditional compilation
- Line control
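As a quick illustration, the following standard C++ fragment exercises each of the four preprocessor facilities once. SQUARE, have_square and reported_line are illustrative names and are not part of Wave or any other library.

```cpp
// 1. Inclusion of header files: the declarations from <cassert>
//    are substituted at this point.
#include <cassert>

// 2. Macro expansion: SQUARE(x) is a brief abbreviation for a
//    longer expression (illustrative macro name).
#define SQUARE(x) ((x) * (x))

// 3. Conditional compilation: only one of the two branches
//    survives preprocessing.
#if defined(SQUARE)
const bool have_square = true;
#else
const bool have_square = false;
#endif

// 4. Line control: the source line following this directive
//    is treated as line 1000.
#line 1000
const int reported_line = __LINE__;
```

A preprocessor-only run, for example with the Wave driver described below, shows how each directive disappears from the output and leaves only its effect behind.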
These features are greatly underestimated today. The preprocessor has also been frowned upon for so long that its use was never pushed effectively until the Boost Preprocessor Library [1] came into being a few years ago. Only now are we beginning to understand that preprocessor generative metaprogramming, combined with template metaprogramming in C++, is by far one of the most powerful compile-time reflection/metaprogramming facilities that any language has ever supported. The C++ Standard [2] was adopted back in 1998. Still, no C++ compiler known to me has a bug-free implementation of the rather simple preprocessor requirements mandated therein. This may result from the underestimation mentioned above. It may also result from the preprocessor being banned from good programming style during the last few years, or from the somewhat awkward dialect of English the Standard uses to describe it. So the Wave preprocessor library is an attempt to:
The Spirit parser construction library [4] is used to simplify parsing of the input stream, which is usually, but not necessarily, a file.

Background

The Wave C++ preprocessor is not a monolithic application. It is a modular library that mainly exposes a context object and an iterator interface. The context object helps to configure the actual preprocessing (search paths, predefined macros, etc.). It also generates the exposed iterators. Iterating over the sequence defined by these two iterators returns the preprocessed tokens, which are built on the fly from the given input stream.

The C++ preprocessor iterator itself is fed by a C++ lexer iterator, which implements a unified interface. The C++ lexers contained in the Wave library may also be used standalone and are not tied to the C++ preprocessor iterator at all. By a lexer I mean a piece of code that combines consecutive characters of the input stream into a stream of objects, called tokens, that are more suitable for subsequent parsing. Each token carries the matched character sequence and the position in the input stream where the token was found. In other words, the lexer removes everything that is needed only by humans, such as spaces and newlines (i.e. it performs the lexical transformation), and leaves the structural transformation to the parser. To keep the Wave C++ preprocessing library modular, the C++ lexer is held completely separate from, and independent of, the preprocessor. To prove this concept, the library currently contains two different C++ lexers, which are functionally identical. Both expose the unified interface mentioned above, so the C++ preprocessor iterator may be used with either of them.
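The following deliberately tiny, self-contained lexer sketch makes concrete the idea of a token that carries its value and its position. It is neither one of Wave's lexers nor Wave's unified lexer interface. The Token struct and the tokenize() function are hypothetical names, and the sketch recognizes only identifiers, decimal numbers and single-character punctuators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: the matched characters plus the
// position (1-based line and column) where they were found.
struct Token {
    enum Kind { Identifier, Number, Punctuator } kind;
    std::string value;
    std::size_t line;
    std::size_t column;
};

namespace {
// True if ch may start (first == true) or continue an identifier.
bool is_ident_char(char ch, bool first) {
    const unsigned char u = static_cast<unsigned char>(ch);
    return std::isalpha(u) || ch == '_' || (!first && std::isdigit(u));
}

bool is_digit(char ch) {
    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
}
}  // namespace

// Toy lexer: groups consecutive characters into identifiers, decimal
// numbers and single-character punctuators. Whitespace and newlines
// are dropped, but they still advance the recorded position.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t line = 1, column = 1, i = 0;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '\n') {                                   // next line
            ++line;
            column = 1;
            ++i;
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) { // other blanks
            ++column;
            ++i;
            continue;
        }
        const std::size_t start = i;
        Token::Kind kind = Token::Punctuator;
        if (is_ident_char(c, true)) {
            kind = Token::Identifier;
            while (i < input.size() && is_ident_char(input[i], false)) ++i;
        } else if (is_digit(c)) {
            kind = Token::Number;
            while (i < input.size() && is_digit(input[i])) ++i;
        } else {
            ++i;                                           // e.g. '=', ';', '#'
        }
        tokens.push_back(Token{kind, input.substr(start, i - start), line, column});
        column += i - start;
    }
    return tokens;
}
```

For example, tokenize("int x = 42;\n#define") yields seven tokens. The last two are "#" at line 2, column 1 and "define" at line 2, column 2. In Wave, as described above, lexers are consumed through an iterator interface, so tokens are delivered as the preprocessor requests them rather than collected into a vector up front as in this sketch.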
The C++ lexer was separated from the C++ preprocessor iterator library so that other C++ lexers can be plugged in without re-implementing the preprocessor. This allows benchmarking and fine-tuning of the preprocessing itself.

Recently Wave has found another field of application: testing the usability and applicability of different Standards proposals. A new C++0x mode was implemented. It makes it possible to try out, and help to establish, some ideas designed to overcome known limitations of the C++ preprocessor.

Using the code

Preprocessing is a highly configurable process, so you have to define a couple of parameters to control it, such as:
You can access all these processing parameters through the wave::context<> object.
The main preprocessing iterators are not to be instantiated directly, but
should be generated through this context object too. The following code snippet
preprocesses a given input file and writes the generated text to std::cout:
```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
// ... plus the Wave library headers (see the documentation
// included with the downloadable file for their names)

// Open the file and read its whole content into a string variable.
std::ifstream instream("input.cpp");
std::string input(
    std::istreambuf_iterator<char>(instream.rdbuf()),
    std::istreambuf_iterator<char>());

// The template wave::cpplexer::lex_token<> is the default
// token type to be used by the Wave library.
// This token type is one of the central types throughout
// the library, because it is a template parameter to many
// of the public classes and templates and it is returned
// from the iterators themselves.
typedef wave::context<std::string::iterator,
    wave::cpplexer::lex_token<> > context_t;

// The C++ preprocessor iterators shouldn't be constructed
// directly. They are to be generated through a
// wave::context<> object. Additionally, this wave::context<>
// object is used to initialize and define the different
// parameters of the actual preprocessing.
context_t ctx(input.begin(), input.end(), "input.cpp");
context_t::iterator_t first = ctx.begin();
context_t::iterator_t last = ctx.end();

// The preprocessing of the input stream is done on the fly,
// behind the scenes, during the iteration over the
// context_t::iterator_t based stream.
while (first != last) {
    std::cout << (*first).get_value();
    ++first;
}
```

This sample shows how the input may be read into a string variable, from where it is fed into the preprocessor. But the parameters to the constructor of the wave::context<> object …

The iteration over the preprocessed tokens is straightforward. Get the starting and ending iterators from the context object, perhaps after adding some include search paths, and you are done. Dereferencing the iterator returns the preprocessed tokens, which are generated on the fly from the input stream. As you may have noticed, the complete library resides in the C++ namespace wave.

The Wave tracing facility

If you have ever needed to debug a macro expansion, you will have discovered that your tools provide little or no support for this task. For this reason the Wave library has a tracing facility, which lets you selectively obtain information about the expansion of one or more specific macros. The tracing of macro expansions generates a possibly huge amount of
information, so it is recommended that you explicitly enable and disable
tracing only for the macro in question. This may be done with the help of a
special #pragma:

```cpp
#pragma wave trace(enable)    // enable the tracing
// the macro expansions here will be traced
// ...
#pragma wave trace(disable)   // disable the tracing
```

To see what the Wave driver generates while expanding a simple macro, try preprocessing the following with 'wave -t test.trace test.cpp':

```cpp
// test.cpp
#define X(x) x
#define Y() 2
#define CONCAT_(x, y) x ## y
#define CONCAT(x, y) CONCAT_(x, y)

#pragma wave trace(enable)
// this macro expansion is to be traced
CONCAT(X(1), Y())     // should expand to 12
#pragma wave trace(disable)
```

After executing this command, the file test.trace contains the generated trace output. The output is easy to follow, and the documentation included with the downloadable file describes the trace output format in detail.

The experimental C++0x mode

The Wave preprocessor library has implemented experimental support for the following features. The aim is to prepare and support a proposal to the C++ Standards committee describing certain new and enhanced preprocessor facilities:
Variadic macros and placemarker tokens are already known from the C99 Standard. Adding them to the C++ Standard would help make C99 and C++ less different. Token-pasting of unrelated tokens (i.e. token-pasting that results in multiple preprocessing tokens) is currently undefined behavior for no substantial reason. It does not depend on the architecture, and it is not difficult for an implementation to diagnose. Furthermore, retokenization is what most, if not all, preprocessors already do and what most programmers already expect the preprocessor to do. Making the behavior well defined would simply standardize existing practice and remove an arbitrary and unnecessary undefined behavior from the Standard.

One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language. As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit. The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of macro definitions without limiting their accessibility. The proposed scoping mechanism is implemented with the help of three new
preprocessor directives: #region, #endregion and #import.

Rather than describe the new features in detail, this article provides a simple example that demonstrates the proposed extensions. It is taken from the experimental version of the preprocessor library written by Paul Mensonides:

```cpp
# ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
# region ::CHAOS_PREPROCESSOR::chaos
#
# define WSTRINGIZE_HPP
#
# include <chaos/experimental/cat.hpp>
#
# // wstringize
#
# define wstringize(...) \
    chaos::primitive_wstringize(__VA_ARGS__) \
    /**/
#
# // primitive_wstringize
#
# define primitive_wstringize(...) \
    chaos::primitive_cat(L, #__VA_ARGS__) \
    /**/
#
# endregion
# endif
# import ::CHAOS_PREPROCESSOR
chaos::wstringize(a,b,c) // expands to: L"a,b,c"
```
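For comparison, here is a sketch of roughly the same wstringize written in standard C++ as it exists since C++11. C++11 adopted variadic macros and empty macro arguments from C99, but none of the proposed scoping directives. The MYLIB_ prefix and all macro names below are hypothetical. The prefix stands in for the manual name-prefixing that #region/#import is meant to replace. This is not Chaos code.

```cpp
#include <cassert>
#include <string>

// Without #region/#import, a hand-written prefix (MYLIB_, hypothetical)
// is the only protection against macro name clashes.
#define MYLIB_PRIMITIVE_CAT(a, b) a ## b

// #__VA_ARGS__ stringizes all variadic arguments, commas included;
// pasting the result onto L turns it into a wide string literal.
#define MYLIB_PRIMITIVE_WSTRINGIZE(...) MYLIB_PRIMITIVE_CAT(L, #__VA_ARGS__)

// The extra level of indirection lets macro arguments expand
// before they are stringized.
#define MYLIB_WSTRINGIZE(...) MYLIB_PRIMITIVE_WSTRINGIZE(__VA_ARGS__)

// Hypothetical macro used to show the effect of the indirection.
#define MYLIB_DEMO_ARGS a,b,c

// Expands to L"a,b,c", like chaos::wstringize(a,b,c) above.
const std::wstring wide_abc = MYLIB_WSTRINGIZE(a,b,c);
```

MYLIB_WSTRINGIZE(MYLIB_DEMO_ARGS) also yields L"a,b,c", while MYLIB_PRIMITIVE_WSTRINGIZE(MYLIB_DEMO_ARGS) yields L"MYLIB_DEMO_ARGS". The empty invocation MYLIB_WSTRINGIZE() yields L"". That last case relies on an empty macro argument, which is the situation placemarker tokens were introduced to handle.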
The macro scope syntax is modeled after the namespace scoping already known
from the core C++ language. There is a significant difference, though. The …

For more details about the new experimental features, please refer to the documentation included with the downloadable file. The described features are enabled by the …

The command line preprocessor driver

To see how you might write a full-blown preprocessor, refer to the Wave driver sample included in the downloadable file. This Wave driver program fully utilizes the capabilities of the library. It can be used as a standalone preprocessor executable in front of any other C++ compiler. It outputs the textual representation of the preprocessed tokens generated from a given input file. This driver program has the following command line syntax:

Usage: wave [options] [@config-file(s)] file:
Options allowed on the command line only:
-h [--help]: print out program usage (this message)
-v [--version]: print the version number
-c [--copyright]: print out the copyright statement
--config-file filepath: specify a config file (alternatively: @filepath)
Options allowed additionally in a config file:
-o [--output] path: specify a file to use for output instead of
stdout
-I [--include] path: specify an additional include directory
-S [--sysinclude] syspath: specify an additional system include directory
-F [--forceinclude] file: force inclusion of the given file
-D [--define] macro[=[value]]: specify a macro to define
-P [--predefine] macro[=[value]]: specify a macro to predefine
-U [--undefine] macro: specify a macro to undefine
-n [--nesting] depth: specify a new maximal include nesting depth
Extended options (allowed everywhere)
-t [--traceto] path: output trace info to a file [path] or to stderr [-]
--timer: output overall elapsed computing time to stderr
--variadics: enable variadics and placemarkers in C++ mode
--c99: enable C99 mode (implies variadics)
--c++0x: enable experimental C++0x support (implies
variadics)
To enable the tracing output, the Wave driver has a special command
line option -t (--traceto), which specifies the file to which the
generated trace information is written. If you use a single dash ('-') as the
file name, the output goes to stderr.

One caveat remains. To use the Wave library or to compile the Wave driver yourself, you will need at least the VC7.1 compiler (the C++ compiler included in the VS.NET 2003 release). Alternatively, you may compile it with a recent version of gcc (the GNU Compiler Collection) or with the Intel V7.0 C++ compiler. Sorry, for now there is no support for VC6 or VC7: they are too far away from C++ Standard conformance. But I will eventually try to alter parts of the Wave library to make it compilable with these compilers too; it depends on your response. Wave depends on the Boost library (at least V1.30.2) and on the Program Options library by Vladimir Prus (at least rev. 160; recently accepted into Boost, but not yet included in a release). Please install these libraries before trying to recompile Wave.

Conclusion

The Wave library is quite complex and makes heavy use of advanced C++ idioms such as templates and template-based metaprogramming. Even so, it is fairly simple to use in a broad spectrum of applications. It fits nicely into the well-known paradigms the C++ Standard Template Library (STL) has used for years. The Wave driver program is the only C++ preprocessor known to me which …; therefore it may be an invaluable tool for the development of modern C++ programs. As recent developments like the Boost Preprocessor Library [1] show, we will see many applications of advanced preprocessor techniques in the future. But these need a solid base: a Standard-conformant preprocessor. As long as the widely available compilers do not meet these needs, the Wave library may fill this gap.

References
History

03/25/2003 (Wave V0.9.1)
03/26/2003
04/07/2003 (Wave V0.9.2)
05/16/2003 (Wave V0.9.3)
05/22/2003
06/04/2003
01/05/2004 (Wave V1.0)