![]() |
Languages »
C / C++ Language »
General
Intermediate
A gentle introduction to Template Metaprogramming with C++By moliateAbusing your compiler for extremely early binding |
VC6Win2K, WinXP, Dev
|
|
Advanced Search |
|
|
|
||||||||||||||||
A while ago I was working on a program that used a large lookup table to do some computations. The contents of the table depended on a large number of parameters that I had to tweak to get optimal performance from my code. And every time I changed one parameter the whole table had to be recalculated...
I had written a function that dumped the content of the table to standard output. That way I could compile my program with the new parameters set, cut-and-paste the table from screen and into my source code and finally recompile the whole project. I didn't mind doing that - the first twenty times. After that I figured there had to be a better way. There was. Enter Template Metaprogramming.
int bits_set(unsigned char byte) {int count = 0; for (int i = 0; i < 8; i++) if ( (0x1L << i) & byte ) count++; return count; }In cases where the byte is known at compile time this can also be done by the compiler, using TMP:
template< unsigned char byte > class BITS_SET { public: enum { B0 = (byte & 0x01) ? 1:0, B1 = (byte & 0x02) ? 1:0, B2 = (byte & 0x04) ? 1:0, B3 = (byte & 0x08) ? 1:0, B4 = (byte & 0x10) ? 1:0, B5 = (byte & 0x20) ? 1:0, B6 = (byte & 0x40) ? 1:0, B7 = (byte & 0x80) ? 1:0 }; public: enum{RESULT = B0+B1+B2+B3+B4+B5+B6+B7}; };
I have used an enum for the temporary variables as well as for the result since they are easier to use and enumerators have the type of const int. Another way would be to use static const int:s in the class instead.
You can now use BITS_SET<15>::RESULT and get the constant 4 in your code. In this case the compiler evaluate the line enum{RESULT = B0+B1+B2+B3+B4+B5+B6+B7}; to enum{RESULT = 1+1+1+1+0+0+0+0}; and finally to enum{RESULT = 4};.
template< int i > class FACTOR{ public: enum {RESULT = i * FACTOR<I-1>::RESULT}; }; class FACTOR< 1 >{ public: enum {RESULT = 1}; };If we for example write this:
int j = FACTOR< 5 >::RESULT;
somewhere in our code the compiler will generate something like the following line of assembler code:
; int j = FACTOR< 5 >::RESULT;
mov DWORD PTR _j$[ebp], 120 ; 00000078H - a constant value!
How does this work? As we instantiate FACTOR<5> the definition of this class depends on FACTOR<4>, which in turn depend on FACTOR<3> and so on. The compiler needs to create all these classes until the template specialization FACTOR<1> is reached. This means the entire recursion is done by the compiler, while the final program just contain a constant.
Template metaprograms can generate useful code when interpreted by the compiler, for example a massively inlined algorithm that has its loops unrolled. The result is usually a large speed increase in the application.
For example, look at the following code that calculates the sum of the numbers 1..1000:
int sum = 0; for (int i = 1 ; i <= 1000; i++) sum += i;
We are actually performing 2000 additions, rather than 1000 (as we have to increment i by one for each loop). In addition we perform a thousand test operations on the variable i. Another way would be to write the code the following way:
int sum = 0; sum += 1; sum += 2; ... sum += 1000;
This is the way a Template Metaprogram would expand a loop. Now we perform exactly a thousand additions, but this method also has a prize. The code size increase, meaning that we theoretically could take a performance hit by increasing the number of page faults. In practice, though, code is often invoked multiple times and already loaded in cache.
Loop unrolling is easily defined using recursive templates, similar to calculating a value:
template< int i > class LOOP{ public: static inline void EXEC(){ cout << "A-" << i << " "; LOOP< i-1 >::EXEC(); cout << "B-" << i << " "; } }; class LOOP< 0 >{ public: static inline void EXEC(){ cout << "A-" << i; cout << "\n"; cout << "B-" << i; } };
The output of LOOP< 8 >::EXEC() is shown below:
A-8 A-7 A-6 A-5 A-4 A-3 A-2 A-1 A-0
B-0 B-1 B-2 B-3 B-4 B-5 B-6 B-7 B-8
Again, the thing to notice is that there is no loop in the resulting binary code. The loop unrolls itself to produce code like:
cout << "A-" << 8 << " "; cout << "A-" << 7 << " "; ... cout << "A-" << 0; cout << "\n"; cout << "B-" << 0; ... cout << "B-" << 7 << " "; cout << "B-" << 8 << " ";
An unrelated, but interesting thing can be found in class LOOP< 0 >. Look at how LOOP< 0 >::EXEC() uses i. This identifier has been declared in the template LOOP< int i >, but is still accessible from the "special case" LOOP< 0 >. I don't know if this is standard C++ behavior, however.
Beside loops other statements can be constructed:
template< bool Condition > class IF { public: static inline void EXEC(){ cout << "Statement is true"; } }; class IF< false > { public: static inline void EXEC(){ cout << "Statement is false"; } };
template< int _case > class SWITCH { public: static inline void EXEC(){ cout << " SWITCH - default "; } }; class SWITCH< 1 > { public: static inline void EXEC(){ cout << " SWITCH - 1 "; } }; class SWITCH< 2 > { public: static inline void EXEC(){ cout << " SWITCH - 2 "; } }; ...
Example of usage of the two classes:
SWITCH< 2 > myTwoSwitch; // store for delayed execution myTwoSwitch.EXEC(); IF< false >::EXEC(); myTwoSwitch.EXEC();
The output will be: " SWITCH - 2 Statement is false SWITCH - 2 "
template< bool Condition, class THEN, class ELSE > struct IF { template< bool Condition > struct selector {typedef THEN SELECT_CLASS;}; struct selector< false > {typedef ELSE SELECT_CLASS;}; typedef selector< Condition >::SELECT_CLASS RESULT; };
Example of usage:
struct THEN { static int func() {cout << "Inside THEN"; return 42; } }; struct ELSE { static int func() {cout << "Inside ELSE"; return 0; } }; int main(int argc, char* argv[]) { int result = IF< 4 == sizeof(int), THEN, ELSE >::RESULT::func(); cout << " - returning: " << result; }
On 32-bit architectures this will print "Inside THEN - returning: 42" to standard output. Note that if func() was not defined inside ELSE this would be a simple compile-time assert breaking compilation on 4 != sizeof(int).
Included as a sample is a small class that does generic CRC (Cyclic Redundancy Codes - a set of simple hash algorithms) calculations. The class uses a set of parameters that are #define:d at the top of the header. The main reason I chose CRC for this article is that the CRC algorithms usually use a lookup table based on a set of parameters, making the example somewhat similar to my original problem.
The class generate a lookup table with 256 entries at compile time. The implementation is pretty straightforward, and I hope that the comments in the source is enough to explain usage of the class. In some cases I have used macros where TMP would have been a possible solution. The reason for this is readability, an important factor when choosing which technique to use.
If you want a further explanation of CRC algorithms, I suggest reading Ross N. Williams "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS". I have included a verbatim copy of the text in the sample.
One thing to remember when compiling this is that the source is using a lot of compiler heap memory. I had to increase the memory allocation limit by using the compiler option '/Zm400'. This is one drawback of TMP - it really pushes the compiler to the limit.
The compiler doesn't like being abused as described above. Not a bit. And it will put up a struggle. Warnings and errors will range from cryptic to C1001 - INTERNAL COMPILER ERROR. And you can't debug a Metatemplate program like a runtime program.
For those reasons using well defined coding conventions is more important with TMP than with other programming techniques. Below I give some rules that I've found useful.
As template metaprograms are somewhat similar to macros I prefer to give all TMP classes uppercase names. Also try to make the name as descriptive as possible. The main reason for this is that public variables/functions usually have non-descriptive names (see below). A TMP class is the one defining the operation, not the member functions/variables.
Also try to limit a TMP class to a single operation. Template Metaprogramming is challenging enough without trying to generalize the class to support multiple operations. Often this will only result in code bloat.
RESULT for templates that calculate a value
EXEC() for loop unrolling/function simplification.
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 6 Mar 2003 Editor: Chris Maunder |
Copyright 2003 by moliate Everything else Copyright © CodeProject, 1999-2009 Web19 | Advertise on the Code Project |