Click here to Skip to main content
Click here to Skip to main content

Compiling C++ code at runtime

By , 8 Oct 2008
 

Introduction

The C programming-language draws a line between compile-time and run-time: you compile your program to convert it into machine-code, which can then be run. C++ crosses this line via template-metaprogramming techniques that use the compiler to “run” meta-programs. Dynamic programming languages (e.g., Python) cross the line in the other direction by allowing runtime generation of functions (“higher level functions”).

The following is a proof-of-concept showing that higher-level functions can be implemented in C++ using the LLVM Compiler Infrastructure, and how they can be useful for runtime code optimization.

Background

Static Code Optimization

A C++ compiler can do quite sophisticated code-optimizations. However, how much optimization can be done depends on what the compiler knows of the code.

For example, one trivial optimization is “strength reduction”. Generally, when faced with a code such as:

int divide(int x, int z) {
   return x/z;
}

the compiler doesn’t have a choice, and will use an expensive division instruction. However, if more is known on x and z, there are many additional optimizations that can be performed. For example, if we know z at compile time, we can embed the known value in a function (or specialize a template function):

int divide8(int x) {
   return x/8;
}

Now, the compiler can optimize the divide away, replacing it with a cheap right shift:

int divide8_opt(int x) {
   return x >> 3;
}

This and similar optimizations can increase the application performance significantly.

Target Specific Optimization

Additional optimizations depend on the target machine: if you know in advance the application will run on a Core 2 machine, you can use SSE 4.1 instructions to implement additional optimizations. However, in most cases, applications are expected to run on a wide variety of machines, and a “least-common-denominator” approach is used, emitting Pentium-compatible executables, losing potential performance gains.

Dynamic Code Optimization

If we could defer part of the compilation process to runtime, more optimization opportunities are available.

First, we could generate code optimized for the actual machine the application is running on – use extensions such as SSE4.1 only when running on a Core 2 machine.

Second, we could create code that depends on values not available at compile time. For example, a packet filter may scan network traffic for specific patterns which are only known at runtime. However, these patterns are relatively static, and change infrequently (see this for an example).

The LLVM Compiler Infrastructure

LLVM is a high-quality, platform independent infrastructure for compilation. It supports compilation to an intermediate “bitcode”, and an efficient Just-In-Time (JIT) compilation and execution. For a good introduction to both LLVM and writing compilers, see the excellent LLVM tutorials. LLVM provides tools for generating bitcode, which it can then optimize, compile, and execute.

Using the Code

DynCompiler

DynCompiler is a proof-of-concept (read: don’t expect it to work for more than toy problems, have a nice syntax, or do anything useful) of a DSEL for dynamic code compilation, which uses LLVM to do the real work. With DynCompiler, you can create a “higher-level function” – or a function for creating other functions.

For example, the following function can create a specific quadratic polynomial (ax2+bx+c) for given coefficients a, b, and c:

typedef int (*FType)(int);
FType build_quad(int a, int b, int c) {
     DEF_FUNC(quad) RETURNS_INT
     ARG_INT(x);
  BEGIN
    DEF_INT(tmp);
    tmp = a*x*x + b*x + c;
    RETURN(tmp);
  END
  return (FType)FPtr;
}

Note that build_quad() returns a function – it is not the quadratic function itself (the same way that function templates are not “concrete” functions). To create an actual function:

FType f1 = build_quad(1, 2, 1); // f1(x) := x^2+2*x+1

Which can now be used as any other function:

for(int x = 0; x < 10; x++) {
    std::cout << "f1(" << x << ") = " << f1(x) << std::endl;
}

Syntax

DynCompiler has an ugly syntax – the result of the limitations of the preprocessor and laziness. A function generator has a name and a return type (only “int” and “double” are supported):

DEF_FUNC(name) RETURNS_INT

or:

DEF_FUNC(name) RETURNS_DOUBLE

for a function returning a double. Arguments for the resulting function are provided by:

ARG_INT(x); // integer

or

ARG_DOUBLE(x); // double

The actual function code starts with a BEGIN:

BEGIN

Local variables can be defined with DEF_INT and DEF_DOUBLE:

DEF_INT(tmp);

You can then use these variables (almost) normally:

tmp = a*x+b;

Note that the code is not evaluated at this point except for “normal” C++ variables like a and b. Therefore, at the time this line executes a = 3, and b = 2, and the above code will be evaluated as:

tmp = 3*x+2;

Note that unused variables, or variables used before initialization, will generate an error. Returning a value from the function is done with:

RETURN_INT(expr);

or

RETURN_DOUBLE(expr);

Note that a function must return a value. The function block ends with:

END

The basic control flow is provided by IF and WHILE:

IF(x > 0)
     IF(y > 0)
        z = x*y;
     IFEND
  ELSE
     z = 0
  IFEND 

  WHILE(z > 0)
     z -= x;
  WHILEND

In addition, a PRINT(expr) can be used to print to standard output:

PRINT(i);

Finally, after END, FPtr will point to the newly created function. You will need to cast the pointer to the actual function type though:

f1 = (FType)FPtr;

Running the Code

You will need to download and build LLVM (check this link for Visual Studio specific instructions). The DynCompiler code itself requires TR1 support in addition – e.g., Visual Studio 2008 with SP1.

Status

DynCompiler is a proof-of-concept, and should not be taken seriously.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

cppnow
United States United States
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralConfusing...memberMaximilien8 Oct '08 - 7:59 
Really don't know what this article is about...

 

GeneralRe: Confusing...memberKochise8 Oct '08 - 9:16 
Nope, only coders understand such an article, especialy after having read it completely Smile | :)
 
Kochise
 
In Code we trust !

GeneralWhy it is useful to build functions at runtimememberwtwhite13 Oct '08 - 19:19 
What the author's system does is enable a C++ program to "make new functions" while it is running (to a limited extent). Normally you can't do this with C++ -- you are stuck with the functions you wrote yourself before you compiled the program.
 
Why would you want to do this? One reason, as the author suggested, is performance -- at runtime you know things that you might not be able to know at compile time, so some sort of "Just in time compilation" may be able to generate much more efficient code (e.g. a fast bit shift instead of a slow divide when dividing by 8).
 
But the other reason, which I actually believe is more important, is that having a language where functions can build and return new functions enables some powerful coding techniques that hugely simplify certain types of tasks. I now use this technique in Perl quite a bit.
 
Quick example: I have a little self-made logging library. The initlogging() function takes one parameter, which is a function of one argument that is expected to output its argument to some log device (e.g. STDERR, a log file, etc.). This enables logging to be generic -- any type of log device can be used, just by creating a function that writes to that device and supplying that to initlogging(). Since a common type of logging is to a named file, I also provide a function, make_file_logger() that takes the name of a log file and returns a function that writes to that file. So all you need to do is run initlogging(make_file_logger('mylog.txt')); at the start of the program.
 
What if you want to log to more than one place? Just use the helper function make_multi_logger(), which takes a list of logging functions and returns a single function that calls them all in sequence. So then all you need to do is call initlogging(make_multi_logger(stderr_logger, make_file_logger('mylog.txt')));, and everything Just Works!
 
For myself, it took a while to get used to this way of thinking, but once you do, you will enjoy the new expressiveness!
 
WTJW
GeneralRe: Why it is useful to build functions at runtimemembercppnow13 Oct '08 - 21:48 
Note that in your logging example you know where to log in compile time. You can solve this with standard C++ easily
E.g.:
 
class log_target {
public:
   virtual void log(const std::string&) = 0;
};
...
 
class logger {
public:
  ...
  void write_log(const std::string& s) {
     log_target_->log(s); // delegate
  }
 
private:
  log_target* log_target_;
};
 
Note that if you want to avoid the virtual function, templates can provide a good alternative.
GeneralRe: Why it is useful to build functions at runtimememberwtwhite13 Oct '08 - 23:08 
You're right, when I think about it you can accomplish that and many similar things in C++ by using classes containing a virtual method "slot", and treating a pointer or reference to a base class as a "function value object". Or as you said, you can use templates instead of polymorphism if every call destination is known at compile time. Class data members can simulate closures, and overriding operator()() makes for pretty call syntax.
 
Actually, I'd be interested to know: is there really any difference in principle between a first-class function value (e.g. a Perl coderef) and an instance of a C++ class that is derived from a base class that declares (say) operator()() virtual? (Ignoring the fact that C++ has to worry about matching call signatures, and memory management.) Is there anything you could do with the former that you could not do with the latter? I sort of instinctively feel that Perl's coderefs are more powerful and general, but I can't think of a material distinction offhand... Big Grin | :-D
 
WTJW

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 8 Oct 2008
Article Copyright 2008 by cppnow
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid