QOR Compiler Aspect

Matthew Faithfull

Feb 4, 2013

CPOL

Open, currently tested on Windows, VC6, VSExpress2005, VSExpress2008, VS2012Desktop, CodeBlocks 12.11 with MinGW GCC 4.7.1, Rad Studio 2010 XE3; Open Suse Linux Netbeans 7.1 with GCC 4.6.2 and 7.2.1 with Clang 3.0.

Download demo project source - 961 KB

TestCmp built with VS2010

Introduction

CompilerQOR is part of the Querysoft Open Runtime, the world's first Aspect Oriented, Open Source, Cross Platform Framework written almost entirely in C++.

CompilerQOR is a small but vital QOR aspect. Its job is to ensure that no code outside of CompilerQOR itself ever has to know or care which C++ compiler is being used to build it. That doesn't mean code can be written without the limitations of compilers in mind, just that the particular limitation being considered is independent of which compiler is used.

The kind of issue CompilerQOR addresses

If you've looked at any substantial amount of open source software you'll have seen many things like this:

#ifdef __COMPILER_A
    ...
#    if( __COMPILER_A_VERSION > 5006 )
    ...
#    else
    ...
#else
#    if ( __COMPILER_VERSION < 1400 )
#        ...
#    else

....etc

What's usually going on here is that the original code used some feature of the C++ language or compiler (templates, namespaces, _UBER_FUNKY_BUILTIN_MACRO_ and so on) that not everyone's compiler supports. People who wanted to use the code have patched around this for the specific versions of specific compilers that do or don't support feature X. Great, now it works for many people, but it's also an unreadable mess rammed full of important metadata about different compilers, which is not directly related to the original algorithm and which cannot itself be reused. It may work, but from a design, maintenance, readability or reuse point of view it's a disaster.

If you don't think the above example is bad enough to warrant doing anything about it, then take a look at the source of Make, the ubiquitous build tool, but not for too long or you might go blind and miss out on the rest of this article.

The QOR difference

What CompilerQOR does is allow us to replace the above with:

#if __QCMP_SUPPORTS( FEATURE_X )
...
#else
...
#endif

Not only that but all the accumulated wisdom about what each Compiler version does and doesn't do lives in exactly one place. It can be found and it can be reused.

The further advantage is that the above code will now work with a new compiler, even one not in existence when it was written. At most it will need an updated CompilerQOR, but if feature X is something fairly ordinary that all new compilers are expected to support then CompilerQOR will define it as supported by default, requiring little or no change at all. Anyone who runs a software project will see the potential advantages: reduced cost and complexity, increased portability, extended product lifetime and encouragement of standards conformance.

Hasn't this been done before?

Well of course it has. There are no original lines of code. The Boost libraries do this, as does STLSoft, and the configuration headers of many other fantastic projects like llvm/clang/libc++ have feature macros to specify support for variations between compilers. CompilerQOR builds on the shoulders of these giants without having to carry them around wherever it goes.

Is that all?

C++ language features are crucial but by no means the only thing that varies between compilers. There are variations in:-

C preprocessors
Intrinsic functions
Limits of code complexity
Switches and pragmas
Options and extensions
Built in types

Then there is the Application Binary Interface: how the built code and data structures are actually laid out in memory, v-table differences, integration with the loader binary format (PE or ELF) and injection of code for RunTime Type Information, Exceptions, Security and bootstrapping.

All these things then are in part or in whole the domain of CompilerQOR. As the QOR grows it will become a lot more than a simple set of header files but the goal remains the same. If 'it' varies per compiler 'it' belongs to the CompilerQOR and if 'it' only varies per compiler then 'it' belongs only to the CompilerQOR. Nothing else should depend on which compiler happens to be used. This is what makes CompilerQOR an Aspect as opposed to just a library. The compiler in use affects all the code everywhere in the source tree but we deal with this cross cutting concern in just one place.

Compiler discrimination

Pretty much the first job of CompilerQOR is to work out which compiler is being used to compile it, so it can send that compiler down its own happy path in the tree of included files. This is done in the same way it's done in the Boost libraries, with the input of a few snippets from elsewhere. Each type of compiler is detected by the predefined preprocessor macros it automatically makes available, for example:

#    if defined __CODEGEARC__
#        define __QCMP_COMPILER    __QCMP_CODEGEAR        //CodeGear - must be checked for before Borland
...

The predefinition of __CODEGEARC__ is the definitive indication we need in order to detect an Embarcadero compiler in use. But as the comment implies, these compilers also define __BORLANDC__, which is the test macro for the older Borland compilers, so we have to check for CodeGear first to be sure which it is. These gotchas aside, there's no rocket science here, just a chunk of if..else preprocessor logic.
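As a minimal sketch of this kind of order-sensitive discrimination, the block below uses hypothetical SK_ names and a hypothetical sk_compiler_name() helper; none of them are part of CompilerQOR, and the real compiler values live in its own headers. Clang is tested before GCC for the same reason CodeGear is tested before Borland: Clang also predefines __GNUC__.

```cpp
// Hypothetical compiler IDs, for illustration only.
#define SK_COMPILER_UNKNOWN  0
#define SK_COMPILER_CODEGEAR 1
#define SK_COMPILER_BORLAND  2
#define SK_COMPILER_CLANG    3
#define SK_COMPILER_GCC      4
#define SK_COMPILER_MSVC     5

#if defined(__CODEGEARC__)
#    define SK_COMPILER SK_COMPILER_CODEGEAR   // must precede the Borland check
#elif defined(__BORLANDC__)
#    define SK_COMPILER SK_COMPILER_BORLAND
#elif defined(__clang__)
#    define SK_COMPILER SK_COMPILER_CLANG      // must precede the GCC check
#elif defined(__GNUC__)
#    define SK_COMPILER SK_COMPILER_GCC
#elif defined(_MSC_VER)
#    define SK_COMPILER SK_COMPILER_MSVC
#else
#    define SK_COMPILER SK_COMPILER_UNKNOWN
#endif

// Returns a printable name for whichever branch was selected above.
inline const char* sk_compiler_name()
{
#if SK_COMPILER == SK_COMPILER_CODEGEAR
    return "CodeGear";
#elif SK_COMPILER == SK_COMPILER_BORLAND
    return "Borland";
#elif SK_COMPILER == SK_COMPILER_CLANG
    return "Clang";
#elif SK_COMPILER == SK_COMPILER_GCC
    return "GCC";
#elif SK_COMPILER == SK_COMPILER_MSVC
    return "MSVC";
#else
    return "unknown";
#endif
}
```

Everything after this point can switch on SK_COMPILER alone and never needs to look at a vendor macro again.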

Features

Now we know which compiler is looking at the code we can say which features are available and which are not. We do this by defining all the features we know about in one place "include/CompilerQOR/Common/CompilerFeatures.h" and then by #undef(ining) them in the compiler specific headers where compilers don't support them. We do it this way round for several reasons. There's a definitive list of features in exactly one place. Each compiler header explicitly points out the things which are not up to scratch with that compiler. These are the same things that are likely to cause compatibility issues so we want them explicitly stated in association with each compiler.

To detect if a feature is supported the __QCMP_SUPPORTS( _FEATUREX ) macro is provided for use where you want to say:

 #if __QCMP_SUPPORTS( _FEATUREX )
 ...
 #endif
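The define-everything-then-#undef arrangement can be sketched in a few lines. This is an illustration under assumptions, not the CompilerQOR source: the SK_ names are hypothetical, the three parts would normally live in separate headers, and the version cut-offs are only there to show the shape of a compiler specific header.

```cpp
// Part 1, the common feature list: every known feature starts out supported.
#define SK_FEATURE_TEMPLATE_PARTIAL_SPECIALIZATION 1
#define SK_FEATURE_VARIADIC_MACROS 1

// Part 2, a compiler specific header: switch off what this compiler lacks.
#if defined(_MSC_VER) && (_MSC_VER < 1310)
#    undef SK_FEATURE_TEMPLATE_PARTIAL_SPECIALIZATION
#endif
#if defined(_MSC_VER) && (_MSC_VER < 1400)
#    undef SK_FEATURE_VARIADIC_MACROS
#endif

// Part 3, the test macro. Inside #if an undefined identifier evaluates to 0,
// so a feature that was #undef'd simply reads as "not supported".
#define SK_SUPPORTS( feature ) ( SK_FEATURE_##feature )

// Client code only ever mentions the feature, never the compiler.
inline bool sk_partial_specialization_available()
{
#if SK_SUPPORTS( TEMPLATE_PARTIAL_SPECIALIZATION )
    return true;
#else
    return false;
#endif
}
```

A compiler nobody has heard of yet falls straight through Part 2 and gets the full default feature set, which is the forward compatibility described earlier.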

If you want to do anything fancier with the feature switch such as include it as a condition in QOR_PP_IF( condition, truecase, falsecase ) then you'll need to use __QCMP_FEATURE_TEST( _FEATUREX ) which expands to 0 for unsupported features and 1 for supported features.

This uses officially the nastiest macro hack in the known universe which I reproduce here only to say I don't think I invented this, please don't take out a contract or fatwah on me.

#    define __QCMP_FEATURE_TEST( _X )            __QCMP_FEATURE_TEST2( QOR_PP_CAT( __QCMP_FEATURE_TEST_, _X ), (0) )
#    define __QCMP_FEATURE_TEST_1                0)(1
#    define __QCMP_FEATURE_TEST2( _A, _B )        QOR_PP_SEQ_ELEM( 1, (_A)_B )

Never do that! Don't ask me about it and don't credit me for it. Enough said.
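For the curious, the macros above appear to work like this: if the feature macro is defined as 1, the concatenation produces __QCMP_FEATURE_TEST_1, which expands to 0)(1 and turns the sequence into (0)(1)(0), whose element 1 is 1. If the feature is undefined the concatenation produces a meaningless token, the sequence is (junk)(0), and element 1 is 0. The sketch below gets the same 0 or 1 result with a different, commonly seen placeholder trick, so that it is self-contained without the preprocessor library. All SK_ names are hypothetical and it assumes a compiler with variadic macros.

```cpp
#define SK_FEATURE_NAMESPACES 1           // supported: defined as 1
// SK_FEATURE_EXPORT_TEMPLATES is deliberately left undefined (unsupported).

#define SK_PLACEHOLDER_1 0,               // only the token 1 maps to "0,"
#define SK_SECOND( first, second, ... ) second
#define SK_TEST( x )        SK_TEST_I( x )               // x is expanded here
#define SK_TEST_I( value )  SK_TEST_II( SK_PLACEHOLDER_##value )
#define SK_TEST_II( probe ) SK_SECOND( probe 1, 0, ~ )

// Defined feature:   SK_SECOND( 0, 1, 0, ~ )      -> 1
// Undefined feature: SK_SECOND( junk 1, 0, ~ )    -> 0
static_assert( SK_TEST( SK_FEATURE_NAMESPACES ) == 1, "defined feature yields 1" );
static_assert( SK_TEST( SK_FEATURE_EXPORT_TEMPLATES ) == 0, "undefined feature yields 0" );

// Unlike a bare #if, the result is an ordinary token usable in expressions
// or as the condition argument of further macros.
constexpr int sk_has_namespaces       = SK_TEST( SK_FEATURE_NAMESPACES );
constexpr int sk_has_export_templates = SK_TEST( SK_FEATURE_EXPORT_TEMPLATES );
```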

The set of features defined in the sample code is very small. Just enough to allow CompilerQOR itself to compile and for those features to be tested. Many additional features are needed to allow for any and all variations amongst compilers. This work is ongoing research and although it has progressed well beyond what's in the sample code a final definitive list has not been settled. I've reduced the list in the sample code for the sake of simplicity and so that it has a good chance of being forward compatible with future versions, i.e. nothing that's already published there will suddenly disappear in a later version.

Preprocessor

In the sample code with this article is a complete port of the Boost Preprocessor library with a few additions as already noted. Boost is fully documented online and I didn't write their PP library, so we'll just deal with what's in the added flow.h here.

QOR_PP_include_IF( condition, file ) provides conditional inclusion based on the macro expansion of condition. If condition expands to 1 then file is included, otherwise an empty file is included, effectively negating the inclusion. QOR_PP_include_IF is used like this:-

 #include QOR_PP_include_IF( SOME_CONDITIONAL_MACRO, "include/ConditionalHeader.h" )
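One way such a macro could be built is to dispatch on the expanded condition by token pasting. The sketch below is an assumption about the general technique, not the contents of flow.h: the SK_ names are hypothetical, standard headers in angle brackets are used only so the block is self-contained, and <cstddef> stands in for the empty file.

```cpp
#include <cstddef>

#define SK_CAT( a, b )   SK_CAT_I( a, b )   // two levels so arguments expand first
#define SK_CAT_I( a, b ) a##b

// Selects SK_INCLUDE_IF_1 or SK_INCLUDE_IF_0 from the expanded condition.
#define SK_INCLUDE_IF( condition, file ) SK_CAT( SK_INCLUDE_IF_, condition )( file )
#define SK_INCLUDE_IF_1( file ) file
#define SK_INCLUDE_IF_0( file ) <cstddef>   // stand-in for an empty header

#define SK_WANT_VECTOR 1
#define SK_WANT_MAP    0

#include SK_INCLUDE_IF( SK_WANT_VECTOR, <vector> )   // becomes #include <vector>
#include SK_INCLUDE_IF( SK_WANT_MAP, <map> )         // becomes #include <cstddef>

// Compiles only because the first conditional include really pulled in <vector>.
inline std::size_t sk_vector_size()
{
    std::vector<int> values( 3 );
    return values.size();
}
```

The condition has to expand to exactly 0 or 1 for the pasted macro name to exist, which is why a 0/1 feature test is the natural companion to this macro.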

Intrinsics

Intrinsic functions are built in functions for which the compiler has an internal copy of the code. Instead of the code being in a library it's in the compiler itself and gets injected into your program during compilation as inline code. Intrinsic functions can be very useful but can also cripple portability between compilers as not every one is available in exactly the same way with each compiler. Change compiler and suddenly that function you were calling no longer exists.

We could just bar the use of intrinsic functions from the QOR altogether and in fact outside of CompilerQOR we do but then there'd be no way to make use of them. CompilerQOR solves this by persuading the Compiler in use to inject all the available intrinsic functions into CompilerQOR and to set preprocessor constants for each one so all other code can find out if a specific intrinsic is available. Even better the intrinsic functions don't pollute the global namespace. Outside of CompilerQOR itself they appear as member functions of the CCompiler class. In the sample code this is implemented for Microsoft VC++ compilers as a proof of concept. This is how it works for the intrinsic form of memcpy:

After compiler discrimination causes the preprocessor to include the header for a Microsoft Compiler

__QCMP_BUILTINS_HEADER gets defined as the name of a file containing declarations for built-in functions, e.g., #define __QCMP_BUILTINS_HEADER "CompilerQOR/MSVC/VC6/Builtins.h"

__QCMP_UNBUILTINS_HEADER gets defined as the name of a file containing instructions to the compiler not to inject each built-in function, e.g., #define __QCMP_UNBUILTINS_HEADER "CompilerQOR/MSVC/VC6/UnBuiltins.h"

and finally __QCMP_BUILTINS_INC gets defined as the name of a file containing CCompiler member functions each of which is a wrapper for an intrinsic function, e.g., #define __QCMP_BUILTINS_INC "CompilerQOR/MSVC/VC6/Builtins.inl"

Later, if the CompilerQOR itself is being built, the generic Compiler.h header #includes __QCMP_BUILTINS_HEADER within an extern "C" section like this:

extern "C"
{
    #include __QCMP_BUILTINS_HEADER
}

which expands to:

extern "C"
{
    void* memcpy( void* dest, const void* src, size_t count );
    ...
}

The additional condition which means this is only included if CompilerQOR itself is being built is so that other libraries including this header never see the global namespace declaration of memcpy.

So now we have declared the global namespace memcpy function but not defined it. Then within the CCompiler class declaration itself the same header is included again. This time without the extern "C".

class CCompiler : public CCompilerBase
{
    ...
#    include __QCMP_BUILTINS_HEADER
    ...
}

This creates a member function declaration matching each of the available intrinsic functions:

class CCompiler : public CCompilerBase
{
    ...
    void* memcpy( void* dest, const void* src, size_t count );
    ...
}

In the main CompilerQOR.cpp file, the CCompiler member functions are then implemented by including __QCMP_BUILTINS_INC like this:

//--------------------------------------------------------------------------------
namespace nsCompiler
{
    #ifdef __QCMP_BUILTINS_INC
    #    include __QCMP_BUILTINS_INC
    #endif
    ...

which expands to:

//--------------------------------------------------------------------------------
namespace nsCompiler
{
    //--------------------------------------------------------------------------------
#pragma intrinsic(memcpy)
    //--------------------------------------------------------------------------------
    void* CCompiler::memcpy( void* dest, const void* src, size_t count )
    {
        return ::memcpy( dest, src, count );
    }
    ...

The pragma instructs the compiler to use the intrinsic version of memcpy when it comes across otherwise undefined references to it. We haven't defined it, so the CCompiler::memcpy function gets an injected copy of the intrinsic.

After this the final preprocessor defined header is included at global scope:

#ifdef __QCMP_UNBUILTINS_HEADER
#    include __QCMP_UNBUILTINS_HEADER
#endif

which expands to:

...
#pragma function(memcpy)
...

This instructs the compiler to stop using the intrinsic form of memcpy and it goes back to being an undefined function for the rest of the compilation unit or until the C library defines it again but that's for another article.

At the end of all this prattling about, then, we end up with a CCompiler class that contains member functions that are just calls to each of the compiler's intrinsic functions. The only external interface is that of the exported CCompiler class.

There's one more thing we have to do to make use of this later and that's to define a macro for each function so we know it's available. This is done in the __QCMP_BUILTINS_HEADER file giving us: #define __QCMP_DECLS_MEMCMP 1 which can be tested for later on.
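The overall shape, a built-in wrapped in a class member plus an availability macro, can be condensed into one self-contained sketch. This is not the CompilerQOR implementation: it skips the three generated headers, the nsSketch, CSketchCompiler, MemCpy and SK_ names are hypothetical, and the member is deliberately not called memcpy so it cannot collide with any library macro of that name.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__GNUC__)                       // GCC and Clang expose builtins directly
#    define SK_DECLS_MEMCPY 1
#    define SK_BUILTIN_MEMCPY( d, s, n ) __builtin_memcpy( (d), (s), (n) )
#elif defined(_MSC_VER)
#    pragma intrinsic( memcpy )             // ask MSVC for the intrinsic form
#    define SK_DECLS_MEMCPY 1
#    define SK_BUILTIN_MEMCPY( d, s, n ) ::memcpy( (d), (s), (n) )
#endif

namespace nsSketch
{
    class CSketchCompiler
    {
    public:
#if defined(SK_DECLS_MEMCPY)
        // Thin wrapper: callers reach the built-in only through the class.
        static void* MemCpy( void* dest, const void* src, std::size_t count )
        {
            return SK_BUILTIN_MEMCPY( dest, src, count );
        }
#endif
    };
}

// Client code tests the availability macro and falls back if it is absent.
inline bool sk_copy_roundtrip()
{
    const char source[] = "QOR";
    char target[ sizeof source ] = {};
#if defined(SK_DECLS_MEMCPY)
    nsSketch::CSketchCompiler::MemCpy( target, source, sizeof source );
#else
    std::memcpy( target, source, sizeof source );
#endif
    return std::memcmp( target, source, sizeof source ) == 0;
}
```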

Limits of code complexity

Compilers vary in what they can cope with in terms of the lengths of names and number of various types of definitions they can hold onto at once. They also vary in how many macro expansion levels the preprocessor can handle and how complex template expansions can get before the dreaded 'Internal Compiler Error' ruins your afternoon. Some of these limits are checked in the preprocessor library and as the QOR grows others will no doubt be discovered.

Switches and pragmas

Many compilers support the output of messages from within the code during compilation. This can help to track which files are being included, what compilation choices are being made according to the configuration and where and why things go wrong. CompilerQOR supports diagnostic messages which can be turned on and off for each compilation unit, that is, one .cpp file and all the headers it includes. This works for recent Microsoft, GCC and Clang compilers, including MinGW GCC. Other compilers will just be silent as far as custom messages are concerned. Here's how it works:

#ifndef NDEBUG                                //Debug build
#    define __QCMP_REPORTCONIG            1   //Report configuration items during compilation where supported
#endif

Inserting these lines as the first lines in a .cpp file, before any includes, means that when the preprocessor reaches lines like this:

__QCMP_MESSAGE( "Compiler runtime type information enabled." )

You should see 'Compiler runtime type information enabled' coming out on your build console. This will of course depend to some extent on which IDE you use. It works on .Net era versions of Visual Studio and on recent CodeBlocks and Netbeans IDEs.

The __QCMP_MESSAGE macro, like all the others discussed here and any not hidden in 'details' headers within the preprocessor library, is fine to use in any code that includes CompilerQOR.h.
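A message macro of this kind can be sketched with the pragma operators that Microsoft, GCC and Clang compilers provide. The SK_ names below are hypothetical and this is not the definition used by CompilerQOR. Reporting is opt-in here: define SK_REPORT_CONFIG before this code is seen to turn the messages on for a compilation unit.

```cpp
#if defined(SK_REPORT_CONFIG)
#    if defined(_MSC_VER)
#        define SK_MESSAGE( text ) __pragma( message( text ) )
#    elif defined(__GNUC__)                 // GCC and Clang
#        define SK_DO_PRAGMA( x )  _Pragma( #x )
#        define SK_MESSAGE( text ) SK_DO_PRAGMA( message( text ) )
#    endif
#endif

#ifndef SK_MESSAGE
#    define SK_MESSAGE( text )              // silent when off or unsupported
#endif

// Expands to a compile-time message, or to nothing at all.
SK_MESSAGE( "Compiler runtime type information enabled." )

// Lets code (and tests) see which way this compilation unit was configured.
inline bool sk_reporting_enabled()
{
#if defined(SK_REPORT_CONFIG)
    return true;
#else
    return false;
#endif
}
```

Because the silent form expands to nothing, message lines can be left in headers permanently without affecting compilers that have no equivalent pragma.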

Options and extensions

There are a number of extensions which are often supported by compilers even though they were not or are not specified in the C++ language. Among these are RunTime Type Information (RTTI) and C++ exceptions which I gather are now in the standard in some form but of course have legacy implementations all over the place. CompilerQOR enables other libraries to make use of these extensions portably by detecting and reporting their availability. Each extension is specified by a definition like this:

#define RunTimeTypeInformation_QCMPSUPPORTED 1

The mixed case names distinguish extensions from language features.

To test for an extension use the name without _QCMPSUPPORTED on the end as a parameter to __QCMP_EXTENSION( _X )

For example: __QCMP_EXTENSION( RunTimeTypeInformation ) will expand to 0 if the extension is unavailable and 1 if it is available.
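Here is a minimal sketch of how such an extension switch might be detected and consumed. It simplifies things by always defining the macro as 0 or 1, and the _SKSUPPORTED suffix, SK_EXTENSION and the Sk classes are hypothetical names. The vendor macros tested are the ones GCC, Clang and Microsoft compilers predefine when RTTI is enabled.

```cpp
// Detection: would normally sit in the compiler specific header.
#if defined(__GXX_RTTI) || defined(_CPPRTTI)
#    define RunTimeTypeInformation_SKSUPPORTED 1
#else
#    define RunTimeTypeInformation_SKSUPPORTED 0
#endif

// Test macro: appends the suffix to the extension name.
#define SK_EXTENSION( name ) ( name##_SKSUPPORTED )

struct SkBase
{
    virtual ~SkBase() {}
    virtual bool IsDerived() const { return false; }   // hand-rolled fallback
};

struct SkDerived : SkBase
{
    bool IsDerived() const override { return true; }
};

// Uses RTTI where the compiler provides it and the fallback where it doesn't.
inline bool sk_is_derived( const SkBase& object )
{
#if SK_EXTENSION( RunTimeTypeInformation )
    return dynamic_cast< const SkDerived* >( &object ) != nullptr;
#else
    return object.IsDerived();
#endif
}
```

Building the same file with RTTI switched off in the compiler options takes the second branch and gives the same answers.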

There are also options that can be predefined for some compiler preprocessors that change the mode of compilation. These generally have to be set outside the code itself in the IDE or Makefile. With CompilerQOR you can choose to do that or in most cases you can define everything in a single Configuration header file which works just as well. The configuration for building CompilerQOR and anything that uses it works like this.

If __QOR_CONFIG_HEADER is defined outside the code in the IDE or Makefile to the name of a file then that file gets included to control the configuration. If it isn't defined then the "DefaultConfig.h" you can see in the sample code is used instead. The following things can be configured:

__QCMP_REPORTCONIG The default value for this macro, which we've seen already, can be set to turn default output during compilation on or off globally.
__QCMP_REPORTDEFECITS Works similarly to __QCMP_REPORTCONIG but is specifically for TODO: markers in the code
__QOR_FUNCTION_CONTEXT_TRACKING This is a setting for the CodeQOR library which will turn up in article 3 of this series
__QOR_CPP_EXCEPTIONS This enables and disables those pesky exceptions. Just because your compiler supports them doesn't mean you always want to use them. This is not supported in the sample code.
__QOR_ERROR_SYSTEM This is also for the CodeQOR library that will turn up later.
__QOR_PERFORMANCE A number between 0 and 10 inclusive which gets used in a few places to determine how much checking of things we should hang around doing. At 0 you're almost into lint or valgrind territory and at 10 it's carry on regardless as gung-ho as you like.
__QOR_UNICODE This is important for Microsoft compilers which have distinct Unicode and Multibyte modes. Define this to 1 for UNICODE builds on Microsoft compilers or any compiler that you find supports Unicode builds in the same way.
__QOR_PARAMETER_CHECKING_ This is yet another setting for the CodeQOR library we haven't seen yet.
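To show how settings like these might be consumed, here is a sketch with hypothetical SK_ equivalents of two of them. It inlines the defaults rather than including a configuration header so that it stands alone, and the way the performance level gates checking is an assumption for illustration, not taken from the QOR source.

```cpp
// Defaults apply only if nothing (IDE, Makefile or config header) set them first.
#ifndef SK_PERFORMANCE
#    define SK_PERFORMANCE 5      // 0 = check everything, 10 = check nothing
#endif
#ifndef SK_UNICODE
#    define SK_UNICODE 0
#endif

static_assert( SK_PERFORMANCE >= 0 && SK_PERFORMANCE <= 10,
               "SK_PERFORMANCE must be between 0 and 10 inclusive" );

// The character type follows the Unicode setting.
#if SK_UNICODE
typedef wchar_t SkChar;
#else
typedef char SkChar;
#endif

// Reads data[index]; the bounds check is compiled out at the top level.
inline bool sk_read( const int* data, int count, int index, int& out )
{
#if SK_PERFORMANCE < 10
    if( data == nullptr || index < 0 || index >= count )
    {
        return false;
    }
#else
    (void)count;                  // unused when checking is compiled out
#endif
    out = data[ index ];
    return true;
}
```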

You're welcome to experiment with the configuration and tell me if any of the combinations don't work but it will have little or no effect on the sample code for this article. One thing that will is adding a definition for __QCMP_COMPILER to the configuration. This will override automatic compiler discrimination and 'pretend' a different compiler is in use. There are rare cases where this might be useful but please don't expect it to work as a general rule. The values for __QCMP_COMPILER can be found in the "include/CompilerQOR/Common/Compilers.h" header file.

Built-in types

We're all familiar with the fundamental types of C++, int, char, volatile unsigned long long? However the language itself is quite frighteningly vague about exactly what an int is let alone a long double. In order to be able to move code easily between compilers we need a set of types we can rely on to always be the same size. We can't guarantee of course that the bytes will always be stored the same way round as that's down to the hardware but we'll leave that issue to hardware abstraction for now. It only really starts to hurt when we want to share binary files between systems with different architectures.

CompilerQOR defines a set of types within the CCompiler class that must be available in some form from each supported compiler. If the compiler doesn't natively provide them then we need to fake them with typedefs so that client code can rely on the same types always being present. Each base type has const and volatile qualified variations and some have signed and unsigned. Here's the set for char:

typedef signed char                mxc_signed_char;
typedef const signed char        mxc_c_signed_char;
typedef volatile signed char    mxc_v_signed_char;
typedef unsigned char            mxc_unsigned_char;
typedef const unsigned char        mxc_c_unsigned_char;
typedef volatile unsigned char    mxc_v_unsigned_char;

and here we fake the wchar_t type for a compiler that doesn't have it built in:

typedef mxc_unsigned_short            mxc_wchar_t;
typedef mxc_c_unsigned_short        mxc_c_wchar_t;
typedef mxc_v_unsigned_short        mxc_v_wchar_t;

These types are pulled back into the global namespace in "CompilerTypes.h", which also acts as a build time check that the CCompiler class has defined them all:

typedef nsCompiler::CCompiler::mxc_unsigned__int64            Cmp_unsigned__int64;
typedef nsCompiler::CCompiler::mxc_c_unsigned__int64        Cmp_C_unsigned__int64;
typedef nsCompiler::CCompiler::mxc_v_unsigned__int64        Cmp_V_unsigned__int64;

All these types end up with a Cmp_qualifier_type form. Above is an example of a sized type where the second underscore is doubled and a bit size is appended. While we can live with variations in the size of a long double these sized types really must be reliably exactly what they say they are.

CompilerQOR has sized types for 8, 16, 32 and 64 bit signed and unsigned integers and their const and volatile variants.

Types that vary with the word size of the architecture are also useful and CompilerQOR defines Cmp_int_ptr and Cmp_uint_ptr as integer types exactly the size of a pointer on the current architecture. Finally, Cmp__int3264 is always 32 bits on a 32 bit machine and 64 bits on a 64 bit machine, even if some weird addressing limitation or extension changes the _ptr types, and byte is always exactly 8 unsigned bits. (I always thought it was a ridiculous shortfall that 'byte' was not a fundamental type so I sneaked it in.)

The QOR uses the ordinary C++ types for most purposes but wherever you need to be sure that a 64bit type will be available or that a variable will be large enough to hold an address the Cmp_ types come in handy. The fact that they all have single token contiguous names even Cmp_V_unsigned_long_long can also be useful if type names need to go through recursive preprocessor macro expansion where volatile unsigned long long might get interpreted as 4 parameters rather than 1.
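On a compiler that ships <cstdint>, a set of sized and pointer-sized types can be sketched and checked at build time as below. This is an assumption-laden illustration: the Sk_ names only imitate the Cmp_ naming pattern, and compilers as old as some of those supported by the QOR would need the types built by hand from their own built-ins instead.

```cpp
#include <climits>
#include <cstdint>

// Single token names, so each type survives being passed through macros.
typedef unsigned char          Sk_byte;
typedef std::int32_t           Sk__int32;
typedef std::uint64_t          Sk_unsigned__int64;
typedef const std::uint64_t    Sk_C_unsigned__int64;
typedef volatile std::uint64_t Sk_V_unsigned__int64;
typedef std::intptr_t          Sk_int_ptr;
typedef std::uintptr_t         Sk_uint_ptr;

// Build time checks: the names must mean exactly what they say.
static_assert( CHAR_BIT == 8, "byte must be exactly 8 bits" );
static_assert( sizeof( Sk__int32 ) * CHAR_BIT == 32, "Sk__int32 must be 32 bits" );
static_assert( sizeof( Sk_unsigned__int64 ) * CHAR_BIT == 64, "must be 64 bits" );
static_assert( sizeof( Sk_uint_ptr ) == sizeof( void* ), "must match pointer size" );

// A pointer-sized integer can hold an address and give it back unchanged.
inline Sk_uint_ptr sk_address_of( const void* pointer )
{
    return reinterpret_cast< Sk_uint_ptr >( pointer );
}
```

If any of the assertions fail, the build stops in one known place instead of misbehaving somewhere in client code.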

And then some

That leaves us with ABI's, v-table access, binary image formats, RTTI, Exceptions, SEH, Security and bootstrapping which is still a lot. These things are generally taken care of by support libraries like libsupc++ or MSVCRT and for the moment the sample code with this article will have to rely on them as well. In later articles we'll see why that isn't always possible or desirable and how to actually implement a number of these things ourselves so that the compiler doesn't have to. If you've ever fancied walking through the steps of a dynamic cast in the debugger then you'll have to catch the Windows Compiler Support article when it eventually emerges.

A walk in the park

So that's what CompilerQOR covers and what it doesn't. Now let's get into the details and walk through adding support for a completely new compiler.

Step 1:

Search or grep the code for:

Note: Add new compiler support here

This will give you a list of places where essential edits are required.

Add a __QCMP_MYCOMPILER definition, a __QCMP_COMPILERHEADER path definition to reference your MyCompiler.h file and add MyCompiler.h into whichever IDE project or Makefile you're using to build CompilerQOR.

Step 2:

Set up all the specific definitions that will only be included if the new compiler is in use.

These all go inside MyCompiler.h or files included from there. To work out what needs to be set, either refer to the header of the most similar compiler already supported as a starting point or copy the Template.h file provided with the sample code to get you started. If you're an expert on your compiler then in a few minutes you'll have everything in place. If not, then in a few minutes you'll have a lot of questions like "How does my compiler manage warnings?" and "What's the correct equivalent of __declspec(naked)?" I probably can't be of much help. If I knew, I might have added support for your compiler myself, but the members of CodeProject may be of some assistance along with manuals and Google. At this stage leave all the features turned on unless you know for sure they aren't supported. Unsupported features will be picked up automatically in Step 4.
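To give a flavour of the answers a compiler header has to supply, here is a sketch of the warning management question for three compiler families. The SK_WARNING_ names are hypothetical and are not the macros used by CompilerQOR. The pragmas themselves are the documented ones for Microsoft, Clang and GCC compilers.

```cpp
#if defined(_MSC_VER)
#    define SK_WARNING_PUSH                    __pragma( warning( push ) )
#    define SK_WARNING_POP                     __pragma( warning( pop ) )
#    define SK_WARNING_DISABLE_UNUSED_VARIABLE __pragma( warning( disable : 4101 ) )
#elif defined(__clang__)
#    define SK_WARNING_PUSH                    _Pragma( "clang diagnostic push" )
#    define SK_WARNING_POP                     _Pragma( "clang diagnostic pop" )
#    define SK_WARNING_DISABLE_UNUSED_VARIABLE \
         _Pragma( "clang diagnostic ignored \"-Wunused-variable\"" )
#elif defined(__GNUC__)
#    define SK_WARNING_PUSH                    _Pragma( "GCC diagnostic push" )
#    define SK_WARNING_POP                     _Pragma( "GCC diagnostic pop" )
#    define SK_WARNING_DISABLE_UNUSED_VARIABLE \
         _Pragma( "GCC diagnostic ignored \"-Wunused-variable\"" )
#else                                          // unknown compiler: do nothing
#    define SK_WARNING_PUSH
#    define SK_WARNING_POP
#    define SK_WARNING_DISABLE_UNUSED_VARIABLE
#endif

// Generic code names the intent; the compiler header supplies the spelling.
SK_WARNING_PUSH
SK_WARNING_DISABLE_UNUSED_VARIABLE
inline int sk_answer()
{
    int unused;     // would normally draw an "unused variable" warning
    return 42;
}
SK_WARNING_POP
```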

Step 3:

Build CompilerQOR as a static library with your new compiler. The chances are you'll get errors the first time. The crucial thing is where those errors occur. If you get an error in your new "MyCompiler.h" file or in generic code that is included after "MyCompiler.h" then you have a fix to make. If you get an error in code that's reached by the build before "MyCompiler.h" is included then the chances are I have a fix to make. Some assumption I've made about what is 'generic' code that will be supported by all compilers was wrong for your case. Please let me know as these things are generally fixable.

Step 4:

The sample code provides projects to build a static library, StaticCompilerQOR and an executable TestCmp which links in the static library. TestCmp contains a series of build time and runtime tests for the features specified in CompilerQOR for the compiler in use. Running TestCmp will output the results of the runtime tests. If you're running it then the compile time tests have passed. Press <return> a few times to drive it through to the end or it will wait for you indefinitely.

Build and run TestCmp just as it is. If it fails to build then the failure should indicate what needs to be added or changed in "MyCompiler.h" to fix it. You might need to #undef some features at this point. Once it's building we're almost there: run it up and check for any failed tests. The type sizes we can't do anything much about at the moment, but any failures of the other tests indicate features that need a #undef.

Now you have basic support for yet another compiler, and any code written to make full use of CompilerQOR feature and extension checks has a good chance of working with it.

Given that moving a reasonable sized source tree even from one version of the same compiler to another can take days, the savings easily justify the time invested.

Note on the sample projects

Due to the number of compilers and build systems supported only the Debug build configurations have been set up. Out of the box no Release build is likely to work. If you want one you'll need to carefully set the options in the Release configuration after examining the relevant Debug configuration.

Linux builds will report failures when checking the type sizes. This is not an error in itself but it does point up a pretty hideous inconsistency between Linux and Windows compilers even when both are GCC based. At the moment we have no SystemQOR library to handle operating system difference for us so TestCmp can only have one set of 'correct' values to check against. Suggestions on 'permanent' solutions to this problem would be welcome.
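The article does not say which types fail, but one well-known difference of this kind is the size of long: mainstream Windows compilers keep it at 32 bits even in 64 bit builds, while mainstream Linux compilers make it the same width as a pointer. The sketch below encodes that expectation per platform. The sk_expected_long_size() helper is hypothetical and the rule is an assumption about typical ABIs, not a proposal for how SystemQOR will handle it.

```cpp
#include <cstddef>

// Expected sizeof( long ) for the platform being compiled for.
inline std::size_t sk_expected_long_size()
{
#if defined(_WIN32)
    return 4;                   // Windows: long stays 32 bits in 32 and 64 bit builds
#else
    return sizeof( void* );     // typical Linux/Unix: long matches the pointer width
#endif
}

// True when the compiler in use agrees with the platform expectation.
inline bool sk_long_size_as_expected()
{
    return sizeof( long ) == sk_expected_long_size();
}
```

A test written this way has one 'correct' value per platform rather than one for all of them, which is exactly the information TestCmp currently has no library to supply.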

In conclusion

This initial version of CompilerQOR is clearly just a start and has a long way to go to fulfill its full potential. I don't own or have access to all the world's variety of C++ compilers so that's where you come in. If your compiler is under-supported, unsupported or even completely unrecognized by CompilerQOR then the QOR needs your input. You make the QOR better and it makes everyone's life better. That's the way open source is supposed to work. The code that accompanies this article is also online at Source Forge where you can contribute to the QOR.

Now we've abstracted the compiler we can feel pretty good, but what about all the differences between different machines that we haven't abstracted? The world is full of a mixture of 32 bit and 64 bit architectures and by the time that goes away someone will have made the jump to 128 bit. We're certainly not walking on water yet if we can't take those differences in our stride. Then there's MMX and SSE and ... It looks like we're going to need an ArchQOR.

Acknowledgements

Due to a lot of compiler products being referenced here a lot of proprietary names are mentioned in the article and source many of which are Trademarks:

Microsoft, Windows, Visual C++ and Visual Studio are trademarks of Microsoft.

Embarcadero and CodeGear are trademarks of Embarcadero.

All other trademarks and trade names mentioned are acknowledged as the property of their respective owners who are in no way responsible for any of the content of this article or the associated source code.

I'd especially like to thank the regulars and occasionals in the CodeProject lounge who helped me prioritize which compilers to support and provided the encouragement necessary to do the same thing seven times in seven different IDEs only to realize it would have to be done again. Doh!

History

Initial version - 03/02/2013