|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
Contents
IntroductionA heterogeneous container is a container that can store elements of different types. For strongly typed languages like C++, such kind of container isn't a natural or built-in feature. Many solutions exist though, to simulate this heterogeneous property, but they often involve memory space or runtime speed trade-offs. This article presents a fixed-size heterogeneous container, named Existing SolutionsThis chapter lists the most known and popular solutions to implement a heterogeneous container, and their main drawbacks. Classical PolymorphismIn the classical polymorphism solution, the container holds pointers to a base class from which several classes are derived. The heterogeneous property is then achieved through the dynamic type of each element of the collection. Limitations:
Union-like ElementsA container of
Union-like implementations, such as discriminated unions, made their appearance to overcome some of those limitations. One example of such implementation is
TupleA tuple is a finite collection of elements. In C++, the implementation of a tuple is a fixed-size container that can hold elements of any type. In such an implementation, element access is resolved statically resulting in no runtime overhead. Limitations:
ConclusionWe reviewed some popular implementations of a heterogeneous container. This list isn't exhaustive, there are other solutions (
The tek::record ImplementationRequired Technical BackgroundBefore explaining the implementation of the heterogeneous container, the concept of TypelistA template < typename T, typename U >
struct typelist {
typedef T element_type;
typedef U next_type;
};
To define a variable number of different types, the typedef
typelist< int,
typelist < char,
typelist < MyClass,
end_of_list > > > MyTypelist;
The above declaration defines a list of three types: MyTypelist::element_type // int
MyTypelist::next_type::element_type // char
MyTypelist::next_type::next_type::element_type // MyClass
However, to be useful and convenient, accessing a type must be done in a generic way, using an index. The implementation of such access is done recursively: going through the template < class T, int INDEX >
struct get_type {
typedef typename get_type < typename T::next_type,
INDEX-1 >::type type;
};
template < class T >
struct get_type < T, 0 > {
typedef typename T::element_type type;
};
With the recursive get_type < MyTypelist, 0 >::type;
// generates: MyTypelist::element_type
get_type < MyTypelist, 2 >::type;
// generates: MyTypelist::next_type::next_type::element_type
We now know enough about tek::record Static PartAs we saw in the Existing Solutions chapter, tuples have some performance advantages over the other heterogeneous containers. They don't waste memory space 1, and accessing an element is very fast. To achieve such performances, the classical implementation of a tuple relies on a template structure that follows the Example: template < typename T, typename U >
struct tuple {
T element;
U next;
typedef T element_type;
typedef U next_type;
};
An arbitrary number of types can be specified to the structure: tuple< int,
tuple < std::string,
tuple < MyClass,
end_of_list > > > >
Such declaration will be roughly translated to the following (without including the struct tuple<...> {
int element; //first element
struct tuple<...> {
std::string element; // second element
struct tuple<...> {
MyClass element; // third element
end_of_list next;
} next;
} next;
};
A tuple defines two kinds of indexed access, both of which are performed statically:
Indexed access to an element type in the list is identical to the template< int INDEX, typename return_type >
struct get_element {
template< class T >
static return_type Do( T& tup )
{
return get_element< INDEX-1, return_type >::Do( tup.next );
}
};
template< typename return_type >
struct get_element< 0, return_type > {
template< class T >
static return_type Do( T& tup )
{
return tup.element;
}
};
Modern compilers are able to inline the recursive function, as if the value was directly accessed. The tek::record Dynamic PartIntroductionThe implementation of a tuple is a good base for a high performance, fixed size heterogeneous container. However, a tuple doesn't provide dynamic access to its elements. That's quite logical: what code an expression like
The functor acts as a visitor 2 and should have an overloaded function for each element type. Example: myTuple.apply( Print(),index );
Functor definition: struct Print {
void operator()( int element )
{
std::cout << "int : " << element << endl;
}
void operator()( double element )
{
std::cout << "double : " << element << endl;
}
void operator()( char element )
{
std::cout << "char : " << element << endl;
}
};
The ImplementationThe implementation of such dynamic access must generate every possible code for each type and select the right one according to the runtime value of the index. The selection can be simply done using an Example using template < class T >
void apply ( int index, T functor )
{
switch ( index ) {
case 0 : functor ( tuple[0] ); break; // tuple[0] is an int
case 1 : functor ( tuple[1] ); break; // tuple[1] is a std::string
...
}
}
Note: This implementation is no more than a kind of type switching and, in the OO world, type switching is considered evil because it introduces coupling: each time you add a new type, you must modify the if (index == 0)
//code for tuple[0]
else
if (index == 1)
//code for tuple[1]
else
if (index == 2)
// code for tuple[2]
...
Finding the right index this way induces a complexity of O(n). A search algorithm could be generated instead to reduce the complexity but the performance still depends on the number of types handled by the heterogeneous container. The alternative solution is the use of the For example, suppose we want to test a range of values from 0 to 3. We can test all the values within a switch (x) {
case 0 : ...
case 1 : ...
case 2 : ...
case 3 : ...
}
The above JUMP ADDR_TABLE[ x ];
The performance of a In-depth Optimizations DiscussionsBounds Checking EliminationSince we are in the context of a fixed-size container, out-of-boundary checking can be sometimes superfluous and should be left to the user (see the switch (x) {
case 0 : ...
case 1 : ...
case 2 : ...
default : __assume(0); // x = 0 , 1 or 2, nothing else
}
I don't know about other compilers supporting these kind of keywords. At the time of writing, GCC doesn't support such compile-time assumptions. So Pseudo-variadic Behavior ImplicationsVariadic templates isn't supported in the current version of the C++ language, though it is a desirable feature in our case. Users should be able to write with no hassle: record < int, char > a; // 2 elements
record < int, char, double, std::string > b; // 4 elements
instead of using the recursive way: record < int, < record char < record double , etc.
To simulate variadic templates, default template parameters are used. The default template argument is template < class T0 = null_type,
class T1 = null_type,
class T2 = null_type,
class T3 = null_type >
struct record {};
The user can then specify any number of parameters upto a certain limit, 4 in the example. This limit is ultimately parameterizable with a macro (see chapter Usage). Internally, all those template parameters are converted to a tuple: tuple< T0, tuple < T1, tuple < T2, T3 > > >
Using such kind of limit leads to a compilation problem concerning the template < class T >
void apply ( int index, T functor )
{
switch ( index ) {
case 0 : functor ( tuple[0] ); break;
case 1 : functor ( tuple[1] ); break;
case 2 : functor ( tuple[2] ); break;
case 3 : functor ( tuple[3] );
}
}
If the user declares a First ApproachWe could use a structure that encapsulates each Example: case 0 : SilentCase< 0 > ( tuple, functor ); // generates functor( tuple[0] );
case 3 : SilentCase< 3 > ( tuple, functor ); // generates nothing
But there are still two performance problems:
Second ApproachTo overcome the last two problems, we must adapt the number of cases in the Example (pseudo-code): template < int I >
struct apply_select;
template <>
struct apply_select < 2 >
{
static void apply ( int index, T functor )
{
switch ( index ) {
case 0 : functor ( tuple[0] ); break;
case 1 : functor ( tuple[1] ); break;
}
};
template <>
struct apply_select < 3 >
{
static void apply ( int index, T functor )
{
switch ( index ) {
case 0 : functor ( tuple[0] ); break;
case 1 : functor ( tuple[1] ); break;
case 2 : functor ( tuple[2] ); break;
}
};
...
In the apply_select< record_size >::apply( index, functor );
This way, only the Branch EliminationWe saw earlier that a functor is used as an argument to the struct MyFunctor {
template < class T >
void operator()( T& element )
{
process( element.data );
}
};
Generic processes are interesting because they may allow the compiler to do an additional optimization concerning the For example, consider the following code: switch (x) {
case 0 : foo(); break;
case 1 : foo(); break;
case 2 : bar(); break;
case 3 : bar(); break;
case 4 : bar(); break;
}
We saw that such code will lead the compiler to generate a Pseudo generated code: jump JUMP_TABLE[x]
label_case_0 : call foo ...
label_case_1 : call foo ...
label_case_2 : call bar ...
label_case_3 : call bar ...
label_case_4 : call bar ...
A compiler applying branch elimination will generate the following instead: Pseudo generated code: jump JUMP_TABLE[x]
label_case_foo : call foo ... // 2 cases merged
label_case_bar : call bar ... // 3 cases merged
This has numerous advantages because:
The template To understand the issue, let's look at how the template < class T, class U >
apply_internal( T& record_data, int index, U& functor)
{
switch( index ) {
...
}
}
template < class T, class U >
apply_internal( T& record_data, int index, U& functor)
{
switch( index ) {
case 0 : &record_data // generated address of the 1st element
case 1 : &record_data + 12 // ... 2nd element
case 2 : &record_data + 26 // ... 3rd element
...
}
}
But if we pass directly the address of the concerned element instead of the whole tuple, the compiler will have to handle the same reference for each case: template < class T, class U >
apply_internal( T& element, int index, U& functor)
{
switch( index ) {
case 0 : &element
case 1 : &element
case 2 : &element
}
}
To know the address of the element pointed by the index before going to the template < class U >
void apply (int index, U& functor )
{
apply_internal< record_type >( addrTable[index], x, functor )
}
with template < class record_type, class U >
void apply_internal( void* element, int index, U& functor)
{
switch( index ) {
case 0 : functor ( * (typename get_type< record_type, 0 >::Type *>) // C-style cast
element );
break;
case 1 : functor ( * (typename get_type< record_type, 1 >::Type *>)
element );
break;
}
}
However, this method has several drawbacks:
That's why this optimization is disabled by default. It can be enabled by defining the UsageThis section lists the #include "tek/record.h"
#include <iostream>
using namespace std;
// we define a functor whose the job is to increment an element.
struct Inc {
template < class T >
void operator()( T& element ) const
{
element++;
}
};
int main()
{
tek::record< int, char, double > myRecord( 2, 'r', 5.25 );
cout << myRecord; // prints: (2 r 5.25)
cout << myRecord.get<1>(); // prints: r
myRecord.get<0>() = 10; // 2 is replaced by 10
myRecord.for_each( Inc() ); // Increment each element
cout << "Enter the index of the element you want to increment :"
<< endl;
int index ;
cin >> index;
if ( myRecord.is_valid_index( index ) )
myRecord.apply( Inc(), index );
else
cout << "Invalid index entered";
return 1;
}
Here is the complete list of operations supported by Constructors
Member Functions
I/O OperationsThose operations are very basic and don't use manipulators. They are provided mainly to test the other records functionalities. I/O operations might be fully supported in the future.
Type Information
Configuration of tek::recordThe
Non-member EquivalentsIf the Example: template < class T>
void f( T& myRecord )
{
// myRecord.get<1>(); // error on some compilers
myRecord.template get<1>(); // ok
}
Only compilers that implement the two-phase name look-up, such as GCC, will issue an error if the That's why each of the member functions and structures has an equivalent in a non-member way so that it is easier to write portable code. For consistency, all member-functions have a non-member counterpart, they have the same name and the same parameter list with the addition of a myRecord.get< 0 >();
tek::get< 0 >( myRecord );
myRecord.for_each( myFunctor() );
tek::for_each( myRecord, myFunctor() );
myRecord.apply( myFunctor(), index );
tek::apply( myRecord, myFunctor(), index );
Functor DefinitionMost of the time, the functor used as an argument for the tek::apply( myRecord, myFunctor() );
If so, the struct MyFunctor {
template < class T >
void operator()( T& element ) const
{
....
}
}
To simulate argument passing, you can use the constructor of the functor: struct MyFunctor {
MyFunctor(int arg) : arg_(arg) {}
template < class T >
void operator()( T& element ) const
{
// some computings with arg_
}
int arg_;
}
...
tek::apply( myRecord, myFunctor( 2 ) );
To return a value from the struct Inc : public tek:return_type< int > {
template < class T >
int operator()( T& element ) const
{
....
}
}
BenchmarkTested SolutionsI measured the dynamic dispatching performance of three kinds of heterogeneous containers:
Performance Measurement MethodologyEvery benchmark is deceptive in some way. In this section, I try to limit this aspect by providing details about how I proceeded. First of all, since the performance may vary with the number of elements handled by the container, I benchmarked each container with 2 to 20 elements. Each element type is a different class inheriting from a common
For the template
Note: Because, in our case, passing by reference is faster than passing by value, the parameter is Class DefinitionFor each class, the I tested two configurations:
Index PropertyThe index is determined at runtime. Since the call to
We don't only use a random index because the performance of using such an index is greatly dependent on the CPU and its branch prediction unit, and important variations can occur. For example, my machine is a Pentium 4 Prescott that has a 31 stages pipeline. A mistaken branch prediction results in a flush of this pipeline, which considerably degrades performance. A stable index eliminates those variations since the taken branch is always the same. Furthermore a stable index doesn't really favor one solution over another since all three solutions use an unconditional jump with a non-immediate branch target, to implement their dispatch mechanism. ConfigurationsFinally, in order to reduce the number of graphs, I used only two configurations for measuring performance:
System SpecificationThe benchmarks were performed on a Intel Pentium 4 3Ghz HT, 1 Gb RAM. 2 "platforms" were tested:
BenchmarksStable ConfigurationPlatform Visual C++
The The Platform GCC
As a general tendency, GCC inlines more aggressively than Visual C++ with the given optimization flags. The for ( ... )
{
if (stable_index == 0)
apply_to_the_first_element
else
apply_to_the_second_element
...
}
However the if (stable_index == 0)
for ( ... ) {
apply_to_the_first_element
}
else
for ( ... ) {
apply_to_the_second_element
}
...
Even without this optimization that further increases the performance until four types are handled, Unstable ConfigurationPlatform Visual C++
As expected, the performances of each solution are unstable with a random index. Because the Platform GCC
The code generated by GCC for the ConclusionConcerning dynamic dispatch performances, General Conclusion
References[Alexandrescu]: "Modern C++ Design: Generic Programming and Design Patterns Applied", Andrei Alexandrescu - Addison Wesley, 2001 Notes[1] ^ Without considering data structure padding [2] ^ The visitor pattern [3] ^ If there are very few cases in the [4] ^ Some compilers are more or less tolerant towards how contiguous the values must be. [5] ^ For instance, if the index, used to access an element in some array, is computed earlier in such a way that it will be mathematically a valid index under any circumstance, the compiler can then apply bounds checking elimination and use the index directly: it doesn't affect the program's correctness. [6] ^ Megamorphism = polymorphism with a high number of types. [7] ^ Virtual call speculation is one of the profile guided optimizations proposed by Visual C++ 8. History
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||