Licence CPOL
First Posted 24 Feb 2010

Duck Typing in C++ Using Boost

By | 27 Feb 2010 | Article
Uses Boost preprocessor and templates to achieve "duck typing", as provided by other languages such as JavaScript

Introduction

This article grew out of the need to make conversions between similar-looking C++ struct types. For example, consider the following declarations:

struct S1_version1
{
 int scalar1;
 int scalar2;
};

struct S1_version2
{
 int scalar1;
 int scalar2;
 int scalar3;
};

This article deals with the need to convert, for example, an instance of the first struct into an instance of the second struct. This is done in the obvious way: by copying data into the members with identical names.

The structs involved are independent of each other in the C++ typing system; they share, for example, no common base type. From a practical standpoint, though, they seem to be closely related, since the second type contains all of the members from the first type. Often, one of the types will be a later version of the other type, and some sort of backwards compatibility requirement will necessitate conversion between the types.

Conversion can, of course, be done by writing one assignment statement for each member in S1_version1. The techniques described here automate this process, so that only a single operation (assignment or method call) is necessary for conversion.
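
For comparison, the hand-written conversion might look something like the sketch below. ConvertByHand is a hypothetical helper name used only for illustration, and zeroing scalar3 is an arbitrary choice made here, since the source type has no counterpart for that member.

```cpp
// Plain versions of the two structs from the introduction.
struct S1_version1
{
    int scalar1;
    int scalar2;
};

struct S1_version2
{
    int scalar1;
    int scalar2;
    int scalar3;
};

// Hypothetical helper (not part of the article's header): converts by
// writing one assignment per member of the source type.
S1_version2 ConvertByHand(const S1_version1& in)
{
    S1_version2 out;
    out.scalar1 = in.scalar1;
    out.scalar2 = in.scalar2;
    out.scalar3 = 0;   // no counterpart in the source; zero is an arbitrary choice
    return out;
}
```

Every new member added to S1_version1 means another line in a function like this one, which is the maintenance burden the macros are meant to remove.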

Background

In many languages, conversion of an instance of S1_version1 to an instance of S1_version2 would be available automatically. Consider the following snippet, which relies on the declarations just presented:

S1_version1 first;
first.scalar1=1;
first.scalar2=2;
S1_version2 second=first; //Does not compile

In C++, of course, this code will not compile - at least not as presented thus far. The goal of this article is to make such code compile, as transparently as possible. But for now, attempting to compile such code using, for example, the MinGW compiler results in the following error:

> structdemo.cpp:60: error: conversion from 'S1_version1' 
       to non-scalar type 'S1_version2' requested

In other languages (specifically, in languages with so-called “duck” typing), such conversions are unnecessary or automatic: an object is usable wherever its members suffice, and members are matched by name rather than by declared type.

The code presented in the rest of this article achieves something similar in C++. In particular, it offers a system of struct declaration macros. The structs declared using the macros can be converted effortlessly into instances of any other struct with the same members.

These macros use the Boost Preprocessor library. Although knowledge of Boost is not at all necessary to use the system presented here, the requisite header file relies heavily on it.

Boost is one of many “Open Source” libraries built around the C++ language, but enjoys several distinctions relative to other such libraries. First, Boost is “peer-reviewed”, i.e., each library goes through a public review by the Boost community before it is accepted. Second, Boost is closely associated with the C++ standards committee, and has a sort of quasi-official status. Parts of Boost are slated for inclusion in forthcoming C++ standards, for example, and several prominent Boost contributors are also members of the C++ standards committee. Another point in favor of Boost is that it is written entirely in C++. It does not alter the language. Rather, it augments it with a collection of well-considered code.

Finally, although it is “Open Source”, Boost is licensed in a fashion that allows for its use in closed-source, for-profit work. This is a critical distinction for the professional developer, who must respect not just copyright law, but also so-called “copyleft” restrictions, under which works derived from certain other Open Source libraries must themselves be distributed under the same license terms.

Using the Code

Returning to the C++ example presented earlier, consider the following program:

#include "StructDuckTyping.h"   //Part of this article’s code
#include <cstdio>

//Declare a type named S1_version1, with two “int” members
DECLARE_DUCKTYPED_STRUCT( S1_version1,   
  ((int)(scalar1))
  ((int)(scalar2))
)

struct S1_version2
{
 int scalar1;
 int scalar2;
 int scalar3;
};

int main()
{

 S1_version1 first;
 first.scalar1=10;
 first.scalar2=20;
 S1_version2 second=first;

 printf("%d\n%d",second.scalar1, second.scalar2);

 return 0;
}

The operation of this code is similar to what a duck-typed language such as JavaScript allows. The conversion from S1_version1 to S1_version2 takes place automatically, and the output of the program is thus “10” followed by “20”.

The “StructDuckTyping.h” header file includes all of the code necessary to make this work. In particular, this header file contains the definition of the DECLARE_DUCKTYPED_STRUCT macro. The format used inside each invocation of DECLARE_DUCKTYPED_STRUCT is a bit unfamiliar; it looks more like LISP than C++. In summary, each member declaration is wrapped in a pair of parentheses. Within these, the type of the member is similarly wrapped, as is the member name. This member list is preceded by the name of the struct being declared, and separated from it by a comma. As suggested by the single comma, DECLARE_DUCKTYPED_STRUCT, technically speaking, takes only two parameters, although the second of these is a list. None of this has proven difficult to deal with in practice, in the author’s experience.

Clearly, DECLARE_DUCKTYPED_STRUCT is a powerful macro. However, it is worth noting that the file in which it is defined is not so large (fewer than 150 lines of code, even with some fairly extensive comments). Moreover, the way the macro expands during preprocessing is quite mechanical. The macro expansion can be obtained by using the -E option of the MinGW compiler (g++), which runs the preprocessor without compiling. The expansion of the macro as used in the example above is shown below:

typedef struct 
{ 
 int scalar1 ; 
 int scalar2 ; 
 
 template <class T> operator T() 
 {
  T out; 
  out.scalar1 =scalar1 ; 
  out.scalar2 =scalar2 ; 
  return out; 
 }

 template <class T> void GetMembers(T& out) 
 {
  out.scalar1 =scalar1 ; 
  out.scalar2 =scalar2 ; 
 }

} S1_version1;

This looks essentially like the declaration used for the struct in the first example, except that a type converter method has been added, along with a similar-looking GetMembers() method. In summary, for any type T, the type converter method will create an instance of T and then attempt to copy all struct members into this new instance on a name-by-name basis. Then, the new instance is returned by value to the caller.

Note that there are situations in C++ where member-wise copy of the sort shown above is done automatically, without any need for special macros. In particular, classes and structs without an explicit copy constructor (i.e., a constructor taking another instance of the same class) will have an implicit copy constructor that uses member-wise copy. However, nothing in the C++ language allows for automatic member-wise copy between unrelated data types.

Also, some considerations related to performance and storage utilization ought to be mentioned. In particular, the type converter always creates a fresh instance of type T on the stack and returns it by value, which is somewhat inefficient when the caller already has an instance to fill (although compilers can often elide the copy made on return). It would be more efficient for the caller to create the instance a single time, and then for the macro logic to simply fill the members of that instance. The user of this library can accomplish this using the GetMembers() method, which accepts an existing instance of type T and populates its members.
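
A minimal sketch of the in-place style follows. To keep it independent of StructDuckTyping.h, Source is a hand-written stand-in for a macro-declared struct; the names Source, Target and Refresh are hypothetical, and the const qualifiers are an addition of this sketch rather than something the macro is known to emit.

```cpp
// Hand-written stand-in for what DECLARE_DUCKTYPED_STRUCT generates,
// using a named struct so that the sketch does not depend on the header.
struct Source
{
    int scalar1;
    int scalar2;

    // Converting form: builds a T, fills it, returns it by value.
    template <class T> operator T() const
    {
        T out;
        GetMembers(out);
        return out;
    }

    // In-place form: fills an object the caller already owns.
    template <class T> void GetMembers(T& out) const
    {
        out.scalar1 = scalar1;
        out.scalar2 = scalar2;
    }
};

struct Target
{
    int scalar1;
    int scalar2;
    int scalar3;
};

// Hypothetical helper: refreshes an existing Target without creating a
// temporary. Members that Source does not have (scalar3) are left untouched.
void Refresh(const Source& from, Target& into)
{
    from.GetMembers(into);
}
```

Because GetMembers() only writes the members it knows about, a caller can reuse one Target across many calls and keep whatever it previously stored in scalar3.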

It must be mentioned, too, that DECLARE_DUCKTYPED_STRUCT does not support conversions where type T lacks one or more members present in the source type. Fortunately, in cases where a data model fails to respect this constraint, a compile-time error (as opposed to a run-time failure) is obtained. In MinGW at least, the error message is remarkably intuitive:

C:\beau>c:\mingw\bin\c++ dt_basic.cpp

 > dt_basic.cpp:31: error: 'struct S1_version2' has no member named 'scalar2'
 > dt_basic.cpp: In member function `S1_version1::operator T() [with T = S1_version2]':
 > dt_basic.cpp:29: instantiated from here
 > dt_basic.cpp:4: error: 'struct S1_version2' has no member named 'scalar2'

When faced with such an issue, one option is to run the preprocessor without compiling (e.g., to use “g++ -E” under MinGW) to obtain the macro expansion. Then, assignments into the “missing” member can be commented out, and the resulting code placed into the original .CPP file in lieu of the macro invocation.

One final limitation of DECLARE_DUCKTYPED_STRUCT is that methods (other than the conversion methods generated by the macro) are not supported. Although it is somewhat atypical for structs (as opposed to classes) to contain methods, it is by no means forbidden by the C++ standard. However, the syntax of DECLARE_DUCKTYPED_STRUCT does not contain any support for methods, nor does it support the use of access modifiers (e.g., public and private). Again, the use of these modifiers is more typical of class declarations than of struct declarations, and the use of DECLARE_DUCKTYPED_STRUCT completely rules out their use.

At this point, the entire DECLARE_DUCKTYPED_STRUCT macro has been presented and explained. However, some nuances of its operation with more complex types still need to be explored. First, it bears mentioning that the macro, as already presented, supports the nesting of duck-typed struct instances within other duck-typed structs. The type converter method copies each member using ordinary assignment, and it is itself an implicit conversion that the compiler applies whenever the two sides of such an assignment differ in type. So, conversions involving nested structs naturally result in a call from the outer struct’s type converter into the inner struct’s type converter, down to any level of nesting. Even if the inner structs' actual types have changed, proper function will be provided, so long as members with the correct names are present. Consider the following example code:

#include "StructDuckTyping.h"
#include <cstdio>

DECLARE_DUCKTYPED_STRUCT(Inner_version1,
 ((int)(inner_scalar1))
)

DECLARE_DUCKTYPED_STRUCT(Inner_version2,
 ((int)(inner_scalar1))
 ((int)(inner_scalar2))
)

DECLARE_DUCKTYPED_STRUCT(Outer_version1,
  ((int)(n))
  ((Inner_version1)(inside))
) 

DECLARE_DUCKTYPED_STRUCT(Outer_version2,
  ((int)(n))
  ((Inner_version2)(inside))
) 

int main()
{
 Outer_version1 first;
 first.n=1; //Initialized so the converter does not copy an indeterminate value
 first.inside.inner_scalar1=111;

 Outer_version2 second=first;

 //Prints "111"
 printf("\n%d", second.inside.inner_scalar1);

 return 0;

}

This example really points out the power of the macro system used here. Four structured types are in play, none of which has any real relation to the others in the basic C++ typing system. The use of DECLARE_DUCKTYPED_STRUCT allows for effortless conversion from each version-1 type to its version-2 counterpart, to the extent that member names match up correctly.
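
To make the chain of calls explicit, here is a hand-written sketch of roughly what the nested conversion amounts to. The type names (InnerV1, OuterV2, and so on) and the Upgrade helper are hypothetical, named structs are used instead of the macro so the sketch stands alone, and the const qualifiers are an addition of this sketch.

```cpp
struct InnerV1
{
    int inner_scalar1;

    template <class T> operator T() const
    {
        T out;
        out.inner_scalar1 = inner_scalar1;
        return out;
    }
};

struct InnerV2
{
    int inner_scalar1;
    int inner_scalar2;
};

struct OuterV1
{
    int n;
    InnerV1 inside;

    template <class T> operator T() const
    {
        T out;
        out.n = n;
        // The two sides differ in type (InnerV2 vs. InnerV1), so the
        // compiler calls InnerV1's converter to make this assignment work.
        out.inside = inside;
        return out;
    }
};

struct OuterV2
{
    int n;
    InnerV2 inside;
};

// Hypothetical helper: the return statement triggers OuterV1's converter,
// which in turn triggers InnerV1's converter.
OuterV2 Upgrade(const OuterV1& in)
{
    return in;
}
```

Calling Upgrade() on an OuterV1 whose inside.inner_scalar1 is 111 yields an OuterV2 with the same value in second-level position; inner_scalar2 is never written and stays uninitialized.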

Another aspect of DECLARE_DUCKTYPED_STRUCT that remains unmentioned is its support for array members. This support is complete, and transparent. Consider the following declaration:

DECLARE_DUCKTYPED_STRUCT(S1_version1,
  ((int)(ara)(10)(20)(30)(40))
)

This expands as follows:

typedef struct 
{ 
 int ara [10] [20] [30] [40] ; 
 template <class T> operator T() 
 {
  T out; 
  for(int counter0 =0; counter0 <10; ++counter0 ) 
   for(int counter1 =0; counter1 <20; ++counter1 ) 
    for(int counter2 =0; counter2 <30; ++counter2 ) 
     for(int counter3 =0; counter3 <40; ++counter3 ) 
      out.ara [counter0] [counter1] [counter2][counter3] =
       ara [counter0] [counter1] [counter2] [counter3] ; 

  return out; 
 }  
} S1_version1;

(GetMembers() is omitted for brevity, but is again essentially identical to the type converter, minus the instantiation of out.)

This expansion will work in seamless fashion so long as the dimensions of ara only grow over time. Also, ara need not be an array of a simple type. It can be an array of structs, even structs declared using DECLARE_DUCKTYPED_STRUCT. Because the generated code copies each element with an ordinary assignment, the assignment inside the mechanically generated for loops will end up invoking the element type's templated type converter method whenever the element types differ, and proper name-by-name mapping will occur for each array element.

In using arrays we do, unfortunately, encounter a run-time risk; if the dimensions shrink from one struct to the next, a buffer overrun can occur. However, this is but one instance of a very typical issue presented by the C++ language.
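
One way to turn this particular run-time risk into a compile-time error is sketched below. This is not something StructDuckTyping.h is known to do; CopyArrayChecked, Samples_version1, Samples_version2 and Upgrade are hypothetical names, only one-dimensional arrays are handled, and static_assert requires a C++11 compiler.

```cpp
#include <cstddef>

// Hypothetical helper, not part of StructDuckTyping.h: copies a
// one-dimensional array member and refuses to compile if the
// destination is smaller than the source.
template <class Dst, class Src, std::size_t DstN, std::size_t SrcN>
void CopyArrayChecked(Dst (&dst)[DstN], const Src (&src)[SrcN])
{
    static_assert(DstN >= SrcN, "destination array is smaller than source array");
    for (std::size_t i = 0; i < SrcN; ++i)
        dst[i] = src[i];
}

struct Samples_version1 { int ara[3]; };
struct Samples_version2 { int ara[5]; };

// Growing from 3 to 5 elements compiles; the two extra elements of
// out.ara are left as they were.
void Upgrade(const Samples_version1& in, Samples_version2& out)
{
    CopyArrayChecked(out.ara, in.ara);
}
```

Swapping the argument order (copying a five-element array into a three-element one) would trip the static_assert instead of overrunning the buffer.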

Finally, DECLARE_DUCKTYPED_STRUCT also supports the inheritance of struct members. It is atypical, at least in the experience of the author, for C++ structs to inherit from each other. Nevertheless, DECLARE_DUCKTYPED_STRUCT has been extended to support this feature – and even to support multiple inheritance. This proved to be practical without adding too much to the necessary header file. In particular, the existing GetMembers() method proved helpful in implementing support for inheritance.

Declaration of derived struct types uses a second macro, DECLARE_DUCKTYPED_STRUCT_BASES. This is similar to DECLARE_DUCKTYPED_STRUCT, except that it takes as its final parameter a list of base types. An example declaration is given below:

DECLARE_DUCKTYPED_STRUCT(Base1_version1,
  ((int)(base1_member))
) 

DECLARE_DUCKTYPED_STRUCT(Base2_version1,
  ((double)(base2_member))
) 

DECLARE_DUCKTYPED_STRUCT_BASES(Derived_version1,
  ((int)(derived_member)),
  (Base1_version1)(Base2_version1)
)

The expansion of Derived_version1 is as shown below. Again, GetMembers() is omitted for the sake of brevity, since it follows logically from the type converter:

typedef struct dummybaseversionatorDerived_version1{}; 
typedef struct : dummybaseversionatorDerived_version1, 
                 Base1_version1 ,Base2_version1 
{ 
 int derived_member ; 
 template <class T> operator T() 
 {  
  T out; 
  ((Base1_version1*)this)->GetMembers(out); 
  ((Base2_version1*)this)->GetMembers(out); 
  out.derived_member =derived_member ; 
  return out; 
 } 
} Derived_version1;

The most unfamiliar aspect of this macro expansion is probably the use of the additional type named dummybaseversionatorDerived_version1. The dummy base type guarantees that the listing of base types near the top of the expanded declaration always begins with one entry. This allows the logic that does the work of constructing the listing to simply prepend a comma to each of the actual base types. This is a somewhat atypical approach. If emitting a similar list (for whatever reason) at run-time, it would be reasonable to treat the first base class as a special case that does not need a comma. However, preprocessor logic is tricky to code and maintain, and the use of the “dummy” base type is a welcome simplification.

As hinted, GetMembers() is used to effect name-by-name assignment of each base type’s members into the variable out. Each base type’s GetMembers() method is invoked upon the derived type instance (i.e., on this), in order to extract the members received from that base type. The converter methods generated by the declaration macros are member templates, and so cannot be virtual. This means that the static (i.e., compile-time) type of the pointer used to invoke GetMembers() determines which version of the method gets executed. As a result, casting this to a base type pointer allows the base type’s GetMembers() method to execute.
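
The dispatch described above can be sketched with hand-written structs as follows. Base1, Base2, Derived, Flat and Flatten are hypothetical names; static_cast is used where the expansion shows a C-style cast, and the const qualifiers are an addition of this sketch.

```cpp
struct Base1
{
    int base1_member;
    template <class T> void GetMembers(T& out) const { out.base1_member = base1_member; }
};

struct Base2
{
    double base2_member;
    template <class T> void GetMembers(T& out) const { out.base2_member = base2_member; }
};

struct Derived : Base1, Base2
{
    int derived_member;

    template <class T> void GetMembers(T& out) const
    {
        // Derived::GetMembers hides the base versions, so each base's
        // version is selected through the static type of the pointer.
        static_cast<const Base1*>(this)->GetMembers(out);
        static_cast<const Base2*>(this)->GetMembers(out);
        out.derived_member = derived_member;
    }
};

// Unrelated target type that happens to have members with the same names.
struct Flat
{
    int base1_member;
    double base2_member;
    int derived_member;
};

// Hypothetical helper: collects inherited and own members into a Flat.
Flat Flatten(const Derived& d)
{
    Flat out;
    d.GetMembers(out);
    return out;
}
```

Note that the target type does not need to mirror the inheritance structure; it only needs members with matching names.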

Points of Interest

At this point, some details about the requisite header file (StructDuckTyping.h) are in order. While complete coverage of the Boost knowledge that underlies this header’s construction cannot be provided here, it is not difficult to describe the general workings of the file. In fact, the macro expansion examples already given provide a very complete high-level summary of the contents of StructDuckTyping.h.

The core of the header file is DECLARE_DUCKTYPED_STRUCT, whose definition is as follows:

#define DECLARE_DUCKTYPED_STRUCT(name,seq) \
typedef struct  \
{ \
    \
   /* Declare members */ \
   BOOST_PP_SEQ_FOR_EACH(DEF_MEMBER,~,seq) \
   \
   /* Provide a copy method */ \
   template <class T> operator T()\
   { \
     T out;\
     \
     /* Copy members to new stack struct */ \
     BOOST_PP_SEQ_FOR_EACH(COPY_MEMBER,~,seq) \
     return out;\
   }\
   template <class T> void GetMembers(T& out) \
   { \
     /* Copy members to existing struct */ \
     BOOST_PP_SEQ_FOR_EACH(COPY_MEMBER,~,seq) \
   }\
} name;

Like any C++ parameterized macro, this entire entity represents a blueprint for the text replacement that constitutes macro expansion. As is typical of such macros, the “line continuation” backslash is used to mark out the extent of the macro. The constituent parts of the expansion examples (i.e., the two converter methods, and the members themselves) are clearly visible in the macro definition, albeit in strangely augmented form.

Clearly, BOOST_PP_SEQ_FOR_EACH and the seq macro parameter play a key role here. This seq parameter is a Boost “sequence”. A sequence is a Boost Preprocessor concept: a series of adjacent parenthesized elements, designed to be iterated over during preprocessing by BOOST_PP_SEQ_FOR_EACH and some other similar Boost constructs. A simple sequence might take the form (1)(2)(3)(4).

BOOST_PP_SEQ_FOR_EACH takes three parameters, of which only the first and third are used here. The first parameter names another preprocessor macro, which is invoked once for each sequence element, with that element passed as a parameter. This macro is expanded at the point in the code where BOOST_PP_SEQ_FOR_EACH appears. The second parameter is auxiliary data that is passed through unchanged to that macro; since it is not needed here, the placeholder ~ is supplied. The third parameter is the sequence over which iteration will occur.

Sequences can be nested; in fact, this is critical to the library presented here. Consider, for instance, the following member list taken from a previous example:

((int)(inner_scalar1))
((int)(inner_scalar2))

This fragment is itself a sequence, i.e., it is a list of parenthesized elements. Each of these elements is also a sequence. The outer sequence gets passed to DECLARE_DUCKTYPED_STRUCT as a member list. Then, each element of the outer sequence is iterated over by the three invocations of BOOST_PP_SEQ_FOR_EACH visible in the macro definition. Respectively, these will perform the mechanical generation of the member list, of the assignments in the type converter, and of the assignments in GetMembers().

The first invocation of BOOST_PP_SEQ_FOR_EACH builds the member list. This is shown below:

/* Declare members */ \
BOOST_PP_SEQ_FOR_EACH(DEF_MEMBER,~,seq) \

This means that the DEF_MEMBER macro will get expanded into the preprocessed source once for each element of seq. For example, in its first invocation, DEF_MEMBER will receive the element (int)(inner_scalar1) as its parameter, and will use Boost's sequence-processing facilities to emit a syntactically correct declaration (int inner_scalar1;) into the preprocessed code. Similarly, COPY_MEMBER uses BOOST_PP_SEQ_FOR_EACH to generate the for loops necessary to copy array members.
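
For readers without Boost at hand, the same "write the member list once, expand it several times" idea can be imitated with a plain X-macro. To be clear, this is a swapped-in technique for illustration and not how StructDuckTyping.h is implemented; POINT_MEMBERS, DEF_MEMBER_X, COPY_MEMBER_X, Point2 and Point3 are all hypothetical names.

```cpp
// Member list written once, as an "X-macro": each X(type, name) entry
// plays the role of one ((type)(name)) element of the Boost sequence.
#define POINT_MEMBERS(X) \
    X(int, x)            \
    X(int, y)

// Per-element macros, analogous in spirit to DEF_MEMBER and COPY_MEMBER.
#define DEF_MEMBER_X(type, name)  type name;
#define COPY_MEMBER_X(type, name) out.name = name;

struct Point2
{
    POINT_MEMBERS(DEF_MEMBER_X)          // expands to: int x; int y;

    template <class T> void GetMembers(T& out) const
    {
        POINT_MEMBERS(COPY_MEMBER_X)     // expands to: out.x = x; out.y = y;
    }
};

// Unrelated target type with matching member names plus one extra.
struct Point3
{
    int x;
    int y;
    int z;
};
```

The Boost sequence approach has the advantage that the member list is passed directly as a macro argument, so no separate list macro has to be defined for each struct.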

In closing, I want to contrast the techniques described here with a potential low-level approach I considered. Consider, for example, the program shown below:

#include <cstdio>
#include <cstring>

struct S1_version1
{
 int scalar1;
 int scalar2;
};

struct S1_version2
{
 int scalar1;
 int scalar2;
 int scalar3;
};

int main()
{

 S1_version1 first;
 first.scalar1=1;
 first.scalar2=2;

 S1_version2 second;

 //Just copy the bytes and hope for the best
 std::memcpy(&second, &first, sizeof(S1_version1));

 //Prints "1" then "2"... seems to work well
 printf("\n%d\n%d",second.scalar1, second.scalar2);

 return 0;
}

The call to memcpy() does the job, in this case, of moving data from one struct to the other. Superficially, one might wonder if this low-level approach might be extended to deal with all the other scenarios already dealt with in the discussion above (arrays, nested structs, and so on). It seems surprisingly easy to address 90% or so of the real-world cases addressed by StructDuckTyping.h by using memcpy() or something like it; the logical question is thus whether it is possible to reach 100% using such techniques.

To do this would be extremely difficult, or at least significantly more difficult than the approach outlined in this article. Fundamentally, such an effort would have to rely on logic to determine numerical offsets for each struct member in memory. This is no problem in the example given above, since the second struct is identical to the first, except for the new member at its end. More substantial differences in struct declarations – such as changes in the dimension of array members, or changes in the declarations of base types or member struct types – would make the task of determining member offset quite difficult. More subtle issues impinge on such an effort as well, such as padding and alignment rules. These rules are compiler-specific and even project-specific.
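
The offset problem is easy to demonstrate. In the sketch below (Rec_version1, Rec_version2 and CountLinesUp are hypothetical names), a member is inserted in the middle of the later version, so the same-named member no longer sits at the same byte offset and a raw byte copy would put data in the wrong place.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Rec_version1
{
    int id;
    int count;
};

// A later version that inserts a member in the middle.
struct Rec_version2
{
    int    id;
    double weight;
    int    count;
};

// Byte offsets of the same-named member in each version.
const std::size_t kCountOffsetV1 = offsetof(Rec_version1, count);
const std::size_t kCountOffsetV2 = offsetof(Rec_version2, count);

// True only if a raw byte copy of a Rec_version1 would land "count"
// in the right place inside a Rec_version2.
bool CountLinesUp()
{
    return kCountOffsetV1 == kCountOffsetV2;
}
```

The exact offsets depend on the compiler's padding and alignment choices, which is precisely why name-based assignment is the more robust approach.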

The power of name-based assignment abstracts over a whole range of such issues, and hints at what I consider to be an oft-overlooked strength of C++: its rich set of compile-time facilities, such as the preprocessor and templates. Very often, C++ programmers are instructed to minimize the use of the preprocessor. I sympathize with this desire. As this article hints, though, the preprocessor is quite powerful in the right hands. In theoretical terms, macro expansion is loosely reminiscent of Lambda Calculus’ “normal-order reduction”: arguments are substituted into the macro body as text, before any C++ evaluation takes place. The practical value of the preprocessor is, I hope, hinted at by my library. The preprocessor can be difficult to deal with, but it is, at times, a very necessary evil.

One big advantage of using the preprocessor in StructDuckTyping.h is that this aspect of the design reduces the run-time burden associated with duck typing. True “dynamic typing” – of which most duck typing is an example - occurs at run-time. In JavaScript, for example, types are checked for member-level compatibility as necessary during execution. This results in a performance penalty, and also defers potential errors from compile-time to run-time, at which point the consequences of an error are often much more dire.

To summarize, I think the approach described here is a powerful, efficient, and general one. These aspects of my design serve to justify the (somewhat tricky) preprocessor macro development that was required.

History

This is the second major version of this article. Some minor changes to terminology were made.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

_beauw_

United States

I was educated at Southern Miss.

Last Updated 27 Feb 2010
Article Copyright 2010 by _beauw_