Introduction
In this article, I will describe the implementation of a light-weight mechanism for persisting C++ objects to XML or binary formats. Articles of this nature do not naturally generate a great deal of visual content, hence the many interspersed code fragments. I hope you find it interesting.
The sample code contains VS2003 and VS2008 projects which build a console mode Unit Test application. /W4 is used throughout.
In addition, there is an ultra-simple Contacts MFC app which displays content in a hybrid grid-tree control. The purpose of the GUI app is to show:
- How to extend ESS marshalling for your own classes
- How to use simple versioning
- How ESS copes with real-life code - protected constructors, virtual void functions, etc.
- How dynamic object creation works - and how ESS handles errors

When to use
I've found a multitude of applications for this technique - including persisting of program options, saving state for undo/redo operations, automatically enabling XML file formats for application data, client-server communications (i.e., packets on the wire), and storage of non-relational data as XML in SQL databases.
Key features
- All ISO C++ compliant, portable, code
- Does not require persistent classes to share a common base class
- Does not require RTTI-enabled compilation
- Respects existing access control for constructors and destructors
- Will serialize pointers to classes/structs that can be serialized
- Correctly restores contents of containers of pointers to polymorphic objects
- Emphasizes compile-time checking to minimize runtime errors
- Macros used only for brevity and forwards to debuggable code
- Implementation is completely in-line - only need to #include ESS header files
- Very simple to add new storage formats - JSON, for example
Constraints
- Requires serializable classes to have a void (i.e., default, no-argument) constructor
- The current implementation assumes serialization happens in one thread - little is required to add thread safety
- Explicitly does not support serialization of 'C' pointer types, especially void* and friends
- UNICODE string storage not yet implemented for XML
- No theoretical impediments to using ESS with multiple inheritance, but completely untested
Conventions
In order to prevent endless repetition, let us assume any class C0 is a base class in an arbitrary hierarchy, wherein C1 is derived from C0, and C2 is, in turn, derived from C1. A root class is the 'least-derived' class in a hierarchy. RTTI is run-time type information, and is the abbreviation I'll apply when discussing how to keep a record of class names and derivation information at runtime. Thus, we have the chain C0 <- C1 <- C2, with C0 as the root class.

A note on macros and templates
In a nutshell, generally very useful, if applied with taste and discretion. I greatly prefer code that can be correctly debugged, which sets an easy test all macros must pass: if you step into a macro in a debugger, do you see code or text?
The ESS_REGISTER, ESS_RTTI, and ESS_STREAM macros all use the stringize operator (#) to generate strings from class and instance names. This is, I believe, a good thing as it reduces scope for error. ESS_RTTI also declares the templated factory class responsible for creating new instances on the heap as a friend so it has access to protected/private constructors and destructors. This makes it much simpler to apply ESS to existing code. I would count these as definite benefits.
Where executable code is contained in a macro, it is always forwarded to a templated inline function so you can debug things properly. For example:
#define ESS_ROOT(rootname) typedef ess::root<rootname> ess_root;

#define ESS_STREAM(stream_adapter,class_member) \
    ess::stream(stream_adapter,class_member,#class_member)

#define ESS_RTTI(classname,rootname) \
    friend ess::CFactory<classname,rootname>; \
    virtual const char* get_name() \
    { return ess::get_name_impl<classname>(#classname); } \
    static ess::class_registry<classname>* get_registry() \
    { return ess::get_registry_root<classname,rootname>(#rootname); }
ESS_RTTI is as complex as the ESS macros get.
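The debugger test can be tried out on a small, stand-alone sketch. Everything in it is hypothetical and not part of ESS: `demo::text_sink` stands in for an adapter, and `DEMO_STREAM` mimics the shape of ESS_STREAM. The point is only that the macro stringizes a name and forwards, so stepping into it lands on ordinary template code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace demo
{
    // Hypothetical stand-in for an adapter: collects "name=value;" pairs.
    struct text_sink
    {
        std::ostringstream out;
    };

    // All executable code lives in this function, so stepping into the
    // macro below lands the debugger on real, inspectable C++.
    template <typename T>
    inline void stream(text_sink& sink, const T& value, const char* name)
    {
        sink.out << name << '=' << value << ';';
    }
}

// The macro does two things only: stringize the member name and forward.
#define DEMO_STREAM(sink, member) demo::stream(sink, member, #member)

struct point
{
    int x;
    int y;

    void describe(demo::text_sink& sink) const
    {
        DEMO_STREAM(sink, x);
        DEMO_STREAM(sink, y);
    }
};

// Usage: the member names appear in the output without being typed twice.
inline std::string describe_sample_point()
{
    point p = { 3, 4 };
    demo::text_sink sink;
    p.describe(sink);
    assert(sink.out.str() == "x=3;y=4;");
    return sink.out.str();
}
```

Because the name string is generated from the member itself, renaming a member cannot leave a stale string behind.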
One more philosophical declaration: templates are wonderful, but template meta-programming is not. Why? Template meta-programming fails the debugger test.
A minimal ESS example
Let us start with a simple example. C0 is the root class we wish to serialize; what follows is an ultra-simple inline implementation.
#include "ess_stream.h"
#include "ess_xml.h"

class C0
{
    short m_id;
    std::vector<C0*> m_children;

    virtual void serialize(ess::archive_adapter& adapter)
    {
        ESS_STREAM(adapter,m_id);
        ESS_STREAM(adapter,m_children);
    }
public:
    ESS_ROOT(C0)
    ESS_RTTI(C0,C0)
};

class C1: public C0
{
    ESS_RTTI(C1,C0)
};
and here is the code to perform serialization in both directions:
int version = 1;
std::string xml_root = "root";
std::string instance_name = "x";
try
{
    ess::Registry registry;
    registry << ESS_REGISTER(C0,C0);
    ess::xml_medium storage;
    {
        // storing: write both instances into the XML medium
        C0 c0;
        C1 c1;
        ess::xml_storing_adapter adapter(storage,xml_root,version);
        ess::stream(adapter,c0,"c0");
        ess::stream(adapter,c1,"c1");
    }
    {
        // loading: recreate an instance from the stored XML
        C0* p0 = 0;
        Chordia::xml_source xmls(storage.c_str(),storage.size());
        ess::xml_loading_adapter adapter(xmls,xml_root,version);
        ess::stream(adapter,p0,instance_name);
        delete p0;
    }
}
catch(...)
{
}
The XML generated in example() is this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root version="1">
    <class derived_type="C0" name="c0">
        <signed_short name="m_id" value="1"/>
        <vector name="m_children" count="0">
        </vector>
    </class>
    <class derived_type="C1" name="c1">
        <signed_short name="m_id" value="1"/>
        <vector name="m_children" count="0">
        </vector>
    </class>
</root>
In detail
- RTTI
- Registration
- Adaptors
- Error handling
- Unit Tests
Let us get down to the details. Persisting basic C++ classes is not too difficult - MFC has had a mechanism to do just this since the beginning of time. We start out in a similar way; the basis of the system is serialization by decomposition - classes are reduced to atomic elements, which then are written and read at runtime. Reading and writing uses a symmetric serialize() function which reduces programming requirements and possible errors. The really tricky bits come from the following:
- No common base class
- Correctly reconstructing pointers to polymorphic class instances
- Keeping it compiler friendly
- Keeping it programmer friendly
RTTI and friends
Hypothesis 1: If we want to restore polymorphic types correctly, then:
- we have to be able to distinguish between derived types in some way, and
- we have to be able to do this at runtime.
Cutting to the chase, the simplest way to do this is to outfit each ESS compliant class with a virtual get_name() function. We can then do this:
std::vector<C0*> vec;
vec.push_back(new C0);
vec.push_back(new C1);
std::string n0 = vec[0]->get_name(); // "C0"
std::string n1 = vec[1]->get_name(); // "C1"
Hypothesis 1 implies that we need to be able to create (in a type-safe way) arbitrary instances of different types given only a string. We also have an added wrinkle which we show here by introducing an unrelated hierarchy prefixed with 'D':
C0* pc0 = hey_presto("C0");
C1* pc1 = hey_presto("C1");
D0* pd0 = hey_presto("D0");
D1* pd1 = hey_presto("D1");
Well, we can achieve something close to the code above by having a static function in each base class, C0 and D0, such that:
C0* pc0 = C0::hey_presto("C0");
D0* pd0 = D0::hey_presto("D0");
Enter templates. We can escape the burden of having to qualify the type name by having a templated version of hey_presto():
template <typename T>
inline
T*
hey_presto(const std::string& classname)
{
    // pseudocode: create T, or the class derived from T that classname identifies
    return new T_Or_Derivative_Of_T;
}
C0* pc0 = hey_presto<C0>("C0");
D0* pd0 = hey_presto<D0>("D0");
In reality, the solution is somewhat more complicated. In order to be type safe, flexible, and efficient, we apply the generic solution of indirection. We'll submit to the temptation to add one more macro - it has been unveiled already, but let's take a closer look:
// as written in the ESS_RTTI macro:
static class_registry<classname>* get_registry()
{
    return get_registry_root<classname,rootname>(#rootname);
}

// as expanded for C0:
class C0
{
    static class_registry<C0>* get_registry()
    {
        return get_registry_root<C0,C0>("C0");
    }
};
If we continue to follow the calling trail, we get the following in quick succession:
// snippet 1
template <typename Derived,typename Root>
inline
class_registry<Derived>* get_registry_root(const char* rootname)
{
    return
        reinterpret_cast<class_registry<Derived>*>
            (get_registry_impl<Root>(rootname));
}

// snippet 2
template <typename Root>
inline
class_registry<Root>* get_registry_impl(const char* rootname)
{
    static ess::class_registry<Root> s_registry(rootname);
    return &s_registry;
}

// snippet 3 (member function bodies omitted)
template <typename Root>
class class_registry
{
public:
    bool Register(const char* classname,IFactory<Root>* pFactory) {}
    Root* Create(const std::string& classname) {}
};
In other words, the code in the snippets above equips each root class with a static, templated, class_registry instance. As you can probably guess from the member function names, C0::get_registry()->Create("C0") will indeed return us a new C0 instance. We will deal with the Register() member function in more detail in the next section - suffice it to say that we are getting very close to the hey_presto() function we wanted before.
As ever with C++, the devil is in the detail. get_registry_impl() in snippet 2 above returns a pointer to a static class instance. That means:
- there is only ever going to be one instance of class_registry
- class_registry will only be created when get_registry_impl() is called
- class_registry is accessible to all classes derived from Root
This, in turn, implies that the following is possible:
C0* p0 = C0::get_registry()->Create("C0");
C0* p1 = C0::get_registry()->Create("C1");
Although this is pretty much sufficient for the task at hand, it does make (eventually) for awkward code. To really polish things off nicely, we want to be able to do this:
C0* p0 = C0::get_registry()->Create("C0");
C1* p1 = C1::get_registry()->Create("C1");
Although this looks simple enough, recall that C++ static functions are not virtual. In fact, you cannot have static functions with the same name in two different though related classes. Or, can you? Let's look again:
class_registry<C0>* rc0 = C0::get_registry();
C0* p0 = rc0->Create("C0");
class_registry<C1>* rc1 = C1::get_registry();
C1* p1 = rc1->Create("C1");
The code here obscures the fact that although the static functions have the same name, they actually have a different signature as they return different, but related types. This actually is a bit of a hey-presto moment because we can now write a single templated inline function which creates any arbitrary instance from a string:
template<typename Type>
inline
Type*
instance_from_name(const std::string& classname)
{
ess::class_registry<Type>* p = Type::get_registry();
return p->Create(classname);
}
Now, we have a single function which works in both cases - note that the template argument type is different in (3).
C0* p0 = instance_from_name<C0>("C0");  // (1)
C0* p1a = instance_from_name<C0>("C1"); // (2) a C0* that points at a C1
C1* p1b = instance_from_name<C1>("C1"); // (3)
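To make the moving parts concrete, here is a minimal, self-contained sketch of a name-to-factory registry. It is not the ESS implementation: all names in the `sketch` namespace are hypothetical, and it deliberately covers only access through the root type (cases (1) and (2) above), so it sidesteps the cast that ESS uses to hand out a class_registry typed on the derived class.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

namespace sketch
{
    // Abstract factory: creates objects addressed through the root type.
    template <typename Root>
    struct factory_base
    {
        virtual ~factory_base() {}
        virtual Root* create() const = 0;
    };

    template <typename Derived, typename Root>
    struct factory : factory_base<Root>
    {
        virtual Root* create() const { return new Derived; }
    };

    // One registry per root class, mapping class names to factories.
    template <typename Root>
    class class_registry
    {
        typedef std::map<std::string, const factory_base<Root>*> factory_map;
        factory_map m_factories;
    public:
        void add(const std::string& name, const factory_base<Root>* f)
        {
            assert(f != 0);
            m_factories[name] = f;
        }

        Root* create(const std::string& name) const
        {
            typename factory_map::const_iterator it = m_factories.find(name);
            if (it == m_factories.end())
                throw std::runtime_error("unregistered class: " + name);
            return it->second->create();
        }
    };

    // Function-local static: created on first use, one instance per Root.
    template <typename Root>
    inline class_registry<Root>& registry_for()
    {
        static class_registry<Root> s_registry;
        return s_registry;
    }

    template <typename Derived, typename Root>
    inline void register_class(const std::string& name)
    {
        static factory<Derived, Root> s_factory;
        registry_for<Root>().add(name, &s_factory);
    }

    template <typename Root>
    inline Root* instance_from_name(const std::string& name)
    {
        return registry_for<Root>().create(name);
    }
}

struct C0
{
    virtual ~C0() {}
    virtual const char* get_name() const { return "C0"; }
};

struct C1 : C0
{
    virtual const char* get_name() const { return "C1"; }
};

// Usage: create an object from a string and report what was really built.
inline std::string create_and_name(const std::string& classname)
{
    sketch::register_class<C0, C0>("C0");
    sketch::register_class<C1, C0>("C1");
    C0* p = sketch::instance_from_name<C0>(classname);
    const std::string name = p->get_name();
    delete p;
    return name;
}
```

Keeping the registry typed on the root alone is the simpler design; the price is that callers holding a C1* must convert the result themselves.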
Now, to try and wrap up this somewhat involved section, we will follow the compiler when we actually stream stuff back from storage. Here is the relevant inline function in ess_stream.h; it is a templated function with a signature that matches pointers to types:
template<typename Type>
inline
void
stream(stream_adapter& adapter,Type*& pointer,const std::string& name)
{
    std::string derived_type = get_class_name(adapter);
    pointer = instance_from_name<Type>(derived_type);
    pointer->serialize(adapter);
}

C0* p0 = 0;
ess_stream(...,p0,...);
C1* p1 = 0;
ess_stream(...,p1,...);
Now, the remarkable thing about this code is that it all returns the same thing: the static registry class that was declared way back in the root class, C0. The templating means that there are multiple ways in which the compiler can establish a type-safe way in which to access the registry, in turn enabling the creation of arbitrary type instances. Note, however, that this convenience comes at a cost. It is now theoretically possible to instantiate partially finished classes! Without using compiler generated RTTI, it is impossible (I believe) to guard against this error at compile time. Indeed, it is very hard to guard against it at run time. Any thoughts on this would be welcomed. The following line shows the problem:
C1* p1 = instance_from_name<C1>("C0");
Registration redux
The purpose of registration is to ensure that each class registry is called into existence before any construction is attempted. A desirable side effect of the implementation is that it is quite hard to get this wrong - after all, any code that serializes a class will end up accessing the registry. However, imagine you have opened your brand new XML enabled persistent object application and selected File > Open - the runtime will start to throw as it attempts to instantiate classes that have not yet been mapped into the system. I also believe that explicit registration is a good idea as it makes it easy to discover where the persistent process starts and thus simplifies debugging or problem diagnosis. Registration itself is trivial, and need only be done once.
// long-hand form:
ess::registry_manager registry;
registry
    << ess::class_registrar<C0,C0>("C0")
    << ess::class_registrar<C1,C0>("C1")
    << ess::class_registrar<C2,C0>("C2");

// the same thing using the ESS_REGISTER macro:
ess::registry_manager registry;
registry
    << ESS_REGISTER(C0,C0)
    << ESS_REGISTER(C1,C0)
    << ESS_REGISTER(C2,C0);
Note too that the registry object does not have to be kept around. Registration actually does three things:
- forces the static class_registry instance into existence,
- creates a templated factory class to create the type in question,
- inserts the factory class instance into the registry under the classname key.
Multiple registration is not an error, unless you try and register a class name with a different factory - this, to me at least, implies some possible programming error. The system will throw - which is another reason to keep registration in one place in the code. Although I have not specifically tried it (my coding convictions are set against it), this system should work for classes that are exported from dynamic link libraries.
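The registration rule just described (repeating a registration is harmless, re-using a name with a different factory throws) can be sketched in a few lines. This is an illustration under assumptions rather than the ESS code: `sketch::registry`, `sketch::factory`, and the choice of `std::logic_error` are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

namespace sketch
{
    struct factory {};   // stand-in; a real factory would create objects

    class registry
    {
        typedef std::map<std::string, const factory*> entry_map;
        entry_map m_entries;
    public:
        // Returns true when the name is new, false when the same
        // name/factory pairing was already present. A different factory
        // under an existing name is treated as a programming error.
        bool add(const std::string& name, const factory* f)
        {
            assert(f != 0);
            entry_map::iterator it = m_entries.find(name);
            if (it == m_entries.end())
            {
                m_entries[name] = f;
                return true;
            }
            if (it->second != f)
                throw std::logic_error("conflicting factory for class " + name);
            return false;
        }
    };
}
```

Comparing factory addresses is the cheapest possible identity test; it works here because each class gets exactly one factory object.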
There is another variant of registry_manager which I have found useful when working with the Diagram Editor posted on CodeProject some years back. I have got a heavily modified version that uses ESS for undo/redo and saving as XML.
ess::typed_registry_manager<CDiagramEntity> registry;
registry
<< ESS_REGISTER(CEditor,CDiagramEntity)
<< ESS_REGISTER(CListBox,CDiagramEntity)
<< ESS_REGISTER(CStatic,CDiagramEntity);
Adaptors
The purpose of the archive_adapter is to make it easy to store in new formats. An adapter class that loads from (say) JSON or some proprietary binary format only needs to implement the overloaded read() functions in the archive_adapter class. The same is true for an adaptor that writes. The source code shows a completely different take on adaptors - have a look at the binary_debug_adapter class in ess_binary.h. What this does is to dump a binary archive to a text file as it is streamed, in real time; useful if you want to understand binary storage in more detail.
XML is the favoured storage format - I felt it desirable that the XML generated contained enough information to allow desk checking and debugging, and the ubiquity of the format means exchange and interoperability is simple. As well as storing the contents of class members, we want to store their names. For each of the intrinsic types, along with the supported container types, we have a number of inline free functions called stream, each with a specific type signature, which all do the same thing - take the arg and name parameters and then:
- If the archive_adapter is storing, write the argument data and its name to the underlying storage
- If the archive_adapter is loading, then read the value of the named item back into the arg parameter
namespace ess
{
inline void
stream(archive_adapter& adapter,bool& arg,const std::string& name) {...}
inline void
stream(archive_adapter& adapter,GUID& arg,const std::string& name) {...}
template<typename Type> inline void
stream(archive_adapter& adapter,Type& arg,const std::string& name) {...}
template<typename Type> inline void
stream(archive_adapter& adapter,Type*& arg,const std::string& name) {...}
template<class Type> inline void
stream(archive_adapter& adapter,std::vector<Type>& arg,const std::string& name) {...}
template<typename Key,typename Value> inline void
stream(archive_adapter& adapter,std::map<Key,Value>& arg,const std::string& name) {...}
}
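The storing/loading symmetry can be seen in a toy version. The sketch below is hypothetical throughout: `sketch::text_adapter` is a made-up adapter that writes "name value" lines to text, not an ESS class, and only `int` and `std::vector` are handled. It assumes item names contain no whitespace. On loading, each item's name is checked, so an out-of-order layout raises an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch
{
    // Hypothetical adapter: either stores to text or loads from stored text.
    class text_adapter
    {
        std::stringstream m_data;
        bool m_storing;
    public:
        text_adapter() : m_storing(true) {}
        explicit text_adapter(const std::string& stored)
            : m_data(stored), m_storing(false) {}

        bool storing() const { return m_storing; }
        std::string str() const { return m_data.str(); }

        void write(const std::string& name, long value)
        {
            m_data << name << ' ' << value << '\n';
        }

        // Reads the next item and checks that it carries the expected name.
        long read(const std::string& name)
        {
            std::string stored_name;
            long value = 0;
            m_data >> stored_name >> value;
            if (m_data.fail() || stored_name != name)
                throw std::runtime_error("expected item '" + name + "'");
            return value;
        }
    };

    inline void stream(text_adapter& a, int& arg, const std::string& name)
    {
        if (a.storing()) a.write(name, arg);
        else arg = static_cast<int>(a.read(name));
    }

    template <typename T>
    inline void stream(text_adapter& a, std::vector<T>& arg, const std::string& name)
    {
        int count = static_cast<int>(arg.size());
        stream(a, count, name + ".count");
        if (count < 0) throw std::runtime_error("negative count");
        if (!a.storing()) arg.assign(static_cast<std::size_t>(count), T());
        for (std::size_t i = 0; i < arg.size(); ++i)
            stream(a, arg[i], name + ".item");
    }
}

struct record
{
    int m_id;
    std::vector<int> m_values;
    record() : m_id(0) {}

    // One function serves both directions.
    void serialize(sketch::text_adapter& a)
    {
        sketch::stream(a, m_id, "m_id");
        sketch::stream(a, m_values, "m_values");
    }
};

// Usage: store a record, then load a fresh copy from the stored text.
inline record copy_via_text(record& source)
{
    sketch::text_adapter storing;
    source.serialize(storing);
    sketch::text_adapter loading(storing.str());
    record copy;
    copy.serialize(loading);
    assert(copy.m_id == source.m_id);
    return copy;
}
```

Supporting another format would mean swapping the adapter only; `record::serialize()` and the `stream` overloads stay as they are.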

Error detection
Although ESS makes it easy to add runtime persistence to C++ classes, we do not want to sacrifice any of the compile time checking the language affords. Indeed, wherever possible, we want to warn the programmer if the shotgun pointed at the foot is about to fire. Consider the following:
ESS_ROOT(C0)
ESS_RTTI(C0,C0)
ESS_RTTI(C1,C0)
ESS_RTTI(C2,C1) <- wrong...
though easily done. C1 is not the root class. How can we detect this at compile time? With some difficulty! There is something called a compile_time_checker in the ess_rtti header file. It exists solely to ensure that a class that has been declared as an ESS_ROOT is always used as the root in the ESS_RTTI macros. In other words, it will fail to compile if the sort of error shown above is made.
template <typename Derived,typename Root>
struct compile_time_checker
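The body of the checker is not reproduced here, so what follows is only one way such a check could be written in pre-C++11 code, not necessarily how ESS does it. The `sketch` namespace, the `SKETCH_ROOT`/`SKETCH_RTTI` macros, and `class_name()` are hypothetical. The idea is to compare the typedef left behind by the root macro with the root named in the RTTI macro, and to reference an undefined template when they differ.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
    template <typename T> struct root {};

    template <typename A, typename B> struct is_same { enum { value = 0 }; };
    template <typename A> struct is_same<A, A> { enum { value = 1 }; };

    // Only the 'true' case is defined; naming a member of the undefined
    // 'false' case is a compile error.
    template <bool Ok> struct static_check;
    template <> struct static_check<true> { enum { ok = 1 }; };

    template <typename Derived, typename Root>
    struct compile_time_checker
    {
        // Derived inherits ess_root from whichever class declared itself root.
        enum
        {
            value = static_check<
                (is_same<typename Derived::ess_root, root<Root> >::value != 0)
            >::ok
        };
    };
}

#define SKETCH_ROOT(rootname) typedef sketch::root<rootname> ess_root;

// The check sits inside a member function body, where the class is complete.
#define SKETCH_RTTI(classname, rootname) \
    static const char* class_name() \
    { \
        typedef sketch::compile_time_checker<classname, rootname> checker; \
        return checker::value ? #classname : ""; \
    }

struct C0
{
    SKETCH_ROOT(C0)
    SKETCH_RTTI(C0, C0)
};

struct C1 : C0
{
    SKETCH_RTTI(C1, C0)
};

struct C2 : C1
{
    SKETCH_RTTI(C2, C0)      // fine: C0 is the declared root
    // SKETCH_RTTI(C2, C1)   // would not compile: C1 is not the declared root
};

// Usage: the correctly declared classes compile and report their names.
inline void check_names()
{
    assert(std::string(C1::class_name()) == "C1");
}
```

With a C++11 compiler, `static_assert` and `std::is_same` would express the same check with a readable error message.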
The following categories of error do not require support code - as they end up being syntax errors or (much more likely) irresolvable, unrelated type errors.
The upshot is that the foreseeable runtime errors are:
- Attempting to serialize an unregistered type - i.e., loading a class which is unknown to the compiler. This will fail as the class_registry instance will throw an exception.
- Attempting to de-serialize an instance whose layout has changed in some way. The way in which this mode fails is important - if the layout is 'out-of-order', then the runtime should detect this and throw().
The Unit Tests
These are pretty straightforward and are all contained in the ess_main.cpp source file. The idea is to assemble a test case which will verify (or not) the key implementation expectations. Code obviously has to meet a basic set of requirements in order to compile - and as many errors as can be checked at compile time are checked. However, there is a set of conditions that can only be tested at runtime. The most basic tests are:
- Will a persistent class store itself?
- Will the data generated by persisting a class suffice to create a new instance?
- If the new instance is itself serialized, will the resulting storage be equal to the initial storage (i.e., from the first test)?
- Can the runtime support detecting programming errors such as incorrect derivation?
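The first three tests amount to a round-trip check: store, load into a fresh instance, store again, and compare the two stored forms. The sketch below shows the shape of such a check on a toy type. It does not use ESS or reproduce ess_main.cpp; `contact`, `store()`, `load()`, and `round_trips()` are hypothetical names, and the toy format assumes names without whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct contact
{
    std::string name;   // assumed to contain no whitespace in this sketch
    int age;
    contact() : age(0) {}
};

// Toy storage format: "name age".
inline std::string store(const contact& c)
{
    std::ostringstream out;
    out << c.name << ' ' << c.age;
    return out.str();
}

inline bool load(contact& c, const std::string& text)
{
    std::istringstream in(text);
    in >> c.name >> c.age;
    return !in.fail();
}

// Works for any default-constructible type with store()/load() overloads.
template <typename T>
bool round_trips(const T& original)
{
    const std::string first = store(original);   // will it store itself?
    T copy;
    if (!load(copy, first)) return false;        // does the data suffice?
    return store(copy) == first;                 // is the storage identical?
}

// Usage: run the basic checks on a sample instance.
inline void run_basic_tests()
{
    contact c;
    c.name = "alice";
    c.age = 30;
    assert(round_trips(c));
}
```

Comparing stored forms rather than objects means the persistent classes need no equality operator.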
Fitting ESS: The 42 line guide
Here are the steps required to make any class ESS compliant. All code is contained within the ess namespace, hence the ess:: qualifier everywhere.
#include "ess_stream.h"
#include "ess_xml.h"
#include "ess_binary.h"

class persistent_base
{
    some_type class_member;
public:
    ESS_ROOT(persistent_base)
    ESS_RTTI(persistent_base,persistent_base)
    virtual void serialize(ess::archive_adapter& adapter)
    {
        ESS_STREAM(adapter,class_member);
    }
};

class persistent_derived : public persistent_base
{
    some_type class_member;
public:
    ESS_RTTI(persistent_derived,persistent_base)
    virtual void serialize(ess::archive_adapter& adapter)
    {
        ESS_STREAM(adapter,class_member);
        // base class members are streamed after this class's own
        persistent_base::serialize(adapter);
    }
};
That is it. All implementation is in-line, and with the exception of CoCreateGuid(), runtime support only uses constructs in the std:: namespace, namely std::string, std::map, and std::vector. If you want more detail on how to extend ESS to marshal your own types, then see the ess_class.h file in the ESS_GUI project. It shows how to persist COleDateTime.
The source code
Both the VS2003 and VS2008 archives contain two folders:
- ./codeproject/ess_code/ess_0X
- ./include/...
Ensure that the paths are created when unzipping as this will mean the projects should build out of the box, without any need to set new #include paths and the like. Please let me know if anyone finds this not to be the case. The MFC GUI project should unzip in exactly the same manner.
Items for the future
I have intentionally avoided the following issues in the current implementation.
- Support for less frequently used containers, std::list and std::stack for example. My code rarely uses these classes - support is easy to add.
- Smart Pointers and friends - these only recently arrived in the TR update for VS2008, and are not found in the standard C++ libraries yet. I was unwilling to roll my own.
- Endian issues in the binary storage system - right now, it is all Intel ordering. It would be nice to use network ordering for binary storage.
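For the endian item, network (big-endian) ordering can be had portably by writing bytes with shifts instead of copying memory. This is a sketch of that technique only; `byte_buffer`, `write_u32_be()`, and `read_u32_be()` are hypothetical names, and how such helpers would plug into the ESS binary adapter is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> byte_buffer;

// Appends the low 32 bits of value, most significant byte first.
inline void write_u32_be(byte_buffer& out, unsigned long value)
{
    out.push_back(static_cast<unsigned char>((value >> 24) & 0xFFu));
    out.push_back(static_cast<unsigned char>((value >> 16) & 0xFFu));
    out.push_back(static_cast<unsigned char>((value >> 8) & 0xFFu));
    out.push_back(static_cast<unsigned char>(value & 0xFFu));
}

// Reads four bytes starting at offset; the caller guarantees they exist.
inline unsigned long read_u32_be(const byte_buffer& in, std::size_t offset)
{
    assert(offset + 4 <= in.size());
    return (static_cast<unsigned long>(in[offset]) << 24)
         | (static_cast<unsigned long>(in[offset + 1]) << 16)
         | (static_cast<unsigned long>(in[offset + 2]) << 8)
         |  static_cast<unsigned long>(in[offset + 3]);
}
```

Because the shifts operate on values rather than memory layout, the same code produces identical bytes on little-endian and big-endian hosts.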
Other points arising
- There is a degree of annoying anti-symmetry in the read and write versions of the XML storage. I'd like to smooth that off.
- Efficiency. The upper levels of the XML reader/writer could probably do with some streamlining.
Conclusion
That pretty much wraps it up. I hope I have conclusively demonstrated that type-safe, standards-compliant, and portable persistent C++ code can be created with the minimum of programming effort. A comparison with C# code is interesting. Forsaking the less efficient but automated persistence afforded by Reflection, manually specifying members for serialization using the XML tag has a similar level of in-code overhead to ESS. Any constructive discussion is most welcome, and I would love to get feedback on improvements and beautification.
I have not had the opportunity to try compiling with Visual C++ 6.0 as it is no longer in use here. I suspect it lacks the templating machinery required for ESS to work. I would love to be proved wrong on this. Also, despite best efforts, I could not persuade the Cygwin bundled GCC to find the right standard header files. My patience with command line tools that don't play nice is limited these days, so the test was binned.
Credits and References
- The XML parser is a stripped down version of the parser contained in DLib. This library contains some interesting things, including one variety of serialization - thanks to the team.
- The BOOST library offers extensive, heavy-weight, C++ serialisation support.
- Templates (1) Modern C++ Design, Andrei Alexandrescu, Addison-Wesley 2002.
- Templates (2) C++ Templates, D. Vandevoorde, N. M. Josuttis, Addison-Wesley 2003.
- C++ object databases were all the rage in the early 90's, and they all had to solve the marshalling problem. See Gigabase, POET/Versant, and Objectivity to scope contemporary FOSS and commercial offerings.
- Thanks to Johan for the UML editor.
- Thanks to Michal Mecinski for the original tree/grid (www.mimec.org).
I have ruthlessly extended this control to handle arbitrary column counts, multiple-selection, cell addressing, cell colouring, item data, and checkbox support. Any bugs are my own. See view.cpp and ColumnTreeWnd.h for details.
- The canonical URL for ESS updates will be NovaDSP.com.
History
- Version 1.01 - 14 March 2009
- Version 1.02 - 17 March 2009