XML Serialization for C++ Objects

A framework for serializing C++ objects as XML
Introduction
I have done a fair amount of C# programming, and serialization of .NET objects is pretty straightforward. Actually, it only looks straightforward because the .NET framework has already done the grunt work; as developers, all we need to do is use the services it provides. But I prefer the speed of C++/MFC, so I wanted to create a framework for serializing C++ objects as XML without resorting to .NET.
Background
Well, I have been flirting with the idea of XML serialization for my MFC projects for quite some time, but I never gave it any serious thought. It all began when an address book application I have been maintaining for over four years finally failed me. I was in London and had to look up a friend's email address. Of course, I had the address book application installed on my laptop, and I also had a copy of my latest data file. Version mismatch! The data file had been written by the latest version of the address book, but the application installed on the laptop was an older version. I realized that if I had used XML for storing the contacts, I would at least have been able to read the XML manually to get my work done.
Using the Code
- Serializable.h - defines the ISerializable, IObjectFactory and CProperty classes
- Contact.h & Contact.cpp - define the CAddress and CContact classes. Both are serializable.
- Serializer.h - defines an ISerializer interface, so that we might some day have other implementations for serialization
- XMLSerializer.h & XMLSerializer.cpp - define the CXMLSerializer class, which implements the ISerializer interface
- XMLSerialization.h & XMLSerialization.cpp - the main() application itself
This example was written using VS.NET 2003. You also need MSXML 4 installed on your system.
For serializing an object, basically one needs to know the following things:
- Which object to serialize
- Where to serialize
- What properties of the object to serialize
Of course, you can't have serialization without de-serialization. So we also need to know:
- Which objects to de-serialize
- How to create the objects
- Which properties to set
Rule 1
One of the most important things lacking in C++ is reflection. So I needed a way to "enquire" about an object and get back its properties and their associated values. For this, I decided that any class which wants to use the framework for serialization should inherit from the interface ISerializable.
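Because C++ has no built-in reflection, the interface itself becomes the "enquiry" mechanism: the framework asks an object for its property names and then for each value. The sketch below illustrates that idea in standard C++ rather than MFC, so std::string and std::vector stand in for CString and CStringList. It models only string-valued properties, and the CStudent class and the Describe function are hypothetical, not part of the article's framework.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical analogue of ISerializable: std::string replaces CString
// and only string-valued properties are supported.
class ISerializable {
public:
    virtual ~ISerializable() = default;
    virtual std::vector<std::string> GetProperties() const = 0;
    virtual bool GetPropertyValue(const std::string& name, std::string& value) const = 0;
    virtual std::string GetClassName() const = 0;
};

// Hypothetical entity class: it opts in to serialization by implementing the interface.
class CStudent : public ISerializable {
public:
    CStudent(std::string first, std::string last)
        : m_sFirstName(std::move(first)), m_sLastName(std::move(last)) {}

    std::vector<std::string> GetProperties() const override {
        return {"FirstName", "LastName"};
    }

    bool GetPropertyValue(const std::string& name, std::string& value) const override {
        if (name == "FirstName") { value = m_sFirstName; return true; }
        if (name == "LastName")  { value = m_sLastName;  return true; }
        return false; // not a property this class exposes
    }

    std::string GetClassName() const override { return "Student"; }

private:
    std::string m_sFirstName;
    std::string m_sLastName;
};

// Framework side: "enquires" about any ISerializable without knowing its concrete type.
std::string Describe(const ISerializable& obj) {
    std::string out = obj.GetClassName() + ":";
    for (const std::string& name : obj.GetProperties()) {
        std::string value;
        if (obj.GetPropertyValue(name, value))
            out += " " + name + "=" + value;
    }
    return out;
}
```

For example, Describe(CStudent("Meena", "Rao")) yields "Student: FirstName=Meena LastName=Rao". Describe never mentions CStudent, which is exactly what the interface buys in the absence of reflection.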
Rule 2
The next challenge was properties. Classes can have any number of properties, each of any type. For example, a CStudent class may have a FirstName property. Easy, that's a CString. But what if the class has a property of type CAddress? Persisting a CString I can manage, but how do I persist a CAddress, or any other user-defined type for that matter? See golden Rule 1: any class which needs to be serialized should inherit from ISerializable, so CAddress should also implement the ISerializable interface.
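To see why Rule 2 is enough, consider how a serializer can treat a nested object: when a property value is itself an ISerializable, the serializer simply recurses into it. This is a minimal standard-C++ sketch under stated assumptions: the Property struct is a hypothetical, cut-down stand-in for CProperty that holds either a string or a non-owning pointer, ToXml is an illustrative writer (not the article's CXMLSerializer), and no XML escaping is performed.

```cpp
#include <cassert>
#include <string>
#include <vector>

class ISerializable;

// Hypothetical stand-in for CProperty: either a string value or a pointer
// to another serializable object (the pointer is not owned).
struct Property {
    std::string text;
    const ISerializable* object = nullptr;
};

class ISerializable {
public:
    virtual ~ISerializable() = default;
    virtual std::vector<std::string> GetProperties() const = 0;
    virtual bool GetPropertyValue(const std::string& name, Property& value) const = 0;
    virtual std::string GetClassName() const = 0;
};

// Writes an object as XML, recursing into properties that are themselves
// ISerializable. Special characters are not escaped in this sketch.
std::string ToXml(const ISerializable& obj) {
    std::string xml = "<" + obj.GetClassName() + ">";
    for (const std::string& name : obj.GetProperties()) {
        Property p;
        if (!obj.GetPropertyValue(name, p)) continue;
        if (p.object)
            xml += ToXml(*p.object);                         // complex property: recurse
        else
            xml += "<" + name + ">" + p.text + "</" + name + ">"; // simple property
    }
    return xml + "</" + obj.GetClassName() + ">";
}

class CAddress : public ISerializable {
public:
    std::string city;
    std::vector<std::string> GetProperties() const override { return {"City"}; }
    bool GetPropertyValue(const std::string& name, Property& value) const override {
        if (name != "City") return false;
        value.text = city;
        return true;
    }
    std::string GetClassName() const override { return "Address"; }
};

class CStudent : public ISerializable {
public:
    std::string firstName;
    CAddress address; // a user-defined type, serializable because it follows Rule 2
    std::vector<std::string> GetProperties() const override {
        return {"FirstName", address.GetClassName()};
    }
    bool GetPropertyValue(const std::string& name, Property& value) const override {
        if (name == "FirstName") { value.text = firstName; return true; }
        if (name == address.GetClassName()) { value.object = &address; return true; }
        return false;
    }
    std::string GetClassName() const override { return "Student"; }
};
```

With firstName set to "Meena" and address.city set to "Paris", ToXml produces <Student><FirstName>Meena</FirstName><Address><City>Paris</City></Address></Student>. As in the article's CContact, the nested object's class name doubles as its property name.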
Rule 3
Most objects have properties which may be user-defined types, but ultimately everything comes down to basic data types like strings, longs, floats, etc. However, for XML, it is simplest to allow only strings. For example, the CStudent class may have an int m_nAge property; however, for the purpose of XML serialization, it exposes this property to the framework as a string. During de-serialization, the class has the chance to convert the CString age back to an int age. This bit of conversion is specific to the CStudent class, and the framework knows nothing of it. Any class can directly serialize a CString property. The same is true for CStringList.
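A CStudent following Rule 3 might keep its age as an int and convert only at the boundary with the framework. The fragment below is a hypothetical standard-C++ version (std::string instead of CString, std::to_string and std::stoi for the conversions); rejecting input that is not entirely numeric is my own choice, not something the framework requires.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical CStudent fragment: m_nAge is an int internally but is exposed
// to the framework as a string. The framework never sees the conversion.
class CStudent {
public:
    bool GetPropertyValue(const std::string& name, std::string& value) const {
        if (name != "Age") return false;
        value = std::to_string(m_nAge); // int -> string for serialization
        return true;
    }

    bool SetPropertyValue(const std::string& name, const std::string& value) {
        if (name != "Age") return false;
        try {
            std::size_t used = 0;
            int age = std::stoi(value, &used); // string -> int for de-serialization
            if (used != value.size()) return false; // reject trailing junk such as "21x"
            m_nAge = age;
            return true;
        } catch (const std::exception&) { // std::invalid_argument or std::out_of_range
            return false;
        }
    }

    int GetAge() const { return m_nAge; }

private:
    int m_nAge = 0;
};
```

On failure the previous age is kept and false is returned, which leaves the decision of how to report bad data to the caller.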
Rule 4
Often classes contain not one object but a list of objects as a single property. For example, a CAuthor may have a property m_books which represents not one book but a list of books. I prefer to use a CPtrList in cases where multiple objects are held in a single property. In such cases, the framework allows a CPtrList containing ISerializable-derived objects to be serialized as a single property.
Rule 5
This rule deals with de-serialization more than serialization. De-serialization consists of creating real objects. Often, creating objects is not as simple as doing a "new". To isolate the serialization framework from knowing too much about how to create an object, I decided to use a factory approach. Any class that wants to (de)serialize itself should provide a factory class which implements the IObjectFactory interface. This interface has only two methods, Create and Destroy. It is possible to implement this interface in the class requiring serialization; for example, I have implemented IObjectFactory within CContact. If you prefer, you can separate an entity class such as CStudent from IObjectFactory and have an additional class, maybe CStudentFactory.
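The factory is what keeps the framework generic: the framework-side code only ever calls Create and SetPropertyValue, so it never needs to name the concrete class. This standard-C++ sketch uses simplified, hypothetical signatures (string-only property values) and takes the separate-factory route with a CStudentFactory; BuildObject is an illustrative name, not part of the article's code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class ISerializable {
public:
    virtual ~ISerializable() = default;
    virtual bool SetPropertyValue(const std::string& name, const std::string& value) = 0;
};

class IObjectFactory {
public:
    virtual ~IObjectFactory() = default;
    virtual ISerializable* Create() = 0;
    virtual void Destroy(ISerializable* obj) = 0;
};

class CStudent : public ISerializable {
public:
    bool SetPropertyValue(const std::string& name, const std::string& value) override {
        if (name != "FirstName") return false;
        m_sFirstName = value;
        return true;
    }
    const std::string& GetFirstName() const { return m_sFirstName; }
private:
    std::string m_sFirstName;
};

// Separate factory class, the "CStudentFactory" option from Rule 5.
class CStudentFactory : public IObjectFactory {
public:
    ISerializable* Create() override { return new CStudent(); }
    void Destroy(ISerializable* obj) override { delete obj; } // virtual dtor makes this safe
};

// Framework side: builds an object from parsed name/value pairs without ever
// naming CStudent. Unknown property names are ignored.
ISerializable* BuildObject(IObjectFactory& factory,
                           const std::vector<std::pair<std::string, std::string>>& values) {
    ISerializable* obj = factory.Create();
    for (const auto& nv : values)
        obj->SetPropertyValue(nv.first, nv.second);
    return obj;
}
```

Swapping in a different entity only requires a different factory; BuildObject stays untouched.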
Well, the rules may seem a bit daunting, but trust me, implementing the ISerializable interface is really easy, and while doing so I have often designed my classes better than I would have otherwise.
A Real Example
I'll explain the CContact class, which is included in the demo. First, let's take a look at the ISerializable interface.
#include <afx.h>
#include <afxcoll.h>
//--------------------------------------------------
enum PropertyType
{
Blank,
Simple,
SimpleList,
Complex,
ComplexList
};
//--------------------------------------------------
class CProperty;
class IObjectFactory;
//--------------------------------------------------
class ISerializable
{
public:
virtual ~ISerializable(){};
virtual int GetProperties(CStringList& properties) = 0;
virtual bool GetPropertyValue(const CString& sProperty,
CProperty& sValue) = 0;
virtual bool SetPropertyValue(const CString& sProperty,
CProperty& sValue) = 0;
virtual bool HasMultipleInstances() = 0;
virtual CString GetClassName() = 0;
virtual CString GetID() = 0;
};
//--------------------------------------------------
Let's take a look at the Contact.h file:
class CContact : public ISerializable, public IObjectFactory
{
private:
CString m_sFirstName;
CString m_sId;
CAddress m_address;
CStringList m_emails;
CPtrList m_addresses;
// ... many more members below, removed for brevity
We see that the CContact class wants to be serializable and also implements the IObjectFactory interface. Now let's see how CContact implements these functions:
// ISerializable interface
int CContact::GetProperties(CStringList& properties)
{
properties.AddHead(_T("FirstName"));
properties.AddHead(m_address.GetClassName());
properties.AddHead(_T("EmailId"));
properties.AddHead(_T("XAddress"));
return properties.GetCount();
}
//-----------------------------------------------------------
// Used during serialization
bool CContact::GetPropertyValue(const CString&
sProperty, CProperty& property)
{
if(sProperty == _T("FirstName"))
{
property = m_sFirstName;
return true;
}
else if(sProperty == m_address.GetClassName())
{
property = (ISerializable*)&m_address;
property.SetFactory(&m_address); // IMP
return true;
}
else if(sProperty == _T("EmailId"))
{
property = m_emails;
return true;
}
else if(sProperty == _T("XAddress"))
{
property = m_addresses;
property.SetFactory(&m_address); // IMP
return true;
}
return false; // this property does not exist
}
//-----------------------------------------------------------
// Used during de-serialization
bool CContact::SetPropertyValue(const CString& sProperty,
CProperty& property)
{
if(sProperty == _T("FirstName"))
{
m_sFirstName = property;
return true;
}
else if(sProperty == _T("ID"))
{
m_sId = property;
return true;
}
else if(sProperty == m_address.GetClassName())
{
CAddress* address = (CAddress*)(property.GetObject());
m_address.SetCity(address->GetCity());
// delete the passed in object if we don't need it
property.GetFactory()->Destroy(address);
return true;
}
else if(sProperty == _T("EmailId"))
{
CProperty::CopyStringList(m_emails, property.GetStringList());
return true;
}
else if(sProperty == _T("XAddress"))
{
// first free any existing objects
POSITION pos = m_addresses.GetHeadPosition();
while(pos)
{
CAddress* pAddress = (CAddress*)m_addresses.GetNext(pos);
delete pAddress;
}
m_addresses.RemoveAll(); // the list now holds only dangling pointers
CProperty::CopyPtrList(m_addresses, property.GetObjectList());
return true;
}
return false; // this property does not exist
}
//--------------------------------------------------------------
bool CContact::HasMultipleInstances()
{
return true; // we will have more than one contact instance
}
//--------------------------------------------------------------
CString CContact::GetClassName()
{
return _T("Contact");
}
//--------------------------------------------------------------
CString CContact::GetID()
{
return m_sId;
}
//--------------------------------------------------------------
// IObjectFactory Interface
ISerializable* CContact::Create()
{
return new CContact();
}
//--------------------------------------------------------------
void CContact::Destroy(ISerializable* obj)
{
delete obj;
}
//--------------------------------------------------------------
Another important class is the CProperty class, which acts as a wrapper over a class's property. This class is used only by the serialization framework, but you may find it useful in other situations as well.
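Judging from how CContact uses it (property = m_sFirstName, property = m_emails, property.GetObject(), property.GetObjectList()), CProperty behaves like a tagged wrapper whose tag is the PropertyType enum from Serializable.h. The following is a hypothetical standard-C++ approximation of that behaviour, not the article's actual class: std::list replaces CStringList and CPtrList, and factory handling (SetFactory/GetFactory) is left out.

```cpp
#include <cassert>
#include <list>
#include <string>

class ISerializable; // only pointers are stored, so a declaration is enough

enum PropertyType { Blank, Simple, SimpleList, Complex, ComplexList };

// Hypothetical CProperty-like wrapper. Each assignment records which kind of
// value it holds, so a serializer can branch on GetType(). It does not own
// the ISerializable objects it points to.
class Property {
public:
    Property& operator=(const std::string& s) { Reset(Simple); m_text = s; return *this; }
    Property& operator=(const std::list<std::string>& l) { Reset(SimpleList); m_strings = l; return *this; }
    Property& operator=(ISerializable* obj) { Reset(Complex); m_object = obj; return *this; }
    Property& operator=(const std::list<ISerializable*>& objs) { Reset(ComplexList); m_objects = objs; return *this; }

    PropertyType GetType() const { return m_type; }
    operator std::string() const { return m_text; } // enables: m_sFirstName = property;
    const std::list<std::string>& GetStringList() const { return m_strings; }
    ISerializable* GetObject() const { return m_object; }
    const std::list<ISerializable*>& GetObjectList() const { return m_objects; }

private:
    // Clears any previous value so only the newly assigned kind is populated.
    void Reset(PropertyType type) {
        m_type = type;
        m_text.clear();
        m_strings.clear();
        m_object = nullptr;
        m_objects.clear();
    }

    PropertyType m_type = Blank;
    std::string m_text;
    std::list<std::string> m_strings;
    ISerializable* m_object = nullptr;
    std::list<ISerializable*> m_objects;
};
```

The overloaded assignment operators are what let entity code stay as terse as the CContact listing, while the type tag gives the serializer enough information to pick the right XML shape.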
Important (ISerializable) Methods
GetProperties(CStringList& properties)
This method is invoked on the entity class (CContact) by the framework. The method should simply add the names of the properties which will be serialized to the list it is given, and return the number of entries in that list.
GetPropertyValue(const CString& sProperty, CProperty& property)
This method is invoked on the entity class by the framework to find out the value of a property. E.g.:
if(sProperty == _T("FirstName"))
{
property = m_sFirstName;
return true;
}
else
{
return false;
}
The first parameter tells us which property the framework is asking for. If the property is "FirstName", then we store the value of the first name in the property (second parameter). It is important that we return true if the property name matched. If the framework asks for a property which we do not support, we return false. This should never really happen, since the class (CContact) itself tells the framework which properties it supports in the GetProperties method.
SetPropertyValue(const CString& sProperty, CProperty& property)
This method is invoked on the entity class by the framework during de-serialization. After creating a new object, the framework has to apply the property values. To do so, it invokes this method:
bool CContact::SetPropertyValue(const CString& sProperty, CProperty& property)
{
if(sProperty == _T("FirstName"))
{
m_sFirstName = property;
return true;
}
else if(sProperty == m_address.GetClassName())
{
CAddress* address = (CAddress*)(property.GetObject());
m_address.SetCity(address->GetCity());
// delete the passed in object if we don't need it
property.GetFactory()->Destroy(address);
return true;
}
// ... other properties
return false; // this property does not exist
}
The framework passes in the property name as the first parameter and the actual object in the second parameter. Here, we see how the first name is being stored. For properties which are complex types (i.e., user-defined types), we are passed a pointer to the actual de-serialized object. You may want to hold on to this object or delete it. Here, we see how the "Address" property is being treated: we copy the data we need into our own CAddress member and then destroy the passed-in object through its factory. If we wanted, we could hold on to this object and use it as we see fit. Memory management of the passed-in object is not the responsibility of the serialization framework.
HasMultipleInstances()
This method should return true if multiple instances of the class will be persisted. If not, the method should return false. In our case, we want to serialize many instances of CContact, therefore we return true.
GetClassName()
This method should return a name for the class. Within a single application this is usually not a problem, because the class names will not clash. However, you may want to use names which are GUIDs instead of friendly names like "Contact", as we have done in the example.
GetID()
This method is used to associate an ID string value with an object. The framework does not really use it yet; it has been added for future use. If you return a non-empty string, then after serialization you will see something like <contact id="001"> in the XML file. If you return an empty string, you will see only <contact> in the XML file.
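The id attribute behaviour described above might be produced by a helper like the one below. It is a hypothetical standard-C++ sketch: the element name is passed in explicitly because the article does not say how the serializer derives <contact> from GetClassName(), and the attribute escaping is my own addition for IDs that contain quotes or ampersands.

```cpp
#include <cassert>
#include <string>

// Escapes the characters that may not appear verbatim inside a
// double-quoted XML attribute value.
std::string EscapeAttribute(const std::string& value) {
    std::string out;
    for (char c : value) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Builds an object's opening tag: <contact id="001"> when GetID() returned a
// non-empty string, plain <contact> otherwise.
std::string OpeningTag(const std::string& elementName, const std::string& id) {
    if (id.empty())
        return "<" + elementName + ">";
    return "<" + elementName + " id=\"" + EscapeAttribute(id) + "\">";
}
```

OpeningTag("contact", "001") gives <contact id="001">, and OpeningTag("contact", "") gives <contact>, matching the two cases described for GetID().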
Factory Methods
Create()
This method should create a new instance of the class and return the newly created object. See Rule 5. E.g.:
ISerializable* CContact::Create()
{
// write any extra stuff but ultimately return an object
return new CContact();
}
Destroy(ISerializable* obj)
This method is responsible for deleting an object. E.g.:
void CContact::Destroy(ISerializable* obj)
{
delete obj;
}
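Because the framework does not manage the objects it passes to SetPropertyValue, each Create should eventually be matched by a Destroy on the same factory. One standard-C++ way to enforce that pairing is a std::unique_ptr whose deleter calls IObjectFactory::Destroy. This is a sketch, not the article's code: FactoryDeleter, FactoryPtr, CopyCity and the destroyed counter are hypothetical names, and the counter exists only to make the ownership visible.

```cpp
#include <cassert>
#include <memory>
#include <string>

class ISerializable {
public:
    virtual ~ISerializable() = default;
};

class IObjectFactory {
public:
    virtual ~IObjectFactory() = default;
    virtual ISerializable* Create() = 0;
    virtual void Destroy(ISerializable* obj) = 0;
};

// Hands an object back to the factory that created it.
struct FactoryDeleter {
    IObjectFactory* factory;
    void operator()(ISerializable* obj) const {
        assert(factory != nullptr);
        factory->Destroy(obj);
    }
};
using FactoryPtr = std::unique_ptr<ISerializable, FactoryDeleter>;

class CAddress : public ISerializable {
public:
    std::string city;
};

// Illustrative factory; 'destroyed' only counts Destroy calls for demonstration.
class CAddressFactory : public IObjectFactory {
public:
    int destroyed = 0;
    ISerializable* Create() override { return new CAddress(); }
    void Destroy(ISerializable* obj) override { ++destroyed; delete obj; }
};

// Mirrors the "copy, then Destroy" branch of CContact::SetPropertyValue: the
// received object goes back to its factory when 'owned' leaves scope, even if
// the copy throws. The caller guarantees 'received' is a CAddress.
void CopyCity(ISerializable* received, IObjectFactory& factory, std::string& cityOut) {
    FactoryPtr owned(received, FactoryDeleter{&factory});
    cityOut = static_cast<CAddress*>(owned.get())->city;
}
```

To "hold on to" the object instead, keep the FactoryPtr (for example as a class member); Destroy then runs when that member is reset or destroyed.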
Serializing an Object
In the file XMLSerialization.cpp, we have the Serialize method, which simply creates a CContact object and serializes it to the c:\temp\contacts.xml file.
Note how the CXMLSerializer class is created. The first argument is the file name, the second is the name of the application, and the third specifies whether the XML file needs to be read (false for serialization, true for de-serialization).
void Serialize()
{
CXMLSerializer ser(_T("c:\\temp\\contacts.xml"),
_T("TestApp"), false); // IMP: true for deserialization
CContact contact;
contact.SetFirstName(_T("Meena"));
contact.GetAddress()->SetCity(_T("Paris"));
contact.GetEmailIds()->AddHead(_T("meena@hotmail.com"));
contact.GetEmailIds()->AddHead(_T("meena@yahoo.com"));
ser.Serialize(&contact);
}
De-Serializing an Object
In the file XMLSerialization.cpp, we have the Deserialize method. It creates the 'ser' object, this time passing true as the third argument. Notice that an instance of CContact is also created here: CContact implements the IObjectFactory interface, and since the framework needs the factory, we create an instance of it. The Deserialize method of CXMLSerializer expects a CPtrList object as the second parameter. All objects created as a result of de-serialization are stored in this CPtrList, and it is up to the caller to delete them when they are no longer needed.
void Deserialize()
{
CXMLSerializer ser(_T("c:\\temp\\contacts.xml"), _T("TestApp"), true);
// IMP: 'true' for deserialization
CContact contact; // acts as the factory for new CContact objects
CPtrList objects;
int nObjects;
nObjects = ser.Deserialize(&contact, objects);
for(int n = 0; n < nObjects; n++)
{
CContact* obj = (CContact*)objects.GetAt(objects.FindIndex(n));
// pass the data as an argument, never as the format string
_tprintf(_T("%s\n"), (LPCTSTR)obj->GetFirstName());
_tprintf(_T("%s\n"), (LPCTSTR)obj->GetAddress()->GetCity());
POSITION pos = obj->GetEmailIds()->GetHeadPosition();
CString sEmail;
while(pos)
{
sEmail = obj->GetEmailIds()->GetNext(pos);
_tprintf(_T("%s\n"), (LPCTSTR)sEmail);
}
pos = obj->GetAddressList()->GetHeadPosition();
while(pos)
{
CAddress* address =
(CAddress*)obj->GetAddressList()->GetNext(pos);
_tprintf(_T("%s\n"), (LPCTSTR)address->GetCity());
}
delete obj; // the caller owns the de-serialized objects
}
objects.RemoveAll(); // the list now holds only dangling pointers
}
In this example, CContact has a CString first-name property, a CAddress property, a CStringList email-ID property and a CPtrList of CAddress objects. This demonstrates how complex objects (classes implementing ISerializable), lists of complex objects and lists of strings (CStringList) are all handled as properties.
To run the demo application, open a command prompt and go to the bin folder.
First, we need to serialize a CContact object. At the c:\demo\bin> prompt, type XMLSerialization.exe -S c:\contacts.xml. This will create a file called contacts.xml in c:\.
To de-serialize, type XMLSerialization -D c:\contacts.xml.
History
This framework makes serializing C++ objects as XML a snap. All you need to do is follow a couple of rules and you are on your way. Based on feedback, I would like to support a few more collection classes like CMaps and CArrays.
Please leave your comments below. I'd love to hear from you.