
A serialization primer - Part 2

By Ravi Bhavnani, 17 Feb 2002
 

This article is the second of a three-part tutorial on serialization.

  • Part 1 introduces the basics of serialization.
  • Part 2 explains how to gracefully handle reading invalid data stores and support versioning.
  • Part 3 describes how to serialize complex objects.

In Part 1 we saw how to serialize a simple object via a CArchive using a serialize() method like this:

  int CFoo::serialize
    (CArchive* pArchive)
  {
    int nStatus = SUCCESS;

    // Serialize the object ...
    ASSERT (pArchive != NULL);
    TRY
    {
      if (pArchive->IsStoring()) {
         // Write employee name and id
         (*pArchive) << m_strName;
         (*pArchive) << m_nId;
      }
      else {
         // Read employee name and id
         (*pArchive) >> m_strName;
         (*pArchive) >> m_nId;
      }
    }
    CATCH_ALL (pException)
    {
      nStatus = ERROR;
    }
    END_CATCH_ALL

    return (nStatus);
  }
There's a problem with this code. What if we mistakenly read a datafile that doesn't contain the expected information? If the datafile doesn't contain a CString followed by an int, our serialize() method would return ERROR. That's nice, but it would be better if we could recognize the situation and return a more specific status code like INVALID_DATAFILE. We can check that we're reading a valid datafile (i.e. one that contains a CFoo object) by using an object signature.

Object signatures

An object signature is just a character string (eg: "FooObject") that identifies an object. We add a signature to CFoo by modifying the class definition:
  class CFoo
  {
    ...

    // Methods
    public:
      ...
      CString getSignature();

    // Data members
      ...
    protected:
      static const CString  Signature;  // object signature
  };
The signature is declared in the class and defined in Foo.cpp.
  // Static constants
  const CString CFoo::Signature = "FooObject";
Next, we modify the serialize() method to serialize the signature before serializing the object's data members. If an invalid signature is encountered, or if the signature is missing, it's likely that we're attempting to read a data store that doesn't contain a CFoo object. Here's the logic for reading a signed object:

[Figure: Using a signature to validate a data store]

And here's the code:

  int CFoo::serialize
    (CArchive* pArchive)
  {
    int nStatus = SUCCESS;
    bool bSignatureRead = false;

    // Serialize the object ...
    ASSERT (pArchive != NULL);
    TRY
    {
      if (pArchive->IsStoring()) {
         // Write signature
         (*pArchive) << getSignature();

         // Write employee name and id
         (*pArchive) << m_strName;
         (*pArchive) << m_nId;
      }
      else {
         // Read signature - complain if invalid
         CString strSignature;
         (*pArchive) >> strSignature;
         bSignatureRead = true;
         if (strSignature.Compare (getSignature()) != 0) {
            return (INVALID_DATAFILE);
         }

         // Read employee name and id
         (*pArchive) >> m_strName;
         (*pArchive) >> m_nId;
      }
    }
    CATCH_ALL (pException)
    {
      nStatus = bSignatureRead ? ERROR : INVALID_DATAFILE;
    }
    END_CATCH_ALL

    return (nStatus);
  }
You should ensure that all your objects have unique signatures. It's less important what the actual signature is. If you're developing a suite of products, it's helpful to have a process for registering object signatures companywide. That way, developers won't mistakenly use the same signature for different objects. If you want to make it harder to reverse engineer your datafiles, you should use signatures that have no obvious connection to object names.

Versioning

As you upgrade your product during its lifecycle, you may find it necessary to modify the structure of CFoo by adding or removing data members. If you simply released a new version of CFoo, attempts to read old versions of the object from a data store would fail. This is obviously not acceptable. Any version of CFoo should be able to restore itself from an older serialized version. In other words, CFoo's serialization method should always be backward compatible. This is easily accomplished by versioning the object. Just as we added an object signature, we add an integer constant that specifies the object's version number.
  class CFoo
  {
    ...

    // Methods
    public:
      ...
      CString getSignature();
      int     getVersion();

    // Data members
      ...
    protected:
      static const CString  Signature;  // object signature
      static const int      Version;    // object version
  };
The object's version is declared in the class and defined in Foo.cpp.
  // Static constants
  const CString CFoo::Signature = "FooObject";
  const int     CFoo::Version = 1;
Next, we modify the serialize() method to serialize the version after serializing the signature, and before serializing the object's data members. If a newer version is encountered, we're attempting to read an unsupported version of the object. In this case, we simply return the status UNSUPPORTED_VERSION.
  int CFoo::serialize
    (CArchive* pArchive)
  {
    int nStatus = SUCCESS;
    bool bSignatureRead = false;
    bool bVersionRead = false;

    // Serialize the object ...
    ASSERT (pArchive != NULL);
    TRY
    {
      if (pArchive->IsStoring()) {
         // Write signature and version
         (*pArchive) << getSignature();
         (*pArchive) << getVersion();

         // Write employee name and id
         (*pArchive) << m_strName;
         (*pArchive) << m_nId;
      }
      else {
         // Read signature - complain if invalid
         CString strSignature;
         (*pArchive) >> strSignature;
         bSignatureRead = true;
         if (strSignature.Compare (getSignature()) != 0) {
            return (INVALID_DATAFILE);
         }

         // Read version - complain if unsupported
         int nVersion;
         (*pArchive) >> nVersion;
         bVersionRead = true;
         if (nVersion > getVersion()) {
            return (UNSUPPORTED_VERSION);
         }

         // Read employee name and id
         (*pArchive) >> m_strName;
         (*pArchive) >> m_nId;
      }
    }
    CATCH_ALL (pException)
    {
      nStatus = bSignatureRead && bVersionRead ? ERROR : INVALID_DATAFILE;
    }
    END_CATCH_ALL

    return (nStatus);
  }
Version 1 of our CFoo contained 2 data members - a CString (m_strName) and an int (m_nId). If we add a third member (eg: int m_nDept) in version 2, we need to decide what m_nDept should be initialized to when reading an older version of the object. In this example, we'll initialize m_nDept to -1 implying that the employee's department code is "Unknown".
  class CFoo
  {
    ...
    // Data members
    public:
      CString  m_strName;  // employee name
      int      m_nId;      // employee id
      int      m_nDept;    // department code (-1 = unknown)
  };
We also need to increase the object's version number in Foo.cpp to 2.
  const int CFoo::Version = 2;
Finally, we modify the part of serialize() that reads the object so that m_nDept is initialized to -1 if we're reading an older version of the datafile. Note that the file is always saved as the latest version.
  int CFoo::serialize
    (CArchive* pArchive)
  {
    ...
    // Serialize the object ...
    ASSERT (pArchive != NULL);
    TRY
    {
      if (pArchive->IsStoring()) {
         ...
         // Write employee name, id and department code
         (*pArchive) << m_strName;
         (*pArchive) << m_nId;
         (*pArchive) << m_nDept;
      }
      else {
         ...
         // Read employee name and id
         (*pArchive) >> m_strName;
         (*pArchive) >> m_nId;

         // Read department code (new in version 2)
         if (nVersion >= 2) {
            (*pArchive) >> m_nDept;
         }
         else {
            m_nDept = -1; // unknown
         }
      }
    }
    CATCH_ALL (pException)
    {
      nStatus = bSignatureRead && bVersionRead ? ERROR : INVALID_DATAFILE;
    }
    END_CATCH_ALL

    return (nStatus);
  }

Conclusion

So far, we've dealt with providing robust support for serializing simple objects, i.e. those that contain readily serializable data types. In Part 3, we'll see how to serialize any kind of object.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ravi Bhavnani
Technical Lead
Canada
Member
Ravi Bhavnani is an ardent fan of Microsoft technologies who loves building Windows apps, especially PIMs, system utilities, and things that go bump on the Internet. During his career, Ravi has developed expert systems, desktop imaging apps, marketing automation software, EDA tools, a platform to help people find, analyze and understand information, trading software for institutional investors and advanced data visualization solutions. He currently works for a company that provides enterprise workforce management solutions to large clients.
 
His interests include the .NET framework, reasoning systems, financial analysis and algorithmic trading, NLP, CHI and UI design. Ravi holds a BS in Physics and Math and an MS in Computer Science and was a Microsoft MVP (C++ and C# in 2006 and 2007). He is also the co-inventor of 2 patents on software security and generating data visualization dashboards. His claim to fame is that he crafted CodeProject's "joke" forum post icon.
 
Ravi's biggest fear is that one day he might actually get a life, although the chances of that happening seem extremely remote.


Comments and Discussions

 
Question | where and when operator>> is called?? | member xxq31217 | 16 Apr '07 - 4:15
Many articles tell me that when I want to make a class serializable in VC 6.0 MFC projects, I need to:
1. Derive the class from CObject (or from some class derived from CObject).
2. Override the Serialize member function.
3. Use the DECLARE_SERIAL macro in the class declaration.
4. Define a constructor that takes no arguments.
5. Use the IMPLEMENT_SERIAL macro in the implementation file.
I know the DECLARE_SERIAL macro declares an operator, which IMPLEMENT_SERIAL defines in the source file:
#define IMPLEMENT_SERIAL(class_name,base_class_name,wSchema)\
.............................
CArchive& AFXAPI operator>>(CArchive& ar,class_name* &pOb)\
{ pOb=(class_name*)ar.ReadObject(RUNTIME_CLASS(class_name));
return ar;
}
I really do not know when and where operator>>(CArchive& ar,class_name* &pOb) is called.
I find that when reading an object from a persistent storage medium, _AFX_INLINE CArchive& AFXAPI operator>>(CArchive& ar,CObject*& pOb){pOb=ar.ReadObject(NULL);return ar;} is called, not operator>>(CArchive& ar,class_name* &pOb).
Can you give me some examples?
My English is poor, but I really want to get your help!!
Thanks!
Answer | Re: where and when operator>> is called?? | member Ravi Bhavnani | 16 Apr '07 - 7:45
See this MSDN link for a step-by-step overview of MFC serialization.
 
My tutorials describe how to perform serialization without having to derive from CObject or having to use the DECLARE/IMPLEMENT_SERIAL macros. While the concepts are similar, I prefer to use my methodology because it's easier to use and has no hidden or automagical behavior.
 
/ravi
 
PS: Your English is fine! :)
 
This is your brain on Celcius
Home | Music | Articles | Freeware | Trips
ravib(at)ravib(dot)com

General | Reading a file using serialization | suss hemanth_phk | 9 Sep '04 - 11:56
I am trying to read in a file which was written using serialization (by someone else). The size of the variables present in the file I need to read is far greater than the stack size, so this is giving me problems like stack overflow. Is there any way to solve such problems? Please give me your suggestions. Thanks a lot
General | Re: Reading a file using serialization | member Ravi Bhavnani | 9 Sep '04 - 12:27
See my reply in Part 3. No need to cross-post. :)
 
/ravi
 
My new year's resolution: 2048 x 1536
Home | Articles | Freeware | Music
ravib@ravib.com

General | I can't access the objects | member Torrejon | 12 May '04 - 3:18
Congratulations on this fantastic tutorial.
I have tried writing objects to the file and managed it without any problem. The big problem is that afterwards I cannot read more than the first object; reading does not advance any further. I have tried everything, including opening it as a binary file (so that I would know how far to iterate), but I cannot access the next object. I would be very grateful if you could help me. It is surely something silly, but right now it is very important to me.
Many thanks :)
General | Re: I can't access the objects | member Ravi Bhavnani | 12 May '04 - 3:25
Thanks. :O
 
Have you read part 3 (the section that deals with saving/loading a collection of objects)? That may offer a clue. If you're still having problems, please post a small code fragment that deals with serializing your collection of objects.
 
/ravi
 
My new year's resolution: 2048 x 1536
Home | Articles | Freeware | Music
ravib@ravib.com

General | Re: I can't access the objects | member Torrejon | 12 May '04 - 23:04
Thanks for answering so soon. The problem is not the class that I store (that part is straightforward) but the way I iterate afterwards to recover these elements. The file pointer does not advance, and I always recover the same object from those I have already stored.
Many thanks
General | Re: I can't access the objects | member Torrejon | 13 May '04 - 0:20
I have found the solution. The problem was that the updated pArchive pointer was not being returned, so it always accessed the same object.
Many thanks
General | Re: I can't access the objects | member Ravi Bhavnani | 13 May '04 - 1:01
Glad you got your code to work!
 
/ravi
 
My new year's resolution: 2048 x 1536
Home | Articles | Freeware | Music
ravib@ravib.com

General | why manually?? it's build in! | member Anonymous | 26 Feb '02 - 4:03
Why are you versioning manually? It's built right into VC++ (look up VERSIONABLE_SCHEMA). One warning though: it doesn't work on your document class itself, although MS forgets to mention that...
General | Re: why manually?? it's build in! | member Ravi Bhavnani | 26 Feb '02 - 4:22
The serialization scheme described here is independent of MS macros. It happens to use MFC classes like CArchive and CFile but can be (and has been) easily ported to other platforms.
 
The intent was to explain serialization as an independently implementable feature, and show that it can be used for any object.
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com

General | Re: why manually?? it's build in! | member dazinith | 2 Apr '02 - 4:45
is using nVersion and what you have discussed a viable option in CDocument derived classes? i see that CDocument doesn't have the call:
IMPLEMENT_SERIAL (CMyObject, CObject, VERSIONABLE_SCHEMA|1)
 
so im looking for another way to implement versions in my document and this looks like what i need, is this correct?
 
-dz
General | Re: why manually?? it's build in! | member Ravi Bhavnani | 2 Apr '02 - 5:35
Yes. The serialization scheme described here can be used to serialize any kind of object. Remember, serialization is neither magical nor Microsoft-ish. It's just a way to safely save and restore data to and from persistent storage.
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com

General | Re: why manually?? it's build in! | member luedi | 22 Nov '02 - 6:05
Hi,
 
Your comment made me look deeper into the serialization process provided by MFC. At first I thought, great, VERSIONABLE_SCHEMA is exactly what I needed. But after a closer look, I came to the conclusion that this is a nice feature, but that it does not help in more complex situations beyond simple object serialization.
 
I found a wonderful explanation of the problems I faced when using this feature at
http://archive.devx.com/free/mgznarch/vcdj/1997/nov97/serial5.asp
 
The point is that the versioning provided by MFC doesn't work in object hierarchies with independent versioning and, second, you cannot use this feature when you call CMyObject::Serialize directly.
 
Dirk

General | Re: why manually?? it's build in! | member Zac Howland | 3 Feb '03 - 6:54
luedi wrote:
I found a wonderful explanation of the problems I faced when using this feature at
http://archive.devx.com/free/mgznarch/vcdj/1997/nov97/serial5.asp
 
The point is that the versioning provided by MFC doesn't work in object hierarchies with independent versioning and, second, you cannot use this feature when you call CMyObject::Serialize directly.

 
I could be wrong, but I believe the issue mentioned in that article was fixed in VC++ 6. As for not being able to call Serialize directly, that is documented and has to do with the process of reading/writing an object. If you serialize an object via
 
myObject.Serialize(ar)
 
it only performs the operations in that function. However, if you serialize via
 
ar << pmyObject;
 
the WriteObject function is called that writes the schema for the object in the file before calling Serialize.
 
Zac
 
"If I create everything new, why would I want to delete anything?"
General | Re: why manually?? it's build in! | member Rudolf Jan Heijink | 26 Apr '04 - 9:33
The versionable schema doesn't work properly in VC6.0. Forget about using it. Using your own Schema variable is a much better solution, but you should use it for the first version of the class as well.
 
For the rest, I stick to the standard VC Serialization function. It works reasonably well and I don't see any use in rewriting the MFC functions and macros yourself.
General | Dangers of serialization | member Navin | 25 Feb '02 - 9:11
This is a good article, and explains well how serialization works. Perhaps what I'm posting is outside the scope of the article (more of a design decision rather than a technical how-to), but it is something I have run into when doing serialization of complex objects. While this does allow a nice, neat object-oriented way to save/restore objects, it can be inflexible. As you have shown, you can add/remove members from a class and use a form of versioning to allow backwards compatibility. But there are some things that are difficult or impossible:
 
  • Forward compatibility. Many people may not care, but serializing in this way essentially changes your file format every time the objects change.
  • You become locked into an object structure. Changing or adding members here and there works, but what if something major happens - e.g., a base class changes? Or you find you have two classes that didn't derive from a base class before, but now you want to break out a base class?
 
Perhaps a part 4 to this series, should you use serialization or not? Just my two cents...

 
The early bird may get the worm, but the second mouse gets the cheese.
General | Re: Dangers of serialization | member Ravi Bhavnani | 25 Feb '02 - 10:04
Navin wrote:
Many people may not care, but serializing in this way essentially changes your file format every time the objects change.
 
Yes. The serialization scheme presented in this series asserts that a properly serialized object will always be stored (according to the most current schema). This is expected behavior.
 
Navin wrote:
what if something major happens - e.g., a base class changes?
 
This is gracefully handled, as shown in Part 3 of the tutorial. Part 3 is crying out for source code! I'll update the article in a few weeks.
 
Imho, I don't see a danger to structured serialization. If this is not what you want to do, then an alternative approach (eg: saving fragmented information in an .INI file) may be employed. But if you want to save/restore data without having to worry about backward compatibility of objects, then serialization (as presented here) should work fine.
 
Aside: This scheme has been used successfully in products (whose class structures have evolved over time) that are distributed in very large quantities (millions). The scheme is very robust (i.e. almost 100% of run-time read/write errors have been caught and signalled by the app).
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com

General | Re: Dangers of serialization | member Navin | 26 Feb '02 - 6:42
Ravi Bhavnani wrote:
what if something major happens - e.g., a base class changes?
 
This is gracefully handled, as shown in Part 3 of the tutorial. Part 3 is crying out for source code! I'll update the article in a few weeks.

 
The biggest problem is, IMHO, backwards compatibility issues.
 
I may not have gotten my point across with this one. I am talking about this case:
1. You design a class... say, CFoo. It has a serialize function. All is good.
2. Later on, you design a class, CBar and do its serialization.
3. Somebody discovers that CFoo and CBar have a lot of common functionality, and that a base class, CBaz, should be created for both.
 
It seems this cannot be done in a graceful manner and still support backwards compatibility. Once you start serializing the CBaz as part of CFoo and CBar, the whole schema is broken. You have to do something odd like serialize CBaz's members from the CFoo and CBar classes, which breaks encapsulation. More complicated scenarios would be more difficult to solve.
 
I guess my point is that serialization works and is robust, but if you care about backwards and forwards file compatibility, it can be fairly inflexible since file formats are tied to your object structure.
 
You are right in that it will work great if compatibility with previous/future versions is not a problem. But alas, some of us don't have that luxury... :(

 
The early bird may get the worm, but the second mouse gets the cheese.
General | Re: Dangers of serialization | member Ravi Bhavnani | 26 Feb '02 - 7:45
You're right. A change of this magnitude will not be automagically supported. There are many workarounds, one of which is defining new objects CFoo2 and CBar2.
 
When you said a base class changed, I thought you meant the composition of CFoo's base class may have changed. These kinds of mods are supported, but I'm sure you knew that!
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com

General | Forward file compatibility | member Ravi Bhavnani | 26 Feb '02 - 7:47
Navin wrote:
forwards file compatibility
 
Yes, only backward compatibility is supported. I don't know of any apps that can read new versions of data stores, although this can be done by using dynamic schemas (eg: XML).
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com

General | Re: Forward file compatibility | member Tim Smith | 26 Feb '02 - 8:06
Our product does this just fine.
 
Of course, there is no magic bullet. Even though build 90 of OmniServer might be able to read a build 98 data file, obviously any data pertaining to new features would be ignored.
 
Here is how our system works in general.
 
Each serialization stream generated by an object is encapsulated with a type identifier, a version, and a stream length. When the data is de-serialized (is that even a word?), the encapsulated serialization buffer is passed to the object. Since it is a requirement that any new data be added at the end of a serialization buffer, an older version of the product might only read part of the buffer being de-serialized. Thus, after the object is finished with the serialization buffer, the routine performing the de-serialization of the entire stream just uses the length of the encapsulated serialization buffer to locate the start of the next buffer. Thus, older versions can read newer data files.
 
But of course, there are SIGNIFICANT limitations. When we did V2 of our product, we decided to rework the serialization buffers and didn't support V1 reading V2 data files.
 
Tim Smith
 
I know what you're thinking punk, you're thinking did he spell check this document? Well, to tell you the truth I kinda forgot myself in all this excitement. But being this here's CodeProject, the most powerful forums in the world and would blow your head clean off, you've got to ask yourself one question, Do I feel lucky? Well do ya punk?
General | Re: Forward file compatibility | member Ravi Bhavnani | 26 Feb '02 - 8:15
Yep, that's a dynamic schema like XML, which will only read what it can and safely ignore the rest.
 
Tim Smith wrote:
de-serialized (is that even a word?),
Sure is!
 
/ravi
 
"There is always one more bug..."
http://www.ravib.com
ravib@ravib.com
 

General | Re: Forward file compatibility | member Tim Smith | 26 Feb '02 - 8:21
Damn, we should have patented it way back in 1994. Then we would sue and collect money from everyone using XML!!!!
 
RIIIIIIIIIIIIIIGHT.....
 
Tim Smith
 
I know what you're thinking punk, you're thinking did he spell check this document? Well, to tell you the truth I kinda forgot myself in all this excitement. But being this here's CodeProject, the most powerful forums in the world and would blow your head clean off, you've got to ask yourself one question, Do I feel lucky? Well do ya punk?
General | Re: Forward file compatibility | member compiler | 2 Dec '02 - 7:31
How do you store object references in such a scheme? Obviously, you can't just in-line the referenced object like MFC serialization does the first time a reference is seen. If you did, you'd be intermingling data from multiple objects. So, you either store a reference ID and add the object to a queue, or use a multi-pass serialization technique. In either case your data format would not end up being very nice for streaming over a thin pipe or progressive/partial loading.
 
My point is that there are tradeoffs in all approaches (as someone mentioned, "no silver bullets") and the greatest strength of the one described in the article is simplicity. Simple is good. Personally, I like a multi-pass approach since it provides an opportunity for all kinds of interesting optimizations.

Last Updated 17 Feb 2002
Article Copyright 2002 by Ravi Bhavnani
Everything else Copyright © CodeProject, 1999-2013