Some applications have really long computation cycles. Examples of these include:
Typically, completing a computation cycle for some analyses can take weeks or even months. In such situations, some of us have to cope with two different problems. First, there are not enough computational resources (read: PCs) to dedicate to the long calculations, so one has to use what one has! Second, one needs to interrupt the work to perform a host of other necessary tasks (such as working on a large spreadsheet or downloading a movie!).
If you stop your long computation cycle midway to free up the PC, you lose all the intermediate results. Alternatively, you have to enhance your application/class to explicitly and carefully save the "state" of the class (i.e. the intermediate results) before shutting down, and then restore that "state" with equal care when you restart your application. That can amount to a lot of thankless, error-prone work!
.NET 2.0 offers a very handy workaround for such situations. The sample code in this article illustrates one way to solve this problem without having to write a lot of your own code to save and restore the "state" of a class -- with every member variable containing exactly the same value it had when you interrupted your work.
To quote Jeffrey Richter, "Serialization is the process of converting an object [or a connected graph of objects] into a contiguous stream of bytes." Simply put, if you have a class called LongCalc and it is marked as [Serializable], you can save the state of its instance, say mLongCalc, in a file using just a few lines of code.
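FileStream stream = new FileStream(msFile, FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(stream, mLongCalc);  // write the whole object graph to the file
stream.Close();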
There are a lot of articles written on serialization on The Code Project and elsewhere (courtesy Google); you can also find many passionately argued pros and cons on the topic. So it makes little sense for me to repeat any of them here.
What is particularly nice about serializing the instance of a class this way is that you don't have to explicitly, exhaustively, correctly, and carefully identify and save every member variable. The BinaryFormatter does that for you, automatically!
To quote Jeffrey Richter again, "Deserialization is the process of converting a contiguous stream of bytes back into its graph of connected objects." Again, in simpler terms, you can restore the state of mLongCalc, an instance of a class called LongCalc, from a file using just a few lines of code.
FileStream stream = new FileStream(msFile, FileMode.Open);
BinaryFormatter formatter = new BinaryFormatter();
mLongCalc = (LongCalc)formatter.Deserialize(stream);  // rebuild the instance from the file
stream.Close();
Again, you don't have to explicitly, exhaustively, correctly, and carefully identify and restore every member variable of the class. The BinaryFormatter does that for you, too!
The Sample Application
The sample application uses a simple form (shown in the picture above). It has two classes worth mentioning:
- LongCalc, which has only one method, DoOneCalculationCycle(). As this is an illustrative example, it merely increments a member variable called mCalcsCompleted a million times. (In my application, it called another class to perform some CPU-intensive calculations in each iteration, taking about 10 hours on average to complete a million iterations; but that is not materially relevant to this article.)
- CalcManager, which does the serialization using a method called SaveState() and the deserialization using a method called RestoreState(). These two methods wrap the above code fragments in try/catch blocks. After LongCalc completes a specific number of iterations, it stops, and CalcManager serializes the data and displays the status, i.e. the number of iterations completed.
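As a rough illustration of how the two classes described above could fit together, here is a minimal sketch. It is not the downloadable source: the CalcsCompleted property, the bool and null return conventions, the method parameters, and the Demo class are assumptions made for this example. It also assumes the .NET Framework that the article targets; BinaryFormatter is obsolete in modern .NET and no longer functional in .NET 9.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class LongCalc
{
    // Running total of iterations; the formatter saves and restores it.
    private long mCalcsCompleted;

    // Assumed accessor, added here so the status can be displayed.
    public long CalcsCompleted { get { return mCalcsCompleted; } }

    // Stand-in for the real work: one cycle is a million increments.
    public void DoOneCalculationCycle()
    {
        for (int i = 0; i < 1000000; i++)
        {
            mCalcsCompleted++;
        }
    }
}

public static class CalcManager
{
    // Returns true if the state was written, false on an I/O or serialization error.
    public static bool SaveState(LongCalc calc, string path)
    {
        try
        {
            using (FileStream stream = new FileStream(path, FileMode.Create))
            {
                new BinaryFormatter().Serialize(stream, calc);
            }
            return true;
        }
        catch (IOException) { return false; }
        catch (SerializationException) { return false; }
    }

    // Returns the restored instance, or null on an I/O or deserialization error.
    public static LongCalc RestoreState(string path)
    {
        try
        {
            using (FileStream stream = new FileStream(path, FileMode.Open))
            {
                return (LongCalc)new BinaryFormatter().Deserialize(stream);
            }
        }
        catch (IOException) { return null; }
        catch (SerializationException) { return null; }
    }
}

public static class Demo
{
    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "SerializedClass.bin");
        LongCalc calc = new LongCalc();
        calc.DoOneCalculationCycle();

        if (CalcManager.SaveState(calc, path))
        {
            LongCalc restored = CalcManager.RestoreState(path);
            if (restored != null)
            {
                Console.WriteLine("Iterations completed: " + restored.CalcsCompleted);
            }
        }
    }
}
```

The using blocks close the file even when the formatter throws, which is one reason to prefer them over a bare Close() call inside a try/catch.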
To keep things clean and simple, I've removed all inapplicable code and, at the risk of overstating the obvious, added in-line comments.
How To Test the Application
If You've Downloaded the Source Code
Open the application in Visual Studio 2005 and run it. As it will not find a serialized state file called SerializedClass.bin (which it expects to sit in the same folder as the executable), the application will enable the Start Long Calc button. After it completes a million iterations,
- it will stop and display the status in a
- it will disable the Start Long Calc button
- it will enable the Resume Long Calc button
At this point, you can either click the Resume button, each time "completing" another million iterations, or you can exit.
When you restart the application -- immediately, tomorrow, or even on another PC (provided you transfer the SerializedClass.bin file along with the EXE) -- it will display the status (number of iterations completed) and let you click the Resume button as often as you want, each time "completing" another million iterations until you've completed a billion iterations.
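The start-or-resume decision described above might be sketched as follows, using console output in place of the form's buttons. The LoadOrCreate and RunOneCycleAndSave helpers, the ResumeDemo class, and the stripped-down LongCalc are hypothetical names for this illustration only, and the same .NET Framework assumption applies.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class LongCalc
{
    public long CalcsCompleted;

    // Placeholder for the real work: counts one cycle as a million iterations.
    public void DoOneCalculationCycle() { CalcsCompleted += 1000000; }
}

public static class ResumeDemo
{
    // The article's upper limit of a billion iterations.
    public const long Target = 1000000000;

    // Hypothetical helper: restore saved state if the file exists, else start fresh.
    public static LongCalc LoadOrCreate(string path, out bool resumed)
    {
        resumed = File.Exists(path);
        if (!resumed)
        {
            return new LongCalc();
        }
        using (FileStream stream = new FileStream(path, FileMode.Open))
        {
            return (LongCalc)new BinaryFormatter().Deserialize(stream);
        }
    }

    // Hypothetical helper: run one cycle (if below the target) and save the state.
    public static void RunOneCycleAndSave(LongCalc calc, string path)
    {
        if (calc.CalcsCompleted < Target)
        {
            calc.DoOneCalculationCycle();
        }
        using (FileStream stream = new FileStream(path, FileMode.Create))
        {
            new BinaryFormatter().Serialize(stream, calc);
        }
    }

    public static void Main()
    {
        // The article keeps the state file in the same folder as the executable.
        string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "SerializedClass.bin");

        bool resumed;
        LongCalc calc = LoadOrCreate(path, out resumed);
        Console.WriteLine(resumed ? "Resuming from saved state" : "Starting from scratch");

        RunOneCycleAndSave(calc, path);
        Console.WriteLine("Iterations completed: " + calc.CalcsCompleted);
    }
}
```

Each run of this sketch adds another million iterations to whatever the state file already holds, and deleting the file resets the count.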
If You've Downloaded the Demo Executable (and the Serialized State File)
IMPORTANT: Both files must sit in the same folder. When you start the EXE, it will display the status (number of iterations completed) and let you click the Resume button as often as you want, each time "completing" another million iterations until you've completed a billion iterations. You can quit and restart the application as often as you wish, each time continuing from where you left off.
If you want to start from scratch, simply delete the SerializedClass.bin file.
How I Used the Application
During the day, my PC at home is idle, and during the night, my PC at work is idle. In my low-budget version of the protein folding project at Stanford, before going home from work, I'd download the serialized state file for my project from my home computer and resume computations on my work computer. It would work through the night, complete its cycle, and serialize the state file. On returning to work the next day, I'd upload the serialized state file to my home computer and have it resume the computations. It took about a month and a half to complete the analysis. The best part, however, was that I was able to use the PCs at home and at work normally and get the computations completed without spending an extra dime.
Disclaimer
I undertook this project to learn a few things about Visual Studio 2005, C#, serialization, distributed processing, etc. Caveat Emptor: Nothing has been properly or adequately tested. More importantly, there is a good chance that you or someone else can do this better. So, if you do use this code and cannot produce what you expect, there may be little I can do to help you. On the bright side, you have the source code and technical reference material (though not easy to follow) from Microsoft. Also, I don't know VB.NET, but the C# code is simple enough that you might be able to figure it out without much trouble.
How You Might Use the Application
Please read the above DISCLAIMER first. If you have a similar problem, you should be able to simply:
- Replace the references to LongCalc with your class
- Mark that class as [Serializable]
- Replace all references to mLongCalc with the instance of your class
Things To Watch Out For
- If your class is complicated by non-serializable objects, event handlers, etc., you may have a lot of work ahead of you. (I've only come across discussions on these, but didn't have to worry about them as I had a pure computational problem to solve.)
- You must not rename anything (i.e. names of the class, member variables, methods, etc.) in your class after you've serialized it. If you do, not only will you get unhelpful exception messages, but your serialized state file may well be, for all practical purposes, unusable.
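For the first caveat, one common approach (not something the sample application does) is to keep non-serializable members out of the saved file with [NonSerialized] and rebuild them after loading in a method marked [OnDeserialized]. The sketch below assumes that approach; ProgressLogger, LongCalcWithExtras, and ExtrasDemo are hypothetical names invented for the illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper that is deliberately NOT marked [Serializable].
public class ProgressLogger
{
    public void Log(string message) { Console.WriteLine(message); }
}

[Serializable]
public class LongCalcWithExtras
{
    public long CalcsCompleted;

    // Skipped by the formatter. Field initializers do not run during
    // deserialization, so this stays null until it is rebuilt below.
    [NonSerialized]
    private ProgressLogger mLogger = new ProgressLogger();

    // Keeps event subscribers out of the saved file.
    [field: NonSerialized]
    public event EventHandler CycleCompleted;

    // Called by the formatter once the serialized fields have been restored.
    [OnDeserialized]
    private void RebuildTransientState(StreamingContext context)
    {
        mLogger = new ProgressLogger();
    }

    public bool HasLogger { get { return mLogger != null; } }
    public bool HasSubscribers { get { return CycleCompleted != null; } }
}

public static class ExtrasDemo
{
    // Serializes to memory and reads the result straight back.
    public static LongCalcWithExtras RoundTrip(LongCalcWithExtras calc)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, calc);
            stream.Position = 0;
            return (LongCalcWithExtras)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        LongCalcWithExtras calc = new LongCalcWithExtras();
        calc.CalcsCompleted = 1000000;
        calc.CycleCompleted += delegate { };

        LongCalcWithExtras copy = RoundTrip(calc);
        Console.WriteLine("Count kept: " + (copy.CalcsCompleted == calc.CalcsCompleted));
        Console.WriteLine("Logger rebuilt: " + copy.HasLogger);
        Console.WriteLine("Subscribers kept: " + copy.HasSubscribers);
    }
}
```

Event subscribers are not restored, so the application has to attach its handlers again after loading. On the second caveat, the formatter records the type, assembly, and field names in the file, which is why renaming them after saving can make an existing state file unreadable.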
The ability to use serialization to suspend and deserialization to resume workflow-related applications may have myriad applications. For instance, you have an electronic tax form, and halfway through you have to stop (and relinquish your computer) and start the next day where you stopped. Or, you have an electronic joint credit application form and you need to send it to your partner to have him/her fill in certain fields and send the form back to you or someone else. Or, you have received an electronic purchase order that covers items to be supplied by your company and two fellow vendors. You update the relevant parts of the form and then forward it to one of the others. Or, you are collecting data in the field, and the person you're interviewing unexpectedly has to stop midway. You can suspend the incomplete form and resume later.
All of these are possible if each form has the ability to save and restore its "incomplete" state explicitly. If it doesn't, adding the capability the "old-fashioned way" is hard work. Indeed, I suspect that is why Microsoft added the __VIEWSTATE capability to ASP.NET -- it uses a form of serialization to make it quite easy for a developer to turn a "stateless" Web page into a "stateful" one. Adding support for on-demand serialization/deserialization (equivalents of File > Save and File > Open) may be somewhat easier than writing/reading all the member variables' values explicitly. Of course, this too can pose problems, but then ... therein lies the challenge!
If Any of These Have Helped You ...
Please consider sending in a donation to one of my favorite causes: Year Up or to any similar NGO (non-governmental organization) in the world that is selflessly doing good work and helping people in need. Give a little, get a lot!