Original Article has been moved
Each of these items could be a blog post unto themselves, but I am going to try really hard to not be too verbose and just cover the core of the concept and why you need to do it.
Data driven applications rely on their databases
Everyone knows that any app driven by data is much more than just the app. In most cases the app without a database doesn’t even function, or fails to function properly. If a database is an integral part of your application, then shouldn’t you be doing all you can to ensure it stays healthy and prepare for the worst case events of corruption or dead drives?
This week we have been contacted by 5 long time users who suddenly lost or corrupted their data. Two of these were end user data that is going to cost a huge amount of labor to reproduce. In all of these cases the following simple procedures would have prevented the situation entirely.
1 – Backup your databases
This seems pretty obvious, right? The database is just a file, xcopy it and make a backup. Enforce backups within your app, make it an integral part of the application. If a user accidentally deletes a huge part of their data, how can you recover it? With a backup it is pretty easy.
When your app starts up make a backup of the database in a subfolder. A simple name of \backups within your application directory. If you want to get fancy you could even create a subdirectory under that for the YYYY-MM, or when you backup the database rename it DatabaseName-YYYY-MM.vdb4. Then you can easily show the user backups in the directory and let them pick one to restore.
To prevent the backups from growing without bounds, you can periodically sum up the sizes of them all and delete the oldest ones until you are under some threshold for size. I can almost guarantee you that having 100MB of disk space for backups is no big deal to most users these days. If that gives them a month of backups they will thank you immensely should something bad happen in the future.
2 – Ensure clean shutdown detection in single user apps
Have you ever had Microsoft Outlook crash on you while running? What happens when it restarts? Outlook will revalidate the PST file to ensure it is not corrupt.
Every app that has a database connection open should have some way to detect if it was shutdown clean. It is the only way you will know that the application was terminated (either by the user, a crash, or the OS).
Remember that VistaDB lives WITHIN your application. So when your app dies, so does the engine. A server system doesn’t have to worry about this issue because the server service is still the only thing ever talking to the file.
I typically include at startup a check for a registry key. If the key still exists at startup then the app is either still running locally, or it was not shutdown clean during the last run. I put the clear function for the registry key as the very last thing that happens before the application exits. Not when I want to exit (you could still crash trying to shutdown cleanly), but right before the actual exit.
In a C# program you can do this in the Program Main function. Always put a try / catch block around the main running of your application, then add a finally that clears the shutdown.
If you want a great tool to help catch other application errors and report them I suggest Gibraltar Software. They have made it very easy to integrate a general purpose catch error system into almost any app.
3 – Ensure clean shutdown detection in multi user apps
Multi user apps are a little harder to track because you may not be the only one in the database at the time an app fails or is terminated (or the OS crashes, etc). You should still follow the above for detecting your actual application shutting down correctly, but you now have to add some more complex logic around the packing / cleanup case.
We have deployed a fairly simple strategy by adding a System_Settings table on some projects. This settings table would include only 1 row normally, but include information about the last time a backup was made (to prevent every app from trying to backup when they startup), and whether the database had a dirty shutdown. When the dirty flag is set (from shared mode) the application then tries to open the database exclusive to pack. Depending upon your application you could be checking for this periodically within your app from other desktops and if the flag is seen release all connections for some period of time and then start trying to access the database again.
There are lots of different ways you could do this depending upon your application design. You could also put a flat file next to the database dirty.txt that every connection class checks before opening the database, etc. The important thing is to make sure that dirty shutdowns are detected and can be recovered without user intervention.
4 – Handle exceptions in your code
See the Gibraltar Software page for software to help you do this everywhere in your code. But even if you don’t use something like their software, you still need to ensure everywhere something can go wrong you handle it.
What if you run out of disk space during an insert? What if updating a row throws a concurrency exception? Ensure you handle those cases.
The easiest way to do this is to encapsulate your database functions into a Data Access Layer (DAL). It does not have to be a separate assembly outside your application. Just ensure that you have a uniform way of accessing the database and can trap all errors. Log them to ensure that errors you didn’t expect can easily be reported back to you, even if the user has to go find a flat file called “Errors.txt” in the directory and email it to you, that is better than nothing.
5 – Put all objects in using() statements
So many of the Microsoft examples should show this best practice, but don’t. I personally don’t think it adds anything to the readability of the sample, and it is a best practice. So it should be shown at all times.
Basically anytime you are allocating an object that has a disposing on it you should be putting it into a using() block. This ensures that the GC is going to clean it up as fast as possible. Calling .Close() on the object and setting it to nothing or null is NOT the same thing. You have to call .Close(), .Dispose() and then make sure all references are gone.
using (VistaDB.Provider.VistaDBDataReader reader = command.ExecuteReader())
This one thing will do more to reduce your memory problems, fix phantom errors that occur, and so much more. I have fixed so many customer problems by simply adding using() around their object allocations. VistaDB is 100% managed code, if you don’t clean us up, or hold references to us, the objects can be pinned in RAM a very long time.
6 – Use transactions only when required
Transactions are expensive, very expensive for a file based database. Don’t use them to update a single entry in a single table, or create a transaction just to run a select query. Only use them when they make sense.
Also read the Transactions can hurt performance on our blog.
7 - Encrypt only when required, do not use for password control
This one comes into play quite often from former Access programmers. Access had this concept of a password for control of opening the file by the end user in their own copy of Access. There is no need for such a concept with VistaDB. In VistaDB 4 we changed the name from password to Encryption Phrase to make it clearer that the password is being used for encryption.
Encryption takes that base phrase, adds a salt to it, and then every page is encrypted at write time, and decrypted at load time. This is quite expensive (30% or more overhead), and if your database becomes corrupt we have no way to recover encrypted data (would sort of defeat the purpose of encryption).
You can turn on encryption for the database, and then turn it off for certain tables if you only need a single table encrypted. But this still leaves all the headers on disk encrypted, and cannot be recovered.
8 – Debug with a clean copy
Don’t debug against a production database (ever!). Especially running tests like NUnit you always want to start with a known state. The easiest way to do this is to put a pre-populated database in the application directory, and set the option in Visual Studio to Copy it to the output directory always. That way at each build you have a clean database in your debugging directory.
This is probably the #1 way people corrupt their databases. Stopping and killing the engine while doing lots of activity is eventually going to cause a corruption. Try to take SQL Server and kill it every time you stop debugging, I guarantee that you will end up with a bad database after enough iterations.
9 – Ensure regular maintenance of databases
Create a regular maintenance cycle within your app. The app could do this weekly, monthly, or based upon some internal defined “usage” case. If you know each run of your app can generate 10% turn over of data, then you should pack every 10 runs to keep the database at it’s smallest size and optimal format.
Add a SYSTEM_MAINTENANCE table to your database that only has one row. LASTPACKED DateTime. Just that one field will help you determine when the database was last cleaned up. You could allow the user to do it, or prompt them, or do it in the background if you know you are at a low point for user activity.
The import thing is to have a regular plan for maintaining the database. SQL Server has these maintenance plans built in to perform index rebuilds, etc at periodic intervals. We don’t have that ability because we are only running when you are running, and we never know how long that run will last.
10 – Prepare for regular updates to components
Have a way to deploy updates to your users. You will eventually run into a bug that all your users need to avoid, or updated components from a vendor (like us). If you have no way to get that update out, you are going to have some unhappy users.
VistaDB updates periodically, and I am sure your other vendors do as well. We don’t put out updates for no reason. There are almost always fixes and updates in every build. You don’t have to rush to deploy a new major or minor, but builds within your major.minor should be considered required maintenance updates.
Sometimes customers contact us with problems from VistaDB 3.0 builds that they have never updated. Sorry, we publish regular updates, put them on the site, list them in RSS feeds, blog posts, etc. There are plenty of ways to see and get the updates. If you are running VistaDB 3.5 and are still using the initial builds you are doing yourself a disservice by not getting the more recent builds.
Any vendor product you are using that has not been updated in 12+ months is either dead (no new development), obsolete, or very trivial in nature.
Pick a date (Microsoft uses the first Tuesday of the month) to go visit all your vendors. See if they have updates, read through the release notes. Plan to integrate and test them with your next build. If any of the updates from vendors are critical, then you have a critical update to perform as well.
Make sure you have a way to get updates to users. You don’t have to write all this yourself, Tarma Installer provides a very simple way to get web updates into your applications without writing complicated setup and download scripts.
Ok, that was really as brief as I could make it (honest!). Most of these could be several posts on the topic.
Every application should have some basic maintenance built in. Users today expect that the application will handle most of these chores for them. Adding a few features like automatic backup, and clean shutdown detection will go a long way towards showing your users that you care about their long term success with your product.
I hold a PhD in computer science, and have been a practicing developer since the early 90's.
I used to be the owner for VistaDB, but sold the product to another company in August 2010.
I have recently moved to Redmond and now work for Microsoft. Any posts or articles are purely my own opinions, and not the opinions of my employer.