Transactional File Management in .NET

26 Aug 2006, CPOL
Abstract

Transactional files, which Windows lacks, allow file manipulation that can be rolled back to the original state, and they retain their integrity in case of failure. They can be used for any kind of file manipulation, including accelerated alternatives to database manipulation (to be discussed in a follow-up article).

Background

Some useful facilities are seldom discussed, and transactional file management is one of them: a mechanism whereby changes made to a disk file can be cancelled, with the file reverting to its pre-transaction state. Such a facility was promised as far back as Windows 2000, and it is now scheduled for Windows Vista.

The topic was more commonly discussed when COM+ was new and exciting. It became obvious then that while email (Exchange), message queues (MSMQ) and databases (SQL Server, Oracle) were “transaction-aware”, simple disk files were not, and therefore could not participate in transactions unless a “compensating resource manager” was implemented for them. This article shows such an implementation, although it does not implement the necessary COM+ transactional interfaces and so, in its current state, cannot take part in a transaction spanning multiple resource managers.

What is it good for? There are many cases where we modify a data file only to hit an exception, or otherwise reach a state in the application logic where we wish to abort, which requires returning the data file to its previous state. A simple way to achieve this is file duplication: write to a new copy and replace the original file once the entire transaction is complete. Alternatively, we can keep a copy of the original file and put it back in case of failure. Neither approach is elegant, and the second will not survive a system restart. Moreover, what do we do when the file is large (think 100 MB or more) and the time and space needed to create the duplicate are not affordable?
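For comparison, the copy-and-replace approach described above might look like the following minimal sketch. The class name `CopyAndReplace`, the method `Update` and the `.tmp` suffix are assumptions made for illustration and are not part of the article's code; `File.Copy` and `File.Replace` are standard `System.IO` calls.

```csharp
using System;
using System.IO;

static class CopyAndReplace
{
    // Applies 'modify' to a working copy of 'path' and swaps the copy in only
    // if 'modify' completes; on any exception the original file is untouched.
    public static void Update(string path, Action<FileStream> modify)
    {
        string temp = path + ".tmp";   // hypothetical naming convention
        File.Copy(path, temp, true);   // the costly step for large files
        try
        {
            using (FileStream fs = new FileStream(temp, FileMode.Open, FileAccess.ReadWrite))
            {
                modify(fs);
            }
            // Replace the original with the modified copy (no backup file kept).
            File.Replace(temp, path, null);
        }
        catch
        {
            File.Delete(temp);         // roll back: discard the working copy
            throw;
        }
    }
}
```

The whole file is copied on every transaction, even when only a few bytes change, which is exactly the cost the rest of the article tries to avoid.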

Functionality

In terms of functionality, the proposed implementation is rather basic: open a transactional file, read and write individual bytes at any position, and commit or roll back the changes at any time. If the file detects, when opened, that it must recover from a previous crash, it restores its original content. A test script produces the following output:

Original content: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Content while in-transaction: ?????????????????????VWXYZ
Content after commitment: ?????????????????????VWXYZ
Content after disposing transactional file: ABC1EFG23JKL45OPQR867VWXYZ
Original content: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Content after rollback: ?????????????????????VWXYZ
Content after disposing transactional file: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Original content: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Content after failed commitment: ?????????????????????VWXYZ
Corrupted content after disposing: ABC1EFG23JKLMNOPQRSTUVWXYZ
Recovered content: ABCDEFGHIJKLMNOPQRSTUVWXYZ

Note that there seems to be a bug in .NET (I don't believe it could be a Win32 error) or, more likely, a bug in my code; if you can find it, please email me and I'll fix it. The problem is that once you start locking file ranges (more on that in a second), the operating system locks more than you asked for; the same happens when unlocking file ranges (see the patch in the implementation that deals with it).

Implementation

Essentially, we want to keep all modified data in memory. If a failure occurs and the application rolls back, the changed data is discarded from memory and the file remains intact in its pre-transaction state; if the system or application crashes completely, the file was never changed anyway.

Of course, changed data should be available for additional access by the program, so all interaction with the file must be done through the transactional file object: it routes read/write operations to changed in-memory data if available, retrieving more data from the file when necessary. For example:

TransactionalFile trans = new TransactionalFile(path);
trans.Write(3, (byte)'1');
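The routing of reads and writes through in-memory changes can be sketched roughly as follows. `OverlayFile` is a hypothetical stand-in for the article's TransactionalFile, not its actual code: it keeps one dictionary entry per changed byte (the article uses ByteRange objects instead) and leaves out locking, commit and recovery.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, simplified stand-in for TransactionalFile: only the
// read/write routing and rollback are shown.
sealed class OverlayFile : IDisposable
{
    private readonly FileStream file;
    // Pending changes keyed by byte position; this sketch never writes them to disk.
    private readonly Dictionary<long, byte> pending = new Dictionary<long, byte>();

    public OverlayFile(string path)
    {
        file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    // A write only touches memory.
    public void Write(long position, byte value)
    {
        pending[position] = value;
    }

    // A read prefers the pending value and falls back to the file.
    // Returns -1 past the end of the file, like Stream.ReadByte.
    public int ReadByte(long position)
    {
        byte value;
        if (pending.TryGetValue(position, out value)) return value;
        file.Position = position;
        return file.ReadByte();
    }

    // Rolling back is simply forgetting the pending changes.
    public void Rollback()
    {
        pending.Clear();
    }

    public void Dispose()
    {
        file.Dispose();
    }
}
```

With this shape, a rollback costs nothing on disk, which is the property the design relies on.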

Transactions follow the ACID principles:

  • Atomicity: all actions occur together or none at all. Supported here by keeping all changes in memory, with a single commit applying all updates.
  • Consistency: no partial data is ever observed. Supported by making changes atomically and by recovering from partial commits (see below).
  • Isolation: other processes are not aware of the changes and cannot interfere with them until the transaction commits or is rolled back. Implemented here by locking every portion of the file that has pending changes (although the change is NOT written to the file at this time).
  • Durability: committed changes survive failures. This implementation covers that in a sense, but a full implementation should allow multi-party two-phase commit, which is not currently available.

Once the required changes are completed, the transactional file is committed by writing all pending changes from memory to file and releasing the locks on the file, resulting in a new state for the file.

trans.Commit();

Failures may occur at various points. Before the commit, the in-memory changes simply disappear and the file retains its pre-transaction integrity. After the commit, the file has a new, consistent state. While the commit is in progress, a compensating process restores the original state of the file.

To detect a failure that happens while a commit is in progress, and to recover by recreating the pre-transaction state, a log file holding the original content of every change is written throughout the commit process and discarded once the commit completes. If the commit was stopped midway, the log file still exists. The next time a transactional object opens the file and detects the log file, it first re-applies the original content stored in the log, discards the log, and then starts with the file's integrity restored.
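The log-and-recover cycle described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration rather than the article's ChangesLogFile: the class `UndoLog`, the `.undo` suffix, and the record layout (an 8-byte position followed by 1 byte of original content). As a simplification, the sketch writes the whole log before touching the data file instead of interleaving the two, and it uses `FileStream.Flush(true)`, which requires .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class UndoLog
{
    // Hypothetical naming convention: the log sits next to the data file.
    static string LogPath(string path) { return path + ".undo"; }

    // Step 1: record the original byte at every position about to change.
    public static void WriteLog(string path, IDictionary<long, byte> changes)
    {
        using (FileStream data = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (FileStream log = new FileStream(LogPath(path), FileMode.Create, FileAccess.Write))
        using (BinaryWriter writer = new BinaryWriter(log))
        {
            foreach (long position in changes.Keys)
            {
                data.Position = position;
                int original = data.ReadByte();
                if (original < 0) throw new ArgumentOutOfRangeException("changes");
                writer.Write(position);
                writer.Write((byte)original);
            }
            writer.Flush();
            log.Flush(true);   // force the log to disk before touching the data
        }
    }

    // Step 2: overwrite the data file in place.
    public static void Apply(string path, IDictionary<long, byte> changes)
    {
        using (FileStream data = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            foreach (KeyValuePair<long, byte> change in changes)
            {
                data.Position = change.Key;
                data.WriteByte(change.Value);
            }
            data.Flush(true);
        }
    }

    public static void Commit(string path, IDictionary<long, byte> changes)
    {
        WriteLog(path, changes);
        Apply(path, changes);
        File.Delete(LogPath(path));   // step 3: no log means the commit completed
    }

    // Call before opening the file: a leftover log means a commit was interrupted.
    // Returns true if original content was restored.
    public static bool Recover(string path)
    {
        string logPath = LogPath(path);
        if (!File.Exists(logPath)) return false;
        using (FileStream log = new FileStream(logPath, FileMode.Open, FileAccess.Read))
        using (BinaryReader reader = new BinaryReader(log))
        using (FileStream data = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            // Each record is 9 bytes; a truncated trailing record is ignored.
            while (log.Length - log.Position >= 9)
            {
                data.Position = reader.ReadInt64();
                data.WriteByte(reader.ReadByte());
            }
            data.Flush(true);
        }
        File.Delete(logPath);
        return true;
    }
}
```

Because the log holds original content only, replaying it is harmless even if the crash happened before any data was written: the bytes being restored are the ones already in the file.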

Code Structure

TransactionalFile is the main class: pass the path to the file in the constructor, use ReadByte and Write to access bytes at a specific position, Length to alter the file size, Commit to write the changes, and Rollback to cancel them.

ChangesLogFile is a support class for writing portions of the original content, to support recovery of failed commitments.

ByteRange is a utility class managing a portion of the file, including start position, length, original content and pending changes.

Future and Potential Improvements

There are numerous aspects that can enhance and improve this component, including:

  • The interface can support textual and binary read/write operations for byte arrays and typed-data, like BinaryReader and related classes do
  • .NET 2.0 provides the new System.Transactions namespace, intended for just this, although an actual implementation is not available through the .NET runtime, which only hooks into Windows Vista; the component presented here could be made compatible with this namespace
  • Full two-phase commit compliant with Microsoft’s COM+ resource manager infrastructure, which would allow the component to participate in multi-party and distributed transactions
  • Thread safety and support for concurrent thread activity
  • Compatibility with .NET’s IO stream model
  • Non-transactional mode of operation in addition to the transactional mode

Follow-Up

I have actually created this implementation in the last few hours to support an accelerated data manipulation alternative to databases for some types of data; see a follow-up article on the subject.

About the Author

Israel Kehat is a technology consultant turning entrepreneur (looking for several startups to launch by the end of 2006). He earns his keep as strategy consultant for Tourico Travel Holdings, active in the USA, Israel and Romania. You can contact Israel at israel@iks.co.il, although he apologizes for not being available for peer assistance in most cases.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Article Copyright 2006 by ikehat