XOR tricks for RAID data protection
How RAID-5 protects data using simple XOR tricks.
Some time ago, I saw a very popular article on CP about XOR tricks. People started to ask why they need such a thing. Well, there are lots of useful things you can do with XOR operations and some of its properties and that’s what I’ll write about in this article.
Most of you may know RAID controllers, which can reconstruct the contents of a disk drive when one disk on the array fails. If you don’t know yet, you can refer to http://www.uni-mainz.de/~neuffer/scsi/what_is_raid.html and learn something about it. It’s a cool technology. Those who already saw a RAID controller rebuilding a defective disk knows what I’m talking about.
But do you really know how this is implemented?
In this article I’ll try to explain RAID XOR algorithms and give some simple sample code that emulates a RAID controller and show how this is possible.
I tried keeping the article as simple as possible and won’t require a math degree for understanding. There are more subtle questions in RAID-5 implementation, like the selection of disk sector groups for the XOR calculations, interlacing parity data and parity data distribution on the disk array that I’ll not write about in this article to keep this as an introduction and give you only a broader vision of what’s behind a RAID controller.
In this article, I’ll refer to a RAID controller as a generic RAID-5 device, being a hardware or software implementation.
First of all, let me show you a useful property of the XOR operation; Let A and B being e.g. two bytes. If
X = A XOR B then this is true:
X XOR B = A
A XOR X = B
This is why the XOR trick in the mentioned article works.
Using this property, you can say that this expression is true, too:
A XOR B XOR C XOR D = X
X XOR B XOR C XOR D = A
A XOR X XOR C XOR D = B
A XOR B XOR X XOR D = C
A XOR B XOR C XOR X = D
And this can be repeated for an infinite amount of terms.
What’s happening here? Put simply: If you calculate a XOR of a number sequence, you can substitute X for any number in this sequence recalculate the XOR and recover the original number.
And how this property can help in reconstructing a disk? Well, a RAID controller takes A, B, C and D as disk sectors and calculates an X that’s the XOR of this data. Then, X is written to disk too. You can think of a disk sector as a 512-bytes (4096 bit) integer number, just like an
int is a 4-bytes (32 bit) integer number. This is why RAID controllers take one disk more for “parity data”. Disk drives keep CRC data for detecting a defective disk sector. So, whenever the controller determines that a disk sector is corrupted (normally disk driver keep internally a CRC check), it substitutes the corrupted data by the “parity data”, which is really the X in the above sample. If X is damaged, you can always recalculate it from the data itself.
There is a subtle question that arises when you are implementing this kind of data protection: how will the damage occur? Look at the above data. Imagine each letter as a disk sector.
<FONT color=red>A B C D E F </FONT><FONT color=blue>G H I J K L </FONT><FONT color=#339966>M N O P Q R</FONT>
Suppose you are implementing a XOR with 1/3 of protection, like a RAID array with 4 disks. This means that you’d calculate this:
X1 = A XOR B XOR C
X2 = D XOR E XOR F
X3 = G XOR H XOR I
X4 = J XOR K XOR L
X5 = M XOR N XOR O
X6 = P XOR Q XOR R
And write X1...X6 to disk, right? This way, if you have the A and N sectors damaged you can recover them.
But one can say that errors on the disk are more sadistic: they are more probable of occurring on near disk sectors. Sometimes an entire disk dies. So, it’s more probable that sectors A and B become damaged. If you calculated X1 that way, you’d be in trouble: there’s no XOR covering A and not covering B (remember, you can always recover ONE of them, but not both). A slightly better approach would be:
X1 = A XOR G XOR M
X2 = B XOR H XOR N
X3 = C XOR I XOR O
X4 = D XOR J XOR P
X5 = E XOR K XOR Q
X6 = F XOR L XOR R
Now, we can lose an entire disk or entire sector groups and have no problem. The only problem we would have is losing A, B and N. There is lots of research and some highly sophisticated solutions for this kind of problem and I’ll not touch it here for the sake of simplicity.
Knowing this algorithm, you could create some creative data recovery solutions, with some sample uses on:
- Software protection: you could create software that has an internal “auto-fix”, giving more work to crackers, if properly hidden, of course. The algorithm is particularly boring to debug in ASM. More than this, you could create an application that comes damaged and then is fixed. If it is cracked, it will become crippled.
- Undo/Redo features : sometimes, e.g., in image applications you may have 3rd party algorithms and routines applied to your data. Proper use of this technique could reduce the amount of memory for undoing changes.
- Small business databases: I have seen applications that encrypt data for hiding direct database access in small business applications. Though this is not a good habit, this has its reasons: small businesses always have the “computers guy” that edits databases directly and often damages it, because he doesn’t know the database structure. Using an approach like this, you could give the user an option of undoing the “database improvement”.
- Media corruption protection: For backup or compression tools, or even for the shipment of your application, where you could recover damaged data pieces. I am yet to see a zip compressor that does this. Ultra Compressor 2 did this, in the old DOS times.
Attached to this article I sent a small utility in C# that emulates a rudimentary RAID5 controller. I wrote it in C# mainly because I think the code would be easier to understand.
You’ll need to provide 3 arguments:
- A file (obvious). Use a big one (~30 Mb) for easier understanding of what’s going on.
- The number of divisions of the file (in the sample X1…X6 above, it was used 3). You can use a larger number if the file is bigger. In a 30Mb file, a number like 32 is safe enough for about 50 random changes. Think this number as "how many disks (minus one) do I have in the RAID5 array?"
- The number of random changes. Since random changes are made at random positions, you can have a number of damaged blocks bigger than this number.
The program creates a copy of the input file and makes all the writes on this copy. But, be careful with the file you’ll give. And please, don’t use your pagefile for this.
Then it does the following things:
- Calculate a CRC32 of the blocks on the file. This is done to detect which blocks changed. (CRC32 C# code borrowed from another CP article). This is created with a .crc32 extension
- Calculate the parity data. This is created with a .raid extension
- Copies the file to a new one and makes some random damage on it. This is created with a .damaged extension
- Recover the file.
- Check if the file was recovered correctly.
You can compare the .damaged file at the end of the process with the original file with:
FC /B <original file> < original file.damaged>
Limitations of the sample application
Keep in mind that this is not yet a professional class application. It’s much more of a starter application for demonstrating the above algorithms. You can understand more details of the algorithm stepping through in the VS.NET debugger. For the sake of understanding of the code and the algorithms involved, this has almost no error handling and no fancy optimizations. The ideal implementation of this would be done in C++, and maybe some ASM code which leverages SIMD instructions (like the MMX instruction set). Other speed optimizations would be creating producer/multiple consumer pipes to optimize disk access patterns. This application creates a lot of disk trashing.
The error handling is straightforward to implement and is left as an exercise to the reader (don’t you hate when they tell it?). And even without optimizations, this code is quite fast on my Athlon 850 MHz, 256 Mb RAM with a 100Mb file. The Debug version takes just a few seconds to execute from inside the IDE.
The sample application does not handle well very small files, where you do not have enough blocks to compute a XOR, and may not recover them. In this case, it would be smarter simply making a copy of the file, instead of doing all this.
The standard disclaimer
As I said before, this is not the safest tool in the world. Use it at your own risk: if you use it, you can lose data, profit, have hardware problems, cause radioactive contamination and start a world war. But, for me, it works fine and never had a problem.
Well, the code in this article is free for you to use any way you want. If you improve it, drop me a note, so I can keep the code in sync. If you make money with this code, you are a genius! You deserve the money. Just remember to send me a “thank you” and give me some tips. I will not reject any money you send me too.