This is a candidate entry for The Code Project's Lean and Mean competition to write an efficient text diffing tool.
The problem can be described like this: Create an application that will calculate and display the changes between two text files as fast as possible using the least amount of memory possible. Provide timing data and maximum memory use data to prove you're the leanest and meanest.
My solution, developed in C#, compares two files. The resulting program is compact and fast, and because it reads the files one byte at a time, its memory use does not depend on the file size.
This is the screen shown when we run the application:
This is the screen shown when we compare two different files:
To keep the application fast and simple, we first retrieve the locations of the two files. If both the location and the name are the same, the two paths refer to the same file, so we treat the contents as identical.
In the second step, if the names of the two files are different, we retrieve the length of each file. If the lengths differ, the function returns false, and we calculate the elapsed time and report that the files are not the same.
If the lengths are the same, we start reading both files and comparing them consecutively. As long as the lines in both files are identical, we continue without any problem. However, as soon as there is a mismatch, we have to decide whether the current line in file1 has disappeared, the current line in file2 has been inserted, or one line has been edited to become the other. We show the result in a message box.
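The line-by-line walk described above can be sketched roughly as follows. This is a minimal illustration and not the code from the article: the names `LineCompareSketch` and `FirstDifferentLine` are hypothetical, and the sketch only reports where the first mismatch occurs. It does not try to decide whether a line was inserted, deleted, or edited.

```csharp
using System.IO;

public static class LineCompareSketch
{
    // Returns the 1-based number of the first line that differs,
    // or 0 if both readers yield exactly the same lines.
    public static int FirstDifferentLine(TextReader reader1, TextReader reader2)
    {
        int lineNumber = 0;
        while (true)
        {
            var line1 = reader1.ReadLine();   // null at end of input
            var line2 = reader2.ReadLine();
            lineNumber++;

            if (line1 == null && line2 == null)
                return 0;                     // both ended together: no difference
            if (line1 != line2)
                return lineNumber;            // mismatch, or one input ended early
        }
    }
}
```

To try it on real files, wrap each path in a `StreamReader` (inside a `using` block) and pass the two readers to `FirstDifferentLine`.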
Using the Code
I have created a single function, FileCompare:
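private bool FileCompare(string file1, string file2)
{
    if (file1 == file2) return true;    // same path given twice
    using (FileStream fs1 = new FileStream(file1, FileMode.Open))
    using (FileStream fs2 = new FileStream(file2, FileMode.Open))
    {
        if (fs1.Length != fs2.Length) return false;    // sizes differ
        int file1byte, file2byte;
        do    // read one byte from each file until a mismatch or end of file (-1)
        {
            file1byte = fs1.ReadByte();
            file2byte = fs2.ReadByte();
        }
        while ((file1byte == file2byte) && (file1byte != -1));
        return ((file1byte - file2byte) == 0);
    }
}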
This function takes as input the paths of the two files to be compared and returns a Boolean value: true if the files are the same, and false otherwise.
The sample code that is described in this article performs a byte-by-byte comparison until it finds a mismatch or it reaches the end of the file.
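As an illustration of the byte-by-byte loop, here is a small variation that works on any pair of streams and reports where the first mismatch occurs instead of returning a Boolean. The class and method names (`ByteCompareSketch`, `FirstMismatchOffset`) are hypothetical and are not part of the article's code.

```csharp
using System.IO;

public static class ByteCompareSketch
{
    // Returns the zero-based offset of the first differing byte,
    // or -1 if both streams hold the same bytes and end together.
    public static long FirstMismatchOffset(Stream stream1, Stream stream2)
    {
        long offset = 0;
        while (true)
        {
            int byte1 = stream1.ReadByte();   // -1 signals end of stream
            int byte2 = stream2.ReadByte();

            if (byte1 != byte2)
                return offset;                // mismatch, or one stream ended first
            if (byte1 == -1)
                return -1;                    // both ended without a mismatch
            offset++;
        }
    }
}
```

Because it accepts `Stream` rather than a file path, the same method can be exercised with `MemoryStream` objects as well as with `FileStream`.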
The code also performs two simple checks to increase the efficiency of the comparison:
- If both file references point to the same file, the two files must be equal.
- If the size of the two files is not the same, the two files are not the same.
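The two checks above could also be written as a separate helper that runs before any file is opened. The sketch below is one possible way to do that, not the article's implementation: `QuickCheckSketch` and `TryQuickCompare` are hypothetical names, and normalizing the paths with a case-insensitive comparison is an assumption that suits a typical Windows file system.

```csharp
using System;
using System.IO;

public static class QuickCheckSketch
{
    // Returns true or false when the quick checks settle the question,
    // or null when a byte-by-byte comparison is still needed.
    public static bool? TryQuickCompare(string file1, string file2)
    {
        // Normalize both paths so that relative and absolute forms
        // of the same location compare as equal.
        // Assumption: paths are compared without regard to case.
        string full1 = Path.GetFullPath(file1);
        string full2 = Path.GetFullPath(file2);
        if (string.Equals(full1, full2, StringComparison.OrdinalIgnoreCase))
            return true;

        // Files of different sizes cannot have identical content.
        if (new FileInfo(full1).Length != new FileInfo(full2).Length)
            return false;

        return null;
    }
}
```

A caller would fall back to the full comparison only when the helper returns null.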
Points of Interest
This functionality is similar to the MS-DOS-based Fc.exe utility that is included with various versions of Microsoft Windows and Microsoft MS-DOS, and with some development tools.
This application takes less than 10 milliseconds to compare two files, and the full application uses 40 KB of memory while processing.
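The article does not say how these figures were measured. One way to collect comparable numbers, sketched here under that caveat, is to time the comparison with `Stopwatch` and read the peak working set of the current process. The names `MeasureSketch`, `Time`, and `PeakMemoryBytes` are hypothetical, and the values reported will depend on the runtime and the machine.

```csharp
using System;
using System.Diagnostics;

public static class MeasureSketch
{
    // Runs the given action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Returns the peak working set of the current process, in bytes,
    // as reported by the operating system.
    public static long PeakMemoryBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.PeakWorkingSet64;
        }
    }
}
```

For example, `MeasureSketch.Time(() => FileCompare(path1, path2))` would time a single comparison, where `path1` and `path2` stand for the two files being tested.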
History
- 28th August, 2009: Initial post