Click here to Skip to main content
13,299,298 members (44,593 online)
Click here to Skip to main content
Add your own
alternative version


20 bookmarked
Posted 27 Aug 2009

NLineDiff: A diff Implementation for Text Files

, 28 Aug 2009
Rate this:
Please Sign up or sign in to vote.
An entry for the Lean and Mean competition





Key: Red lines are those deleted from the first file and green lines are those inserted from the second file.


This article is an entry for the Lean and Mean competition, which ran on The Code Project during August 2009.

I have implemented a slight variation of Eugene W. Myers' diff algorithm with the linear space refinement from the paper listed in the downloads.

The algorithm is O(ND) in time and O(N) in space, where:

N = number of lines in FileA + number of lines in FileB
D = number of different lines

The output is rendered using a simple WinForms WebBrowser control.


For the test files, N = 3,573 and D = 19. On my 2.5 GHz machine, the solution takes about 5 ms and uses about 450 KB of memory.


A large proportion of the time is spent reading the files from disk and splitting them into string arrays. Using File.ReadAllText takes about 1.7 ms for both files, while File.ReadAllLines takes about 1.9 ms. Using File.OpenRead and StreamReader.ReadLine executes in almost identical times.

Building the HTML from the results also takes some time. Just showing the changes takes about 0.15 ms, which rises to 2.6 ms if the common lines are also included. I have chosen to include the common lines to give more useful results.

As for the diff algorithm itself, the average of 100 iterations is about 0.33 ms. The time for a single run obviously varies more than the average, but consistently takes about 0.4 ms.

These operations give the total time of 1.9 + 0.4 + 2.6 ≈ 5 ms.

The diff is run once before the timing run to allow the JIT compiler to work. This takes about 27 ms, which dwarfs the calculation time. I think that not including this time is legitimate as Chris said in the Notes:

"... there's no need to report memory usage or speed of the framework you use to support your code."

If you disagree, you can add in the JIT time to give a total time of about 32 ms.


The space used is the sum of three areas:

  • The files are loaded into memory, which takes about 360 KB.
  • They are split into string arrays, which takes about 30 KB.
  • The algorithm itself uses an int array, which is about 60 KB.

The sum gives a maximum working memory of 450 KB.

I see that other entries have used Process.PeakWorkingSet64, but this is mostly a measure of how much memory the framework is using. Running this code reports about 900 KB being used by the Thread.Sleep call, which is obviously not correct.

static void Main( string[] args )
  var p = Process.GetCurrentProcess();
  var peak = p.PeakWorkingSet64;

  Thread.Sleep( 1000 );


  Console.WriteLine( String.Format( "peak: {0:N0} bytes", p.PeakWorkingSet64 - peak ) );

Using the Code

The binary is a single executable that has the test files included as embedded resources. It is configured to load the two texts and split them into string arrays, then time the diff algorithm.

You can also supply two file paths on the command line to use those files.

The source contains some variations which are commented out in Form1.cs. You can edit this file and recompile to verify my time and space measurements on your machine.

Points of Interest

Myers' algorithm is a work of art and I have thoroughly enjoyed learning it. It has taken me about a week to implement it because that's how long it took for me to fully understand it. It is definitely worthy of an article on its own (coming soon!)


  • 2009 08 27: First edition
  • 2009 08 28: Fixed bug with overlapping diagonal when D=1 && Δ is even


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Nicholas Butler
United Kingdom United Kingdom

I built my first computer, a Sinclair ZX80, on my 11th birthday in 1980.
In 1992, I completed my Computer Science degree and built my first PC.
I discovered C# and .NET 1.0 Beta 1 in late 2000 and loved them immediately.
I have been writing concurrent software professionally, using multi-processor machines, since 1995.

In real life, I have spent 3 years travelling abroad,
I have held a UK Private Pilots Licence for 20 years,
and I am a PADI Divemaster.

I now live near idyllic Bournemouth in England.

If you would like help with multithreading, please contact me via my website:

I can work 'virtually' anywhere!

You may also be interested in...


Comments and Discussions

NewsVersion 1.1 posted Pin
Nick Butler27-Aug-09 10:38
mentorNick Butler27-Aug-09 10:38 
GeneralRe: Version 1.1 posted [modified] Pin
Ilka Guigova28-Aug-09 11:15
memberIlka Guigova28-Aug-09 11:15 
Hello Nick,

I confirm that now your application runs with various inputs.

I created grid3.html (to contain the contents of grid1.html 5 times) and grid4.html (to contain the contents of grid2.html 5 times) and did some more testing:
- NLineDiffDemo grid1.html grid1.html - it takes about 0.37 ms and 57 096 bytes
- NLineDiffDemo grid1.html grid2.html - it takes about 0.67 ms and 57 176 bytes
- NLineDiffDemo grid3.html grid3.html - it takes about 3 ms and 285 448 bytes
- NLineDiffDemo grid3.html grid4.html - it takes about 6 ms and 285 848 bytes
(If I got the units wrong, please ignore them and just keep the scale in mind. I apologize.)
And it confirms that the algorithm runs in O(N) in space and O(ND) in time.

Have you considered other approaches apart from the Myers' LCS algorithm? I am looking forward to your article on it.

I also like your output presentation very much : )

I will refrain from voting due to conflict of interest but I welcome any comments that you may have on my entry: Code Lean and Mean File DIFF (FIFF) Application[^] Smile | :)

Good Luck!

I would imagine if you could understand Morse Code, a tap dancer would drive you crazy.
-- Mitch Hedberg (American Comedian, 1968-2005)

modified on Friday, August 28, 2009 5:51 PM

RantGood start, but severe issues... Pin
Peter Souza27-Aug-09 6:25
memberPeter Souza27-Aug-09 6:25 
AnswerRe: Good start, but severe issues... Pin
Nick Butler28-Aug-09 5:11
mentorNick Butler28-Aug-09 5:11 
GeneralMy vote of 2 Pin
Peter Souza27-Aug-09 6:24
memberPeter Souza27-Aug-09 6:24 
GeneralReducing the memory usage with memory mapped files Pin
Nemanja Trifunovic27-Aug-09 6:16
memberNemanja Trifunovic27-Aug-09 6:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.171207.1 | Last Updated 28 Aug 2009
Article Copyright 2009 by Nicholas Butler
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid