
# NLineDiffConsole: A Console diff Implementation for Text Files

31 Aug 2009 · CPOL
An entry for the Lean and Mean competition

## Introduction

This article is an entry for the Lean and Mean competition, which ran on The Code Project during August 2009.

This is a console version of my previous entry. As such, it also uses the diff algorithm with the linear-space refinement; I have included the original paper in the downloads. In this version, I have halved the memory requirements, but the algorithm is the same.

Correction: According to Wikipedia, the algorithm was written by Douglas McIlroy in the early 1970s, while Eugene W. Myers only described it in his paper. Apologies.

## Background

If N is the number of lines in the first file, M the number of lines in the second, and D the number of differing lines, then the diff algorithm is O((N+M)·D) in time and O(N+M) in space.

This is far better than the classic LCS dynamic-programming algorithm, which is O(N·M) in both time and space. For the test files, N = 1784, M = 1789 and D = 19, so N·M is about 3.2 million while (N+M)·D is about 68,000: in this case, the LCS algorithm is roughly 900 times fatter and nearly 50 times slower than the diff algorithm.
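The article's implementation is C#, but the core of the greedy forward search that gives these bounds is compact enough to sketch. Below is a minimal, illustrative Python version (not the author's code) that computes only D, the length of the shortest edit script, using the V array of furthest-reaching x values per diagonal; the O(N+M) space bound comes from V being a single one-dimensional array:

```python
def shortest_edit_distance(a, b):
    """Greedy forward search (as described in Myers' paper).

    Returns D, the minimum number of line insertions and deletions
    needed to turn sequence `a` into sequence `b`.
    Time O((N+M)*D), space O(N+M).
    """
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d + 1                 # shift so diagonal k indexes >= 0
    # v[offset + k] = furthest x reached so far on diagonal k (k = x - y)
    v = [0] * (2 * max_d + 3)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Choose the better predecessor: step down (an insertion)
            # from diagonal k+1, or step right (a deletion) from k-1.
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]
            else:
                x = v[offset + k - 1] + 1
            y = x - k
            # Follow the "snake": free diagonal moves over matching lines.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d
```

On the classic example from the paper, `a = "ABCABBA"` and `b = "CBABAC"`, this returns 5. The sketch discards the per-step history, so it cannot reconstruct the actual edit script; the linear-space refinement recovers the script by recursing around the middle snake instead.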

I have to repeat my appreciation of McIlroy's algorithm. The more I have learned about it, the more impressed I have become. I certainly couldn't invent anything better!

## Using the Code

The demo is a console application that uses the diff implementation, which is in a separate assembly. The demo takes two file paths as arguments. It reads the files, calculates the diff and displays the results and some timing measurements. There is a `#define` in program.cs that optionally measures the change in `Process.PeakWorkingSet64`.
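The demo itself is C# and ships with the download; as a rough, hypothetical illustration of the same workflow (two file paths in, diff and timing out), here is a Python sketch using the standard library's `difflib`, which is a different diff implementation than the article's:

```python
import difflib
import sys
import time


def diff_lines(left, right, a_name="a", b_name="b"):
    # Compute a unified diff between two lists of lines.
    # (difflib uses its own matching algorithm, not the one in the article.)
    return list(difflib.unified_diff(left, right, a_name, b_name))


def main(argv):
    # Mirror the demo's interface: two file paths as arguments.
    if len(argv) != 3:
        print("usage: diff.py <file1> <file2>")
        return 1
    with open(argv[1]) as f:
        left = f.readlines()
    with open(argv[2]) as f:
        right = f.readlines()
    start = time.perf_counter()  # time only the diff, not the file I/O
    diff = diff_lines(left, right, argv[1], argv[2])
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    sys.stdout.writelines(diff)
    print(f"diff computed in {elapsed_ms:.1f} ms")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Like the demo, this times the diff calculation separately from reading the files, since disk I/O would otherwise dominate the measurement.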

## Results

The screenshot above shows the timing results for a typical run. Not including the JIT compilation gives times of about 5.5 ms. Including it gives overall times of about 32 ms.

The main arrays used by the algorithm are now a bit under 30 KB, so the calculated memory use is 420 KB plus change.

Other entries measure the change in `Process.PeakWorkingSet64`, so I will do the same. The measurements vary dramatically, but values of 600 - 700 KB are common.

## History

• 2009-08-31: First edition

## About the Author

 United Kingdom

I built my first computer, a Sinclair ZX80, on my 11th birthday in 1980.
In 1992, I completed my Computer Science degree and built my first PC.
I discovered C# and .NET 1.0 Beta 1 in late 2000 and loved them immediately.
I have been writing concurrent software professionally, using multi-processor machines, since 1995.

In real life, I have spent 3 years travelling abroad,
I have held a UK Private Pilots Licence for 20 years,
and I am a PADI Divemaster.

I now live near idyllic Bournemouth in England.

I can work 'virtually' anywhere!

## Comments and Discussions

yours faster (sajamp, 30-Oct-14 5:48)
Great. (Luc Pattyn, 31-Aug-09 16:02)
Hi Nick,

Congratulations on your TextDiff series of articles. I couldn't read them as fast as you wrote them, and I haven't found the time yet to study Myers' work, as I was still busy with my own app and article. So here is my first and only reaction to all three of them.

I particularly liked the graphical output you have in two of your text diffing apps; and adding in-line comparisons in the last one was a nice touch. That is from a user's point of view; technically, the performance is bound to take a hit here.

I must admit I was not fond of some of your measurements.
In NLineDiff you excluded reading the files from both timing and memory considerations. And rather than measuring the memory usage, you calculated the storage as 2 ints per text line, and ignored everything else: the strings themselves, all other objects, the stack usage, the garbage that gets produced and not immediately collected, etc.

In NLineDiffConsole you have measured several partial times which is very informative; you also started using PeakWorkingSet64, and reported variations in the range 600-700 KB. I hardly see any variations in mine (it varies by no more than 4 KB), that is probably due to the fact that I do a couple of GC.Collect() before getting the initial value.

In NDiffDiff you returned to the graphical output, very nice; unfortunately that turned out to be a step back both in speed and working set. For all three I don't understand why your apps don't measure and display a realistic memory usage. It was part of the assignment after all.

FYI: You are probably aware that I have chosen a different path; my LPTextFileDiff doesn't read all text at once (although that probably is the fastest way to take it in), I preferred a more scalable approach. For "normal" sizes, I think our times aren't far apart, however I also did test with a huge example (90MB, containing 1000 times the original): my app took some 2.3 seconds and 440 KB; whereas your NLineDiff, to which I added my metrics code, took some 27 seconds and a whopping 460 MB.

The deadline is approaching really fast now. All that remains to be said now is: May Chris have a hard job figuring it all out (after all we have been developing code he wanted to get for free), and may the best one win.

Cheers.

Luc Pattyn

Re: Great. (Nick Butler, 31-Aug-09 23:47)
Re: Great. (Luc Pattyn, 31-Aug-09 23:57)
Re: Great. (Chris Maunder, 4-Sep-09 9:10)

