
Differ: a reusable C# diffing utility and class library

30 Apr 2005
Flexible C# directory tree comparison utility.


Introduction

Differ is a new file "diff" utility written entirely in C#. Its internal text file difference algorithm is borrowed from DiffEngine, another project available on CodeProject.

Differ is different for several reasons. First, the code is organized so that the actual directory-tree scan-and-diff algorithm is independent of its hosting environment. Because it communicates all of its results via events, it could be used in console programs, GUI apps or services.
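To make the event-driven design concrete, here is a minimal sketch of a UI-free core that never touches Console and reports its findings only by raising an event. It compares two lists of entry names rather than scanning the disk, which keeps it self-contained. MiniDiffCore and DirectoryDiffEventArgs are hypothetical names, not Differ's actual DifferCore API, and the sketch uses modern C# (generics, lambdas, `?.`) rather than VS2003-era syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event payload; not Differ's real event signature.
public class DirectoryDiffEventArgs : EventArgs
{
    public string RelativePath { get; }
    public bool OnlyOnLeft { get; }

    public DirectoryDiffEventArgs(string relativePath, bool onlyOnLeft)
    {
        RelativePath = relativePath;
        OnlyOnLeft = onlyOnLeft;
    }
}

// Hypothetical host-independent core: it only raises events, it never writes output itself.
public class MiniDiffCore
{
    public event EventHandler<DirectoryDiffEventArgs> DirectoryNotify;

    // Reports entries present on only one side; returns true when both sides match.
    // Names are compared case-insensitively, as Windows file systems usually do.
    public bool Execute(IEnumerable<string> left, IEnumerable<string> right)
    {
        var leftSet = new HashSet<string>(left, StringComparer.OrdinalIgnoreCase);
        var rightSet = new HashSet<string>(right, StringComparer.OrdinalIgnoreCase);
        bool match = true;

        foreach (string name in leftSet)
        {
            if (rightSet.Contains(name)) continue;
            match = false;
            // ?. skips the call when no listener is hooked
            DirectoryNotify?.Invoke(this, new DirectoryDiffEventArgs(name, true));
        }
        foreach (string name in rightSet)
        {
            if (leftSet.Contains(name)) continue;
            match = false;
            DirectoryNotify?.Invoke(this, new DirectoryDiffEventArgs(name, false));
        }
        return match;
    }
}
```

A console host would write to Console inside its handler, while a WinForms host could append the same notifications to a ListBox; the core itself does not change.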

The second major difference is that Differ accepts an XML parameter file specifying the types of files and the directory locations that should be ignored. This is helpful for developers because build products (.exe, .dll, etc.) can be skipped during the scan. In addition, the parameter file indicates which file extensions are to be treated as "text", which avoids having to scan a file's contents to decide. A default XML parameter file is included, and Differ will generate one internally if necessary. These "ignore" lists can contain literal strings or regular expressions.
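The following is a minimal sketch of how an "ignore" list mixing literal names and regular expressions, plus an extension-based text test, might be evaluated. IgnoreList, TextExtensions and the sample extensions are hypothetical; they are not the ZipParams classes or the real contents of differParams.xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical ignore list: literal names plus regular-expression patterns.
public class IgnoreList
{
    private readonly HashSet<string> literals = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly List<Regex> patterns = new List<Regex>();

    public void AddLiteral(string name) => literals.Add(name);

    public void AddPattern(string regex) =>
        patterns.Add(new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant));

    // True if the name equals a literal entry or matches any pattern.
    public bool IsIgnored(string name)
    {
        if (literals.Contains(name)) return true;
        foreach (Regex r in patterns)
            if (r.IsMatch(name)) return true;
        return false;
    }
}

// Classifying by extension avoids opening the file to inspect its bytes.
public static class TextExtensions
{
    private static readonly HashSet<string> textExts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".cs", ".txt", ".xml", ".htm", ".html" };

    public static bool IsText(string fileName) => textExts.Contains(Path.GetExtension(fileName));
}
```

For example, adding the literal "bin" and the pattern `\.(exe|dll|pdb)$` skips build output directories and binaries while still scanning source files.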

Finally, as an example of the usefulness of this approach, the Differ utility can (optionally) generate a standard Windows batch file that synchronizes the contents of the two directory trees. I use the xcopy, del, rmdir and attrib commands in the batch file.
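Here is a rough sketch of generating such a synchronization batch file from lists of relative paths. It is not Differ's generator: SyncScript is a hypothetical name, and it substitutes `copy /Y` (after creating the target directory with `mkdir`) for xcopy, to avoid xcopy's "file or directory?" prompt when the destination does not yet exist. `attrib -R` clears the read-only flag so the copy can overwrite the target.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical generator; relative paths use '\' separators, as on Windows.
public static class SyncScript
{
    static string Q(string path) => "\"" + path + "\"";

    public static string Build(string leftRoot, string rightRoot,
                               IEnumerable<string> copyFiles,
                               IEnumerable<string> deleteFiles,
                               IEnumerable<string> deleteDirs)
    {
        var sb = new StringBuilder();
        sb.AppendLine("@echo off");
        foreach (string rel in copyFiles)
        {
            string src = leftRoot + "\\" + rel;
            string dest = rightRoot + "\\" + rel;
            string destDir = dest.Substring(0, dest.LastIndexOf('\\'));
            // Make sure the target directory exists, clear read-only, then overwrite.
            sb.AppendLine("if not exist " + Q(destDir) + " mkdir " + Q(destDir));
            sb.AppendLine("if exist " + Q(dest) + " attrib -R " + Q(dest));
            sb.AppendLine("copy /Y " + Q(src) + " " + Q(dest));
        }
        // Files and directories that exist only in the right-hand tree are removed.
        foreach (string rel in deleteFiles)
            sb.AppendLine("del /F /Q " + Q(rightRoot + "\\" + rel));
        foreach (string rel in deleteDirs)
            sb.AppendLine("rmdir /S /Q " + Q(rightRoot + "\\" + rel));
        return sb.ToString();
    }
}
```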

Background

After trying several other diffing utilities (including Cygwin ports) and filtering their results through Perl scripts, I found that many subtle errors could creep into the process. Also, many types of files could safely be ignored. Since I maintain a website remotely, I need to keep my development tree and the web server tree in sync, and this project was my answer to that problem.

To understand how the text difference sets in a single file are discovered, please refer to the original DiffEngine article. The version of the code I'm using (included in the download) has only one minor modification from the DiffEngine article as posted (see below).
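For readers who want a feel for line-level diffing without the DiffEngine article at hand, here is a textbook longest-common-subsequence (LCS) line diff. This is a swapped-in technique for illustration, not DiffEngine's algorithm; LineDiff is a hypothetical name, and its O(n·m) table makes it suitable only for modest file sizes.

```csharp
using System;
using System.Collections.Generic;

// Classic LCS dynamic-programming line diff (not DiffEngine's method).
public static class LineDiff
{
    // Prefixes: "  " unchanged, "- " only in a, "+ " only in b.
    public static List<string> Diff(string[] a, string[] b)
    {
        int n = a.Length, m = b.Length;
        // lcs[i, j] = length of the LCS of a[i..] and b[j..]; the last row/column stay 0.
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table, emitting common lines and the edits between them.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < n && y < m)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < n) result.Add("- " + a[x++]);
        while (y < m) result.Add("+ " + b[y++]);
        return result;
    }
}
```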

Using the demo

Unzip DifferDemo.zip into a directory on your PATH, then type "differ -?" for a list of options. If you type "differ -p" you'll see the contents of the default "ignore" lists and text extension mappings from the XML file differParams.xml. The format is simple enough to be self-explanatory.

By default, differParams.xml is expected in the same directory as Differ.exe and its DLLs. Since these are .NET assemblies, there is no need to register them with regsvr32.
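A minimal sketch of locating a parameter file next to the executable and falling back to built-in defaults when it is missing might look like the following. It uses the real `AppDomain.CurrentDomain.BaseDirectory` and `XmlSerializer` APIs, but DifferParamsSketch, its fields and its default values are hypothetical and do not reflect the actual differParams.xml schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical parameter shape; public fields are serialized by XmlSerializer.
public class DifferParamsSketch
{
    public string[] IgnoreNames = new string[0];
    public string[] IgnorePatterns = new string[0];
    public string[] TextExtensions = new string[0];

    // Built-in defaults used when no parameter file exists.
    public static DifferParamsSketch CreateDefault() => new DifferParamsSketch
    {
        IgnoreNames = new[] { "bin", "obj" },
        IgnorePatterns = new[] { @"\.(exe|dll|pdb)$" },
        TextExtensions = new[] { ".cs", ".txt", ".xml" }
    };

    // Default location: the directory the executable was loaded from.
    public static string DefaultPath() =>
        Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "differParams.xml");

    public static DifferParamsSketch LoadOrDefault(string path)
    {
        if (!File.Exists(path)) return CreateDefault();
        var ser = new XmlSerializer(typeof(DifferParamsSketch));
        using (var stream = File.OpenRead(path))
            return (DifferParamsSketch)ser.Deserialize(stream);
    }

    public void Save(string path)
    {
        var ser = new XmlSerializer(typeof(DifferParamsSketch));
        using (var stream = File.Create(path))
            ser.Serialize(stream, this);
    }
}
```

Calling `CreateDefault().Save(DefaultPath())` is one way a tool could write out a starter file for users to edit.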

Using the code

Download and build Differ as-is using Visual Studio 2003. The solution is Differ.sln in the Differ directory.

You may extend the utility by editing DifferMain.cs. It contains all the main console output, display logic and error handling.

Alternatively, you may use the DifferCore class to create your own diff utility with whatever behaviors suit your environment. Since DifferCore (and the underlying DiffEngine support) never uses the Console class, it could be embedded in a WinForms (GUI) application, a system service or even an ASPX page (security settings permitting). If you wish to do this, move DifferCore into its own DLL project and reference that from your application.

In the DifferProject ZIP file you'll find a directory called CodeComments. These HTML pages were generated by the auto-documentation function of Visual Studio 2003. If you navigate your browser to the file Solution_Differ.HTM in that directory, you'll be able to examine documentation for the entire project. (Note that recent changes to IE security settings may make the pages render incorrectly.)

There are three projects in the downloadable VS2003 .NET "solution":

  • Differ. This contains the main and core modules for the Differ utility.
  • DifferenceEngine. This contains the original file diff logic from CodeProject.
  • ZipParams. This small project contains the file/directory "ignore" collections object and its XML serialization logic.

Briefly, DifferCommand is a command-line shell object that creates a DifferCore object, parameterizing it with the desired file or directory names. DifferCommand then calls DifferCore's Execute method.

Here's the heart of the DifferCommand object:

```csharp
//
//  Create the Differ object; parameterize it and attach event listeners
//
DifferCore dcore = new DifferCore( zpb, sLeft, sRight );
//  Indicate whether we want files/dirs ignored or not
dcore.ObeyIgnored = bIgnore;
//  Hook the standard events
dcore.DifferBinaryNotify += new
  Differ.DifferCore.DifferBinaryNotifyEvent(differNotificationBinary);
dcore.DifferDirectoryNotify += new
  Differ.DifferCore.DifferDirectoryEvent(differNotificationDirectory);
dcore.DifferTextNotify += new
  Differ.DifferCore.DifferTextNotifyEvent(differNotificationText);
dcore.DifferExceptionNotify += new
  Differ.DifferCore.DifferExceptionNotifyEvent(differExceptionNotify);
//  If we're to show tracking info, attach to the event
if ( bTracking )
    dcore.DifferTrackNotify += new
       Differ.DifferCore.DifferTrackNotifyEvent(differTrackNotify);
//  If ignored files/dirs are to be reported, attach to that event too
if ( bShowIgnore )
    dcore.DifferIgnoreNotify += new
       Differ.DifferCore.DifferIgnoreNotifyEvent(differIgnoreNotify);
//  Perform the recursive diff search; em receives the overall match result
em = dcore.Execute();
```

The single call to Execute returns an indication of whether the directory trees match, and DifferCommand uses this to set the exit code returned to the Windows command shell. All other information and state changes are communicated by events raised by DifferCore. If an application doesn't need an event, it should simply leave it unhooked.
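The article doesn't list Differ's actual exit codes, so the following is only a sketch of how a match result could be mapped to one. MatchResultSketch and ExitCodes are hypothetical, and the 0/1/2 values borrow GNU diff's convention (0 = no differences, 1 = differences found, 2 = trouble).

```csharp
// Hypothetical result type; Differ's real Execute return type is not shown in the article.
public enum MatchResultSketch { Identical, Different, Error }

public static class ExitCodes
{
    // GNU diff convention: 0 = identical, 1 = differences found, 2 = error.
    public static int For(MatchResultSketch result)
    {
        switch (result)
        {
            case MatchResultSketch.Identical: return 0;
            case MatchResultSketch.Different: return 1;
            default: return 2;
        }
    }
}
```

A console host would return this value from `static int Main(string[] args)`. In a batch file, `if errorlevel 1` is true for any exit code of 1 or greater, so test `if errorlevel 2` first to distinguish errors from differences.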

Points of interest

The binary 'diff' algorithm from DiffEngine was too slow for my needs, so I created an alternative (trivial) match routine. You can force Differ to use the original algorithm from the command line. However, the DifferCommand object does not display binary file differences, even though it receives notifications of them.
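The article doesn't show the trivial match routine itself. Below is a minimal sketch of one plausible approach, assuming "trivial" means an early length check followed by a chunked byte-for-byte comparison that only answers "same or different". BinaryCompare and its methods are hypothetical names.

```csharp
using System.IO;

// Hypothetical "same or different" check; it does not locate individual differences.
public static class BinaryCompare
{
    public static bool SameContent(Stream a, Stream b)
    {
        // Cheap early exit when both lengths are known.
        if (a.CanSeek && b.CanSeek && a.Length != b.Length) return false;
        var bufA = new byte[64 * 1024];
        var bufB = new byte[64 * 1024];
        while (true)
        {
            int readA = ReadFull(a, bufA);
            int readB = ReadFull(b, bufB);
            if (readA != readB) return false; // one stream ended first
            if (readA == 0) return true;      // both ended together
            for (int i = 0; i < readA; i++)
                if (bufA[i] != bufB[i]) return false;
        }
    }

    // Stream.Read may return fewer bytes than requested, so fill the buffer or stop at end of stream.
    static int ReadFull(Stream s, byte[] buf)
    {
        int total = 0;
        while (total < buf.Length)
        {
            int n = s.Read(buf, total, buf.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    public static bool SameFile(string pathA, string pathB)
    {
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
            return SameContent(a, b);
    }
}
```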

The only change I made to the current version of DiffEngine was to set its maximum text line length to 4096 and expose that value via an accessor.

The Differ project demonstrates the following major elements of .NET and C#:

  • XML serialization.
  • File I/O, directory and attribute handling.
  • Events and event dispatching.
  • DllImport declarations for Windows API functions.
  • Simple regular expressions.
  • Exception handling.

History

  • Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Virgin Islands (U.S.)

Comments and Discussions

 
Question: Throws exception in text mode if line > 4096 characters ("Object reference not set to an instance of an object")
LesMurphy, 1-Jul-11 12:24
Question: Directory recursion?
Clive Barrell, 3-Oct-07 5:35
Question: This is perfect
anatase, 26-Mar-07 12:03
It's fast, even. However, how do I get back the whole XML block in which the change occurred?
```xml
<TEST name="blah2">
<description>
this is a test
</description>
<match name="grundig"/>
<match name="rosco"/>
<stress run="value1">
<case name="case1" value="2032"/> <!-- where I only change value="2031" -->
</stress>
</TEST>
```
Differ will return only <case name="case1" value="2032"/>

But I'd like the entire TEST block. The file is 15 MB, ridiculously large, and changes 100 times a week, so I'm confirming my changes against the latest version in the source depot.





newb: Jack of all Trades

