Click here to Skip to main content
15,879,096 members
Articles / Programming Languages / Visual Basic
Article

LCS based generic diff in VB.NET

Rate me:
Please Sign up or sign in to vote.
4.69/5 (5 votes)
30 Sep 2008CPOL2 min read 39.9K   206   12   1
How to create a diff script for a list of objects in VB.NET.

Introduction

This article aims to illustrate how to programmatically determine the differences between two arbitrary lists of objects, then interpret the list of differences to create a "diffscript" - a script containing a list of operations to be performed on the original list to end up with the destination list.

Background

In my 22 years of developing, I have come across many situations where I recognized a source and destination being updated. I have been fantasizing about how a differential update would be far more efficient than overwriting the entire destination.

Developing software is a way for me to completely get to understand (grok) the intricacies of a certain situation or problem. Writing the attached source code allowed me to understand the diff algorithm and how and why it works. In my investigation, I used this excellent website CodeProject and articles from various authors, as well as information readily available from Wikipedia.

Using the code

The code attached contains a set of Generic classes that work together.

The way to invoke the diff-script engine, which creates a diff-script, is as follows:

VB
Dim a as string() = {"a", "b", "c"}
Dim b as string() = {"b", "c", "a"}
Dim script As DiffScript(Of String)
script = DiffEngine(Of String).MakeDiffScript(a, b)

This creates a diff-script containing one modification entry, a move operation (of the "a" item) to the position after c (3).

How it works

The base algorithm is the Longest Consecutive Sequence (LCS) algorithm as described in other articles on this site and Wikipedia.

I have included an index mechanism that will convert the list of arbitrary items to a list of integers to compare. The index allows the integers to then be re-mapped to the original values, if needed.

The index mechanism also compares the start of both lists and the end, to see if they are equal. The LCS is a fairly intensive algorithm, and will only be performed on the "block" of items that is different between the start and the end that are the same.

Points of interest

There is some test code in the DiffTestCode and IndexTestCode classes, that you will not need in your implementation.

The attached code uses a modified version of the LCS algorithm. The only modification made is in the backtracking method. Where the original algorithm always favors a remove or add operation (resulting in ---+++), I modified it so that it will favor a + operation on the same location before another - operation. This modification makes it a lot simpler to identify replacement operations.

The source code is heavily commented, please review the comments in the source for more information.

History

This is version 1 of the article. I do not expect the code to change a whole lot, but you never know. I do intent to expand this article when I have some time; for now, most comments reside in the source itself.

Feel free to implement the code in your own project, and send me some feedback :P

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Netherlands Netherlands
PuReBRaiNZ has been programming in all microsoft supported languages for nearly ten years. His current job is in ms (asp.net & sql) consultancy in the Netherlands.

Comments and Discussions

 
General非常棒~~ Pin
caoqinghuajlhcn1-Mar-11 14:50
caoqinghuajlhcn1-Mar-11 14:50 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.