![]() |
General Programming »
Algorithms & Recipes »
General
Intermediate
Introducing NetsBy pseudonym67An introduction to the Nets data structure |
C# 2.0, Windows, .NET 2.0VS2005, Dev
|
||||||||
|
Advanced Search Add to IE Search |
|
|
|
||||||||||||||||
Think of a house any house and if you do it will probably be a bit blank, a bit unfilled in. You may think of a shape be it one story, two story, or more but the details wont have been fleshed out, you may have pictured the windows but not if they had curtains or the colour or pattern of the curtains. This in simple terms is called modelling, we do it all the time we can imagine something in basic shapes and forms and we know what it means, sure we can fill in the details later, but most of the time we like to the think that the details are already there. That we know what they are but we just didn't bother to expand on them because we know what a house is.
When we come to computers though we see something different our whole perception of computers and the way they work is not based on models of the same variety. When it comes to computers everything has to be fleshed out to the nth degree. They need to be exact, to know everything about everything otherwise they wont work and the slightest not knowing of something is a bug, a fault in the program that needs correcting.
Sure we have arrays which are models of a sort and hashtables and dictionaries which if we are lucky may allow us to link up two related pieces of data together. Each piece of data is still unique. Even when we look at relational databases the data is still separated into chunks with each chunk say a company being linked to the other data on companies by what it is the company does. While the data itself is possibly very complex the data type i.e. a company is still a relatively simple data type in that it is a single block of data that represents a single item.
Of course for demonstration purposes I over generalise and simplify but the reason for this is to show the thinking behind the creation of Nets which attempt to deal with complex datatypes. Or to be more precise strings of data types and of course the perfect example for dealing with strings of data types is to deal with strings themselves which is why the example provided shows how Nets deal with words and then sentences and then paragraphs. Of course as with any data structure it is up to the individual whether they have any use for it and I wouldn't have thought of it unless I had a specific use in mind.
A Net gets it's name from the diamond pattern style of fisherman's net although this is not entirely accurate as we don't complete the diamond shape, a basic Net node looks like this,
It largely is an expansion on the tree data structure with a pointer to the forward node and nodes to the left and right that are based on object comparisons with the node forward of the current node to the left or up in the picture being less that the data in the forward node itself and the data in the right nodes, being greater than the data within the forward node. Of course things are made more complicated by the fact that the left and right nodes are actually lists of nodes. So without getting into code just yet I'll try to explain the way it works.
If you have three strings "ABC", "ABB" and ABF" and you add them to the net you will get a net that looks like,

If you add them in the order listed above. If you were to add the "ABF" string first then the "C" and the "B" would both be to the left or up on the picture. It should be added here that if you were to add the string "AB" at this point it would disappear as the simple net does not allow duplicate strings, the "AB" string would simply merge in with the other strings and be forgotten.
However if you add two more strings "ABBE" and "ABAF" the net will look like this,
At this point the original string "ABB" has been completely overwritten and will no longer be returned when you print the contents of the net.
As can be seen from the lists to the left and right ( above and below in the image ) are sorted automatically as the net is built. This is now an integral part of how the nets work ( see the code discussion ) but it is clear that the nets would work faster without this, although the primary goal in the development of this project has been to get the nets data structure working correctly rather than getting it to work fast.
The best way to show exactly how the net works is to run through the demonstration project and how to use it with the aim to show not only how it works but to give an understanding of what can be achieved through using the nets data structure.
The initial and main demonstration deals with text files this is pretty rough and ready code and the output depends pretty much entirely on the editing of the file. I mention this because included in the project is a text copy of Charles Darwin's Origin Of The Species which is used throughout the main part of the demonstration and which does contain a few editing errors.
The tab for the book demo looks like,
There are three types to this demo the words, sentences and the paragraphs demo, as the names suggest the idea is that you select a book and the net is built up from either the words, the sentences or the paragraphs contained in the book. It should be noted that the more complex the net then the longer it will take the computer to build it so the words demo will build relatively fast, while the sentences demo will take a while and if you are going to run the paragraphs demo, then bring a book or don't surf the news pages until after you've started it.
As it is the quickest the words demo is always set as the default on start up. The bottom half of the tab deals with the search options that we shall look at shortly firstly start up the demo by selecting the file button at the top,
and selecting a text file. The program will then read the file and build the net. The output will look like,
You'll have to scroll down for the more meaningful items but as you can see,
6512 items to choose from.
There are two types of search that can be run on the net and these are a search on data strings within the net that start with a given string which is on the left and a search for a data string that contains the given items.
The search starting with is on the left of the tab. if we enter the letter o in the text box and press the left most search button,
we get all the individual words in the net starting with the letter o. There are 161. If we move to the right side of the tab and enter oo into the text box there we can run the search for all the words containing the letters o and o,
Here we get the 482 words in the book that contain the values o and o. Note here that the default behaviour is to return any items that contain the values o and o and if you want to run a search for all the items that contain the values o and o together you need to specify this by clicking the sequential checkbox. There are 81.
The sentences and the paragraphs demos work in exactly the same way they are just dealling with much more complicated data.
The data demo is much simpler in construction in that it is simply provided to show the nets working using a custom or more advanced data type,
The insert data button builds a net of random data using a class that has an integer and a string as members these are then created in random length strings of data and inserted into the net. You can click on the Insert Data button as many times as you feel like. The search options allow for the entering of a single integer value and text to go with it. The valid range is from 0 - 9 for the integer and one of "up", "down", "left", "right", "forward", "backward" for the text. Note that they are all lower case and that they don;t have any intrinsic meaning within the program they are simply strings.
If you click the Insert Data button a few times you will get something that looks like this,
And a search through the generated data for the value 5 and the text "forward" will give something like:
Search For String Containing
5 forward
Print out for item 0
Value = 3 String = forward Value = 7 String = left Value = 4 String = right
Value = 3 String = down Value = 8 String = down Value = 0 String = down
Value = 5 String = forward Value = 5 String = up Value = 6 String =
down Value = 3 String = backward Value = 1 String = down Value = 3
String = right Value = 4 String = down Value = 6 String = right Value = 1
String = up Value = 5 String = forward Value = 3 String = forward Value = 0
String = down Value = 8 String = left
Print out for item 1
Value = 6 String = backward Value = 4 String = down Value = 5
String = forward
Print out for item 2
Value = 6 String = up Value = 8 String = forward Value = 3
String = right Value = 0 String = forward Value = 5 String = right Value = 5
String = forward Value = 3 String = forward
Print out for item 3
Value = 7 String = backward Value = 6 String = backward Value = 3
String = forward Value = 4 String = right Value = 8
String = forward Value = 8
String = right Value = 4 String = backward Value = 3 String = down Value = 5
String = forward Value = 4 String = left Value = 5 String = left Value = 5
String = backward Value = 6 String = left Value = 0 String = backward
Value = 2 String = forward Value = 0 String = backward
Print out for item 4
Value = 8 String = right Value = 6 String = down Value = 5
String = left Value = 2 String = right Value = 9 String = forward Value = 3
String = down Value = 6 String = forward Value = 8 String = up Value = 6
String = down Value = 4 String = right Value = 3 String = down Value = 5
String = forward
Note that the search mechanism here is for all the strings containing the value entered, so in this example we get no values that start with the entered data.
The SimpleNet class is defined as,
public class SimpleNet< T > where T : IComparer< T >
which is a simple generic class definition that uses a class that implements the IComparer interface. This class can be made up of any data that is required by the user with the only provision being that it must implement the IComparer Compare method. This is the sole requirement to using the SimpleNet class. So the class we use for the book example is defined as,
public class StringData : IComparer< StringData >
{
private string strData;
public string Data
{
get
{
return strData;
}
set
{
strData = value;
}
}
public StringData( string data )
{
strData = data;
}
public StringData()
{
strData = null;
}
public int Compare( StringData dataOne, StringData dataTwo )
{
return String.Compare( dataOne.Data.ToLower(),
dataTwo.Data.ToLower() );
}
public override string ToString()
{
return strData;
}
}
As you can see there is nothing special about this class at all except the Compare function which calls the string Compare function which probably calls the IComparable CompareTo Function. This then is the technical complexity needed to use classes in the SimpleNet. A Compare function that calculates if the data in the objects is equal returning a zero if they are or a -1 if the first object passed is smaller than the second object and a 1 if the first object is greater than the second object.
As we are dealing with complex data strings or collections the data must be entered into the net as lists of data and this is done in the mainform.cs file when we create the strings of data for the data demo with the code,
List< DemoData > newList = new List< DemoData >();
if( demoNet == null )
demoNet = new SimpleNet< DemoData >();
int nNumberOfItems = rand.Random( 20 );
int nPiecesOfData = 0;
List< string > stringCol = new List< string >();
stringCol.Add( "up" );
stringCol.Add( "down" );
stringCol.Add( "left" );
stringCol.Add( "right" );
stringCol.Add( "forward" );
stringCol.Add( "backward" );
for( int i=0; i<nNumberOfItems; i++ )
{
newList.Clear();
nPiecesOfData = rand.Random( 20 );
for( int j=0; j<nPiecesOfData; j++ )
{
newList.Add( new DemoData( rand.Random( 10 ),
stringCol[ rand.Random( stringCol.Count ) ] ) );
}
demoNet.InsertList( newList );
}
I use this example as it is very simple in that all we do is create a list define a few variables and then build it before adding to the Net with the InsertList function. It also possible to use the inbuilt facilities of the class which contains a list that can hold the data before inserting it into the Net. This is done through the AddToList( T data ) function and then calling the InsertList() function though you must remember if you use this technique to call ClearList() before starting a new list.
To perform a search you use the code,
List< StringData > list = new List< StringData >();
for( int i=0; i<strWorking.Length; i++ )
{
list.Add( new StringData( strWorking[ i ].ToString() ) );
}
List< List< StringData > > resultsList =
wordNet.FindListsStartingWith( list );
which builds a list to search for and then calls the search function, in this example I am using the code to search for a word at the start of a string from the wordNet example in mainform.cs.
The SimpleNet class is 1400 lines long so I will give more of an overview of it's basic operations here and let those who are interested go through the code. The class itself consists of 14 public functions of these, 4 functions are for inserting data, these being the already mentioned AddToList, ClearList and the overloaded InsertList. Two functions return lists of lists of the data in the Net these being the NetPrintOut function which returns the data as it gets it and the GetSortedList which returns the same data but sorts it first. There are then 2 IsListPresent functions which returns a boolean value for the given list. There are 2 overloads for the FindListsStartingWith and 5 overloads for FindListsContaining.
To start with all data in the SimpleNet is stored and manipulated as part of the SimpleNetNode class. This is your typical data type node that contains the lists and the points to the node in front and the node behind or the parent node. It is defined as,
public class SimpleNetNode< T > where T : IComparer< T >
and contains the data,
private SimpleNetNode< T > parentNode;
private SimpleNetNode< T > forwardNode;
private List< SimpleNetNode< T > > leftList;
private List< SimpleNetNode< T > > rightList;
private T data;
private bool bIsBase;
private bool bIsRightBase;
private bool bIsLeftBase;
as you can see the first four variables point to either other nodes or lists of other nodes. The data variable is the class that is added to the node and the Base variables are used to as quick checks to see if the lists are used.
There is no reason as user of the Net class to ever create a Net node object. In fact this was originally private but as we are using generics the compiler was never quite sure if there was an error or not. So it was simpler and cleaner to have it as a separate class.
In the SimpleNet class there are two conversion functions,
private List< SimpleNetNode< T > > ConvertListToNodes( List< T > dataList )
which converts the input lists to a list of SimpleNetNodes and the functions,
private List< T > ConvertList( List< SimpleNetNode< T > > list )
and
private List< List< T > > ConvertList(
List< List< SimpleNetNode< T > > > list )
which convert the lists back before they are returned to the calling code.
As you have seen the lists of data are inserted using the InsertList function which is responsible for building the Net. This function starts by taking the list as a parameter,
private void InsertList( List< SimpleNetNode< T > > insertionList )
and then cycles through the list dealing with one item at a time,
for( int i=0; i<insertionList.Count; i++ )
{
SimpleNetNode< T > nnNewData =
new SimpleNetNode< T >( insertionList[ i ].Data );
}
If the list is empty the data is made to be the root,
if( TrueRoot.ForwardNode == null )
{
TrueRoot.ForwardNode = nnNewData;
CurrentNode = TrueRoot.ForwardNode;
continue;
}
If there is already data in the list we check the next item which is the forward of the current node. and if this is null we add the data at this point. If at this point the CurrentNode forward is not null we have to do a comparison and decide where to place the item, so the main block of the InsertList code looks like,
nnTempNode = CurrentNode;
int nResult = nnNewData.Data.Compare( nnNewData.Data,
CurrentNode.ForwardNode.Data );
if( nResult == 0 )
{
CurrentNode = CurrentNode.ForwardNode;
}
else if( nResult == -1 )
{
}
else
{
}
This is where data is overwritten because if the data we are adding is the same as the data in the list we do nothing with it but simply move forward and go on to the next item in the list.
If the data is Not equal whether it be larger or greater then we must continue checking it so for demonstration purposes we'll assume that the Compare function returned -1.
bool bFound = false;
for( int n=0; n<CurrentNode.LeftListCount(); n++ )
{
nnTempNode = CurrentNode.GetLeftNodeAt( n );
if( nnTempNode.Data.Compare( nnTempNode.Data, nnNewData.Data ) == 0 )
{
CurrentNode = nnTempNode;
bFound = true;
break;
}
}
if( bFound == false )
{
SimpleNetNode< T > insertionNode =
new SimpleNetNode< T >( nnNewData.Data );
insertionNode.ParentNode = CurrentNode;
CurrentNode.AddLeftNode( insertionNode );
nnSortNode = CurrentNode;
CurrentNode.IsBase = true;
CurrentNode.IsLeftBase = true;
CurrentNode = insertionNode;
bSortLeftList = true;
}
We cycle through all the elements in the chosen list and if it is present set the current pointer to the node in the list and continue from there when we move to the next item in the insert list. If the node is not present in the list we add it to the list.
The main searching functions in the SimpleNet class are,
public List< List< T > > FindListsStartingWith( List< T > dataList )
and
public List< List< T > > FindListsStartingWith()
These two functions as with there companion FindListsContaining functions search through the SimpleNet and return the results as a list of lists for the data. This is done by converting the Net into a List of Lists internally and then searching through the lists. There are other ways to do this but with a data structure as complicated as a Net it is easy to tie yourself into knots when trying to move through searching for specific items so it was felt that this is a solution viable for a simple iteration of the Nets data structure.
This article is an introduction to the ideas and the mechanisms of the Net data structure and as such with the exception of the insertion code the techniques especially for the search are of a more simplistic nature than they are in a full Nets structure. Despite this it is hoped that this introduction has given an idea of the power that Nets can bring to a project.
Tod Golding ( 2005 ) Professional .Net 2.0 Generics
Nicholai M. Josuttis ( 1999 ) The C++ Standard Library
Tom Archer ( 2001 ) Inside C#, Microsoft Press
Jeffrey Richter ( 2002 ) Applied Microsoft .NET Framework Programming, Microsoft Press
Charles Peltzold ( 2002 ) Programming Microsoft Windows With C#, Microsoft Press
Robinson et al ( 2001 ) Professional C#, Wrox
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 13 Sep 2007 Editor: |
Copyright 2007 by pseudonym67 Everything else Copyright © CodeProject, 1999-2009 Web10 | Advertise on the Code Project |