Click here to Skip to main content
Email Password   helpLost your password?

Introduction

Without giving a Computer Science 101 lecture, I'll provide you with 2 pictures that says a thousand words. In short, DFA stands for Deterministic Finite Automation (or Automata). It's cousin is NFA, non-deterministic FA. An example of a NFA is looping through a big list looking for a word. The word might be found quickley, or it might only find it in the last entry. This is said to be non-deterministic. So we can then say for a DFA we will know exactly how long a search for a word (as per the NFA example) will take.

Data Samples

The "raw" data (in no particular order) ready for NFA processing (you know the lazy way).

TABLE A
leppie can swim 
leppie can talk 
leppie looks good 
leppie looks left 
leppie goes home 
john looks fine 
john can walk 
john wants nothing 
leppie wants food 
leppie can code 
leppie can code C 
leppie can code C# 
leppie can code C# well 
leppie can code C++ poorly 
leppie lives in stellenbosch 
leppie lives with a flatmate 
leppie drinks plenty coke 
leppie drinks coffee 
leppie eats many hotdogs 
leppie eats food 
leppie knows alot 
leppie knows alot about code 
leppie knows a little about girls 
john can swim 
john can talk too
TABLE B
stop
stopper
stood
step
standard
stank
stance
authors
automatic
back
back's
backed
background
backing
backing's
backs
backwards
bad
badly
balance
balance's
ball
ball's
chain
chain's
chair
chair's
chairman
chairman's
chance
chance's
chances
change
changed
changes
changing
channel
channel's
channels
chaos
chaos's
chapter
chapter's
char
charge
charged
charges
fall
fallen
falling
falls
false
familiar
family
family's
famous
fan
fan's
fancy
far
farm

DFA representation of Table A. A circle with an outer circle denotes an endstate.

DFA representation of Table B. A circle with an outer circle denotes an endstate.

You might ask what that System.Object is doing there, well nothing. UPDATE: I have now removed the root object from the generated graph. As you can see, it is not an endstate, so it will not affect anything. But it does serve a purpose. It acts as the very first node that is created regardless of the key type. I can remove it, but then I would have to check for null everytime and that would be too costly. So dont worry about it!

Base Implementation: AtomicState

public class AtomicState
{
	protected AtomicState();
	protected AtomicState(object key);
	protected bool Accepts(Array stack);
	public Array AcceptStates(Array stack);
	protected bool Add(Array stack);
	public AtomicState GetAtomicStateAt(Array stack);
	protected Array Match(Array stack);
	protected bool Remove(Array stack);
	internal protected virtual void RenderStateNodeAttributes attr);
	internal protected virtual void RenderTransistion(EdgeAttributes attr);
	public override string ToString();
	public int AcceptStateCount { get; }
	public int TotalStateCount { get; }
	protected object key;
}

This base implementation for the DFA state machine. I have provided several implementations for typesafe Int32, String, Char, Boolean, Float, Object and Type. Feel free to add your own and see comments I have meda in the code. Arrays of Arrays have terrible casting issues. I have also included a Combination class that I did many moons back, and never had a place to put. Combinations are very good to build DFA's. Also included is a utiliy class to generate all these pretty pictures (yes, this is aimed directly at you, Marc Clifton ;P) you can see here (you will need GraphViz for this). UPDATE: The GraphViz utilities has been greatly improved, and from above you can see you can now specify rendering options on a State level. The implementation is similar to ASP.NET's custom control Rendering. In other words, if you need to change something, just override RenderXXX. See TypeState's implementation for a good example.

The class consists mostly of non-human comprehensable code (almost every function is either running in a while(true) loop or recusively). This is most mostly a generic port of a C project that was done for Computer Science. From the 2 count functions's code, you will see the "multi dimensional linked list" is rather easily "walkable" with recursion (I had some stack issues, but havent been able to replicate them).

Purpose

So you may ask yourself why or how do I use this? The way the machine has been setup, is to take a "path" at a time, so adding a sentence or a word, get automatically laid out in the machine. No other intervention is required. One of the most difficult things to do is changing a NFA to a DFA, this is all done for you. Then all you need to do is query it. Some functions like Match() can take longer with "wildcards" (set these up as null, see the implementations).

You might ask me now why I dont just use an ArrayList or a HashTable? Simple. An ArrayList is good for batches of data, but poor at searching. A HashTable is excellent for lookups, but fails to be useful for pattern matching or "path" finding. In fact you should be using either of the afore mentioned in most cases. Examples of what this is useful for:

Performance

As with any DFA performance is key. This implementation is comparable to a Hashtable. In fact, if boxing was removed, lookups would be even nearer (2x instead of 3x in my experiments). Here is a simple console output from a 500 000 wordlist loaded in a char[]:

Extracting all entries
Time: 5752.794559ms
CharState Testing 10000 random lookups from 502069 entries
Time: 133.561947ms
Avg time: 0.013356ms
HashTable Testing 10000 random lookups from 502069 entries
Time: 37.712056ms
Avg time: 0.003771ms

As one can see, lookups are 3 times slower than a Hashtable, but remember that a HashTable is incredibly fast, but does not have the same characteristics. Note the slowness of extracting all the entries. Not bad if you think that the DFA is converted to raw data again.

Thanks to the person who wrote the HP timer class I used for the timing (sorry, cant remember your name now).

Walkthrough

Say for example you wanted a quick inheritance graph of an assembly. You would simple do this:

TypeState typeroot = new TypeState();
Type[] types = Assembly.GetAssembly(typeof(AtomicState)).GetTypes();

foreach (Type type in types)
{
   ArrayList arr = new ArrayList();
   Type etype = type;
   do 
   {
      arr.Insert(0, etype);
      etype = etype.BaseType;
   }
   while (etype != null);

   typeroot.Add( (Type[]) arr.ToArray(typeof(Type)));
}
TextWriter writer = File.OpenText("file.dot");
GraphViz.Generate(typeroot, writer);

Now we have a string we can pass to Dot (the GraphViz executable) and we end up with:

Pretty isn't it? But hey, that isnt even the purpose! Luckily GraphViz fits like a glove and its making many more things possible. If any one wants to send some network trace logs, I'll put up some renderings. UPDATE: Here they are! I have cropped the graph somewhat.

Observations

A question might arise why things like "leppie can eat" and "john can eat" does flow back into each other. This is infact possible, but a problem arises when you add "leppie can eat meat". Does this mean john can eat meat too? It depends how accurate you want to be. In my case, I have opted for accuracy. Adding this facility would only increase load time and to make it 100% accurate will require an NFA pass or a major change to the algorhythm. Suggestions are welcome.

Conclusion

I hope you enjoy using this, and feel free to comment as usual. The test project, basically contains just some tests I have run, and will probably cause an exception on your system. Look at it however to see how its meant to be used.

Changelog

1.1

1.2

Future Plans

Now, so all of you will come back and back again, I'm listing some plans I have for the future (viable suggestions will be added too):

 

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
GeneralSource code of dot dll
kbutter
22:27 24 Feb '08  
I´m facing a problem with the loader lock in visual studio 2005 using the dot.dll.
Is it possible to get the source code of the dot.dll so I can try some changes?

Kathrin
GeneralQuestion about dot.dll
kbutter
5:43 6 Jun '07  
I´m using QuickGraph and dot.dll for visualizing a directed graph on an asp.net webpage. Now my question is: Is there a possibility to write the graph into a bytestream instead of storing it in a file? To directly display it on the webpage?
Or does anyone know another graph visualization tool supporting this?

GeneralFont problems [modified]
david_k13
10:26 16 Aug '06  
Hi

I'm using QuickGraph.Algorithms.Graphviz.DotRenderer, which depends on DOT.DLL, and am having trouble with the fonts.

If I render a graph through DotRenderer.Render(dot_input, imagetype), and ask for either a JPG or GIF, the font of the text always comes out as System (or some blocky font that looks like it).

If I ask for an SVG image, though, no problem. The SVG text comes out right, and the Adobe plugin shows me the right font (I tried both Arial and TimesNewRoman).

Any ideas? I'm currently using QuickGraph.Algorithms 2.1.1441.30422 and DOT.DLL v0.0.0.0 (no version number available, but the file is dated May 17 2003). Maybe there are newer versions?

Thx,


.DaviD.


-- modified at 15:27 Wednesday 16th August, 2006
GeneralNot code parser
Saikat Sen
17:18 14 Jan '04  
Your code looks good, but you cannot have a code parser done using an NFA or DFA (as you mentioned) unless the code is in ane extremely simple language. NFAs serve regular languages, you need a Pushdown Automaton for parsing context-free languages.

GeneralRe: Not code parser
leppie
7:28 15 Jan '04  
Saikat Sen wrote: but you cannot have a code parser done using an NFA or DFA (as you mentioned) unless the code is in ane extremely simple language. NFAs serve regular languages, you need a Pushdown Automaton for parsing context-free languages.
I suggest you do some reading on parsing techniques. (search for Dick Grune, has an excellent book). There are countless techniques.

OTH this (my article) is just a simple conversion. I have an fact use it to write a parser that takes input similar to yacc.

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Not code parser
Saikat Sen
9:12 15 Jan '04  
As I said, there are no parsing techniques to parse CFLs or CSLs using NFAs. NFAs at best can be used to parse RLs.

I suggest that you do some reading on parsing techniques. "Introduction to Automata Theory, Languages, and Computation (2nd Edition)" by Hopcroft is a good book. Or read any book on compilers.

GeneralRe: Not code parser
leppie
9:32 15 Jan '04  
Saikat Sen wrote: As I said, there are no parsing techniques to parse CFLs or CSLs using NFAs. NFAs at best can be used to parse RLs.
I mean what did you smoke? You should know that any problem can be solved with an NFA. Recursive descent & Earley parsers are NFA. Tomita uses a combination of NFA/DFA. D'Oh!



leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Not code parser
BenDi
4:13 30 Mar '04  
Sorry, but Sen is right there... It is simply not possible to determine with a NFA if a string fits, for instance, the sceme {a^nb^n | n == 0..}. You need a Chomsky-Level-2 automaton for that.
GeneralRe: Not code parser
wish_i_know_so_much
18:28 19 Feb '05  
I hate you all Wink
GeneralRe: Not code parser
Anonymous
10:01 11 Mar '05  
With this technique it should be possible to parse LL(1) languages. These languages are not very complex, but still...
GeneralRe: Not code parser
omoshima
13:50 6 Apr '05  
That is such a contrived exampleSniff ... But, I have to agree that you cannot do that with an NFA.
GeneralDot.dll source ?
Jonathan de Halleux
21:44 8 Dec '03  
Hi Leppie,

Could I have a look at the dot.dll source. I want to solve the "file opened" problem (dot.dll keeps a handle on ALL the files it opens, therefore it leads to some problems on web apps).

Thanks.

Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
leppie
8:18 9 Dec '03  
Hi Jon

I looked for my source, but it seems its gone > /dev/null D'Oh!

I'll make a point to look iuf i can compile graphwiz easily.

Btw did u ever track down that graphviz COM component? I should have that around somewhere still.

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
leppie
8:20 9 Dec '03  
The COM component.

http://home.so-net.net.tw/oodtsen/wingraphviz/index.htm[^]

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
leppie
11:15 9 Dec '03  
OK, I have made up a compilable graphviz, there are many linker errors that I just dont have the power/will for. If you are interested, I can upload it to you. It's just over 10mb compressed. My VisualC++ knowledge is a bit scrappy...

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
Jonathan de Halleux
2:26 10 Dec '03  
I don't want to mess with the COM component: major problem is that it needs to be registered and this becomes difficult on remote servers.

How could the transfer be done ?

Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
leppie
7:32 10 Dec '03  
Jonathan de Halleux wrote: How could the transfer be done ?
I could upload to an anonymous ftp. It will be on there for a week.

ftp://ftp.uunet.co.za/pub/incoming/dot/[^]

leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
Jonathan de Halleux
2:06 13 Dec '03  
Got it ! I'll keep you up-to-date.

Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
leppie
2:27 13 Dec '03  
I just realised that paths in the project file contains hard links Hmmm

My root dir was: e:\incoming\graphviz-1.10\

so if u dont have an e: drive just use subst, or replace the locations. Stupid VS.NET!

ANyways good luck Smile



leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
Jonathan de Halleux
3:54 18 Dec '03  
Should be doable. Thanks!

Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
Jonathan de Halleux
0:03 21 Dec '03  
Could not find any .sln file ???

Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
leppie
0:28 21 Dec '03  
its quite far down some directory structure (who knows why?), just do a search, there is only one sln file. Smile


leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
Jonathan de Halleux
2:01 21 Dec '03  
I did but could not find any. Anyway, this is fixed now and I've also replaced the absolute include path by relative path.

I'm facing two problems: First a linking problem on StaticDll
------ Build started: Project: StaticDll, Configuration: Debug Win32 ------

Compiling...
mccp.cpp
Linking...
LINK : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification
mccp.obj : error LNK2001: unresolved external symbol "void __cdecl dot_init_graph(struct Agraph_t *)" (?dot_init_graph@@$$FYAXPAUAgraph_t@@@Z)
Debug/StaticDll.dll : fatal error LNK1120: 1 unresolved externals

and second where's the managed C++ wrapper ?


Jonathan de Halleux.
www.dotnetwiki.org
GeneralRe: Dot.dll source ?
leppie
2:18 21 Dec '03  
Jonathan de Halleux wrote: I'm facing two problems: First a linking problem on StaticDll
Hehe, originally it took me about 2 days to sort out the linker errors, blame MS (there will be many more after that one). If I remeber correctly the source seems to be duplicated in place. So good luck Smile


Jonathan de Halleux wrote: and second where's the managed C++ wrapper ?
There is none, it was simply a MC++ class calling the cmdline. U can see what I did, by inspecting the IL of dot.dll, to be honest I cant really remember what/how I did it!



leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Dot.dll source ?
Jonathan de Halleux
22:36 21 Dec '03  
leppie wrote: Hehe, originally it took me about 2 days to sort out the linker errors, blame MS (there will be many more after that one). If I remeber correctly the source seems to be duplicated in place. So good luck
I managed to avoid this problem by creating a mini-wrapper function that hides all graphviz headers... niark niark.

But, hey, the code is really crappy! It's full of static variable in global scope and functions! Anyway, I hope to get things running and have publish the wrapper soon on the codeproject. I'll try to wrap neato, lneato, lefty,etc... as well.

Dawn static variables!


Jonathan de Halleux.
www.dotnetwiki.org


Last Updated 22 May 2003 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010