Introduction
Without giving a Computer Science 101 lecture, I'll provide you with 2 pictures that says a thousand words. In short, DFA stands for Deterministic Finite Automation (or Automata). It's cousin is NFA, non-deterministic FA. An example of a NFA is looping through a big list looking for a word. The word might be found quickley, or it might only find it in the last entry. This is said to be non-deterministic. So we can then say for a DFA we will know exactly how long a search for a word (as per the NFA example) will take.
Data Samples
The "raw" data (in no particular order) ready for NFA processing (you know the lazy way).
TABLE A
leppie can swim
leppie can talk
leppie looks good
leppie looks left
leppie goes home
john looks fine
john can walk
john wants nothing
leppie wants food
leppie can code
leppie can code C
leppie can code C#
leppie can code C# well
leppie can code C++ poorly
leppie lives in stellenbosch
leppie lives with a flatmate
leppie drinks plenty coke
leppie drinks coffee
leppie eats many hotdogs
leppie eats food
leppie knows alot
leppie knows alot about code
leppie knows a little about girls
john can swim
john can talk too
TABLE B
stop
stopper
stood
step
standard
stank
stance
authors
automatic
back
back's
backed
background
backing
backing's
backs
backwards
bad
badly
balance
balance's
ball
ball's
chain
chain's
chair
chair's
chairman
chairman's
chance
chance's
chances
change
changed
changes
changing
channel
channel's
channels
chaos
chaos's
chapter
chapter's
char
charge
charged
charges
fall
fallen
falling
falls
false
familiar
family
family's
famous
fan
fan's
fancy
far
farm
DFA representation of Table A. A circle with an outer circle denotes an endstate.


DFA representation of Table B. A circle with an outer circle denotes an endstate.

You might ask what that System.Object is doing there, well nothing. UPDATE: I have now removed the root object from the generated graph. As you can see, it is not an endstate, so it will not affect anything. But it does serve a purpose. It acts as the very first node that is created regardless of the key type. I can remove it, but then I would have to check for null everytime and that would be too costly. So dont worry about it!
Base Implementation: AtomicState
public class AtomicState
{
protected AtomicState();
protected AtomicState(object key);
protected bool Accepts(Array stack);
public Array AcceptStates(Array stack);
protected bool Add(Array stack);
public AtomicState GetAtomicStateAt(Array stack);
protected Array Match(Array stack);
protected bool Remove(Array stack);
internal protected virtual void RenderStateNodeAttributes attr);
internal protected virtual void RenderTransistion(EdgeAttributes attr);
public override string ToString();
public int AcceptStateCount { get; }
public int TotalStateCount { get; }
protected object key;
}
This base implementation for the DFA state machine. I have provided several implementations for typesafe Int32, String, Char, Boolean, Float, Object and Type. Feel free to add your own and see comments I have meda in the code. Arrays of Arrays have terrible casting issues. I have also included a Combination class that I did many moons back, and never had a place to put. Combinations are very good to build DFA's. Also included is a utiliy class to generate all these pretty pictures (yes, this is aimed directly at you, Marc Clifton ;P) you can see here (you will need GraphViz for this). UPDATE: The GraphViz utilities has been greatly improved, and from above you can see you can now specify rendering options on a State level. The implementation is similar to ASP.NET's custom control Rendering. In other words, if you need to change something, just override RenderXXX. See TypeState's implementation for a good example.
The class consists mostly of non-human comprehensable code (almost every function is either running in a while(true) loop or recusively). This is most mostly a generic port of a C project that was done for Computer Science. From the 2 count functions's code, you will see the "multi dimensional linked list" is rather easily "walkable" with recursion (I had some stack issues, but havent been able to replicate them).
Purpose
So you may ask yourself why or how do I use this? The way the machine has been setup, is to take a "path" at a time, so adding a sentence or a word, get automatically laid out in the machine. No other intervention is required. One of the most difficult things to do is changing a NFA to a DFA, this is all done for you. Then all you need to do is query it. Some functions like Match() can take longer with "wildcards" (set these up as null, see the implementations).
You might ask me now why I dont just use an ArrayList or a HashTable? Simple. An ArrayList is good for batches of data, but poor at searching. A HashTable is excellent for lookups, but fails to be useful for pattern matching or "path" finding. In fact you should be using either of the afore mentioned in most cases. Examples of what this is useful for:
- Spellchecker
- Word/code completion
- Code parser
- Path finding
- Any many more things most people only dream of
Performance
As with any DFA performance is key. This implementation is comparable to a Hashtable. In fact, if boxing was removed, lookups would be even nearer (2x instead of 3x in my experiments). Here is a simple console output from a 500 000 wordlist loaded in a char[]:
Extracting all entries
Time: 5752.794559ms
CharState Testing 10000 random lookups from 502069 entries
Time: 133.561947ms
Avg time: 0.013356ms
HashTable Testing 10000 random lookups from 502069 entries
Time: 37.712056ms
Avg time: 0.003771ms
As one can see, lookups are 3 times slower than a Hashtable, but remember that a HashTable is incredibly fast, but does not have the same characteristics. Note the slowness of extracting all the entries. Not bad if you think that the DFA is converted to raw data again.
Thanks to the person who wrote the HP timer class I used for the timing (sorry, cant remember your name now).
Walkthrough
Say for example you wanted a quick inheritance graph of an assembly. You would simple do this:
TypeState typeroot = new TypeState();
Type[] types = Assembly.GetAssembly(typeof(AtomicState)).GetTypes();
foreach (Type type in types)
{
ArrayList arr = new ArrayList();
Type etype = type;
do
{
arr.Insert(0, etype);
etype = etype.BaseType;
}
while (etype != null);
typeroot.Add( (Type[]) arr.ToArray(typeof(Type)));
}
TextWriter writer = File.OpenText("file.dot");
GraphViz.Generate(typeroot, writer);
Now we have a string we can pass to Dot (the GraphViz executable) and we end up with:

Pretty isn't it? But hey, that isnt even the purpose! Luckily GraphViz fits like a glove and its making many more things possible. If any one wants to send some network trace logs, I'll put up some renderings. UPDATE: Here they are! I have cropped the graph somewhat.

Observations
A question might arise why things like "leppie can eat" and "john can eat" does flow back into each other. This is infact possible, but a problem arises when you add "leppie can eat meat". Does this mean john can eat meat too? It depends how accurate you want to be. In my case, I have opted for accuracy. Adding this facility would only increase load time and to make it 100% accurate will require an NFA pass or a major change to the algorhythm. Suggestions are welcome.
Conclusion
I hope you enjoy using this, and feel free to comment as usual. The test project, basically contains just some tests I have run, and will probably cause an exception on your system. Look at it however to see how its meant to be used.
Changelog
1.1
- Totally redone GraphViz utility and added rendering on a State level. Both transition (edges) and state (node) attributes can be tweaked. I have kept the naming as near as possible to GraphViz's syntax for those that are fimiliar with the product.
- Uploaded new version with working tests (as you see them here) and pictures.
1.2
- Fix the performance bug.
- Made the pictures prettier :)
- Built a NET callable dll for Dot. This can be downloaded seperately. The compiling and linking and sorting out MC++ issues was a pain, but it finally works now. If you have problems I can send you a project file, but be warned, its a mess too.
Future Plans
Now, so all of you will come back and back again, I'm listing some plans I have for the future (viable suggestions will be added too):
- Ability to add state and transistion data, for the ultimate rendering experience.
- Perhaps, build a .NET wrapper for GraphViz (if someone can send me a solution file for the source, it would appreciated as the source is a mess). UPDATE : Built the dll, now for the wrapper. Has a small interface to call from .NET.
- I thought perhaps at a later stage I could implement the renderning GDI. Perhaps, later.
|
|
 |
 | Source code of dot dll kbutter | 22:27 24 Feb '08 |
|
 |
I´m facing a problem with the loader lock in visual studio 2005 using the dot.dll. Is it possible to get the source code of the dot.dll so I can try some changes?
Kathrin
|
|
|
|
 |
 | Question about dot.dll kbutter | 5:43 6 Jun '07 |
|
 |
I´m using QuickGraph and dot.dll for visualizing a directed graph on an asp.net webpage. Now my question is: Is there a possibility to write the graph into a bytestream instead of storing it in a file? To directly display it on the webpage? Or does anyone know another graph visualization tool supporting this?
|
|
|
|
 |
 | Font problems [modified] david_k13 | 10:26 16 Aug '06 |
|
 |
Hi
I'm using QuickGraph.Algorithms.Graphviz.DotRenderer, which depends on DOT.DLL, and am having trouble with the fonts.
If I render a graph through DotRenderer.Render(dot_input, imagetype), and ask for either a JPG or GIF, the font of the text always comes out as System (or some blocky font that looks like it).
If I ask for an SVG image, though, no problem. The SVG text comes out right, and the Adobe plugin shows me the right font (I tried both Arial and TimesNewRoman).
Any ideas? I'm currently using QuickGraph.Algorithms 2.1.1441.30422 and DOT.DLL v0.0.0.0 (no version number available, but the file is dated May 17 2003). Maybe there are newer versions?
Thx,
.DaviD.
-- modified at 15:27 Wednesday 16th August, 2006
|
|
|
|
 |
 | Not code parser Saikat Sen | 17:18 14 Jan '04 |
|
 |
Your code looks good, but you cannot have a code parser done using an NFA or DFA (as you mentioned) unless the code is in ane extremely simple language. NFAs serve regular languages, you need a Pushdown Automaton for parsing context-free languages.
|
|
|
|
 |
|
 |
Saikat Sen wrote:
but you cannot have a code parser done using an NFA or DFA (as you mentioned) unless the code is in ane extremely simple language. NFAs serve regular languages, you need a Pushdown Automaton for parsing context-free languages.
I suggest you do some reading on parsing techniques. (search for Dick Grune, has an excellent book). There are countless techniques.
OTH this (my article) is just a simple conversion. I have an fact use it to write a parser that takes input similar to yacc.
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
 |
As I said, there are no parsing techniques to parse CFLs or CSLs using NFAs. NFAs at best can be used to parse RLs.
I suggest that you do some reading on parsing techniques. "Introduction to Automata Theory, Languages, and Computation (2nd Edition)" by Hopcroft is a good book. Or read any book on compilers.
|
|
|
|
 |
|
 |
Saikat Sen wrote:
As I said, there are no parsing techniques to parse CFLs or CSLs using NFAs. NFAs at best can be used to parse RLs.
I mean what did you smoke? You should know that any problem can be solved with an NFA. Recursive descent & Earley parsers are NFA. Tomita uses a combination of NFA/DFA.
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
 |
Sorry, but Sen is right there... It is simply not possible to determine with a NFA if a string fits, for instance, the sceme {a^nb^n | n == 0..}. You need a Chomsky-Level-2 automaton for that.
|
|
|
|
 |
|
 |
I hate you all
|
|
|
|
 |
|
 |
With this technique it should be possible to parse LL(1) languages. These languages are not very complex, but still...
|
|
|
|
 |
|
 |
That is such a contrived example ... But, I have to agree that you cannot do that with an NFA.
|
|
|
|
 |
 | Dot.dll source ? Jonathan de Halleux | 21:44 8 Dec '03 |
|
 |
Hi Leppie,
Could I have a look at the dot.dll source. I want to solve the "file opened" problem (dot.dll keeps a handle on ALL the files it opens, therefore it leads to some problems on web apps).
Thanks.
Jonathan de Halleux.
www.dotnetwiki.org
|
|
|
|
 |
|
 |
Hi Jon
I looked for my source, but it seems its gone > /dev/null
I'll make a point to look iuf i can compile graphwiz easily.
Btw did u ever track down that graphviz COM component? I should have that around somewhere still.
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
|
 |
|
 |
OK, I have made up a compilable graphviz, there are many linker errors that I just dont have the power/will for. If you are interested, I can upload it to you. It's just over 10mb compressed. My VisualC++ knowledge is a bit scrappy...
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
 |
I don't want to mess with the COM component: major problem is that it needs to be registered and this becomes difficult on remote servers.
How could the transfer be done ?
Jonathan de Halleux.
www.dotnetwiki.org
|
|
|
|
 |
|
|
 |
|
|
 |
|
 |
I just realised that paths in the project file contains hard links
My root dir was: e:\incoming\graphviz-1.10\
so if u dont have an e: drive just use subst, or replace the locations. Stupid VS.NET!
ANyways good luck
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
|
 |
|
|
 |
|
 |
its quite far down some directory structure (who knows why?), just do a search, there is only one sln file.
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
 |
I did but could not find any. Anyway, this is fixed now and I've also replaced the absolute include path by relative path.
I'm facing two problems: First a linking problem on StaticDll
------ Build started: Project: StaticDll, Configuration: Debug Win32 ------
Compiling... mccp.cpp Linking... LINK : warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification mccp.obj : error LNK2001: unresolved external symbol "void __cdecl dot_init_graph(struct Agraph_t *)" (?dot_init_graph@@$$FYAXPAUAgraph_t@@@Z) Debug/StaticDll.dll : fatal error LNK1120: 1 unresolved externals
and second where's the managed C++ wrapper ?
Jonathan de Halleux.
www.dotnetwiki.org
|
|
|
|
 |
|
 |
Jonathan de Halleux wrote:
I'm facing two problems: First a linking problem on StaticDll
Hehe, originally it took me about 2 days to sort out the linker errors, blame MS (there will be many more after that one). If I remeber correctly the source seems to be duplicated in place. So good luck
Jonathan de Halleux wrote:
and second where's the managed C++ wrapper ?
There is none, it was simply a MC++ class calling the cmdline. U can see what I did, by inspecting the IL of dot.dll, to be honest I cant really remember what/how I did it!
leppie::AllocCPArticle("Zee blog"); Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.
|
|
|
|
 |
|
 |
leppie wrote:
Hehe, originally it took me about 2 days to sort out the linker errors, blame MS (there will be many more after that one). If I remeber correctly the source seems to be duplicated in place. So good luck
I managed to avoid this problem by creating a mini-wrapper function that hides all graphviz headers... niark niark.
But, hey, the code is really crappy! It's full of static variable in global scope and functions! Anyway, I hope to get things running and have publish the wrapper soon on the codeproject. I'll try to wrap neato, lneato, lefty,etc... as well.
Dawn static variables!
Jonathan de Halleux.
www.dotnetwiki.org
|
|
|
|
 |
|
|
Last Updated 22 May 2003 |
Advertise |
Privacy |
Terms of Use |
Copyright ©
CodeProject, 1999-2010