Domain Walker






An object that allows you to explore the topology of the internet.
What is it?
DomainWalker is an object that discovers the domains reachable from a URL. Unlike traditional crawlers and site downloaders, which identify every reachable URL on a page, DomainWalker explores a subset of the World Wide Web's topology by targeting root URLs only. DomainWalker guarantees that its walk completes in a finite amount of time by ensuring that no domain is crawled more than once.
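The idea of visiting each domain only once can be illustrated with a minimal sketch. This is not DomainWalker's actual implementation: the class `DomainWalkSketch`, its methods, and the in-memory `links` map (used here instead of downloading pages) are all hypothetical, and the sketch assumes the start URL sits at depth 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the DomainWalker source.
public static class DomainWalkSketch
{
    // Reduces a URL to its host name, e.g. "http://a.example/page" -> "a.example".
    public static string RootOf(string url)
    {
        // Uri requires a scheme; assume http when none is given (as in "www.ravib.com").
        if (!url.Contains("://")) url = "http://" + url;
        return new Uri(url).Host.ToLowerInvariant();
    }

    // Walks an in-memory link map (domain -> URLs found on its root page)
    // and returns the domains in the order they were first visited.
    public static List<string> Walk(string startUrl, int maxDepth,
                                    IDictionary<string, string[]> links)
    {
        HashSet<string> visited = new HashSet<string>();
        List<string> order = new List<string>();
        Visit(RootOf(startUrl), 0, maxDepth, links, visited, order);
        return order;
    }

    private static void Visit(string domain, int depth, int maxDepth,
                              IDictionary<string, string[]> links,
                              HashSet<string> visited, List<string> order)
    {
        // HashSet.Add returns false for a domain already seen,
        // so every domain is visited at most once and the walk must end.
        if (depth > maxDepth || !visited.Add(domain)) return;
        order.Add(domain);

        string[] found;
        if (!links.TryGetValue(domain, out found)) return;
        foreach (string url in found)
            Visit(RootOf(url), depth + 1, maxDepth, links, visited, order);
    }
}
```

Because every URL is reduced to its host before the visited check, two pages on the same site count as one node, which is what keeps the walk much smaller than a full crawl.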
How do I use it?
You use DomainWalker by initializing it, calling its Walk() method, and getting its results.
- Initialize the DomainWalker instance

      // Initialize the DomainWalker
      DomainWalker dw = new DomainWalker();
      dw.StartUrl = "www.ravib.com";
      dw.MaxDepth = 3;

- Do the walk

      // Do walk
      dw.Walk();

- Get the results

      // Get results
      Hashtable domainTree = dw.DomainTree;
      printHashTableAsTree (domainTree); // left as an exercise to the reader
Getting DomainWalker's results
You retrieve DomainWalker's results by accessing its DomainTree property at the end of the walk and/or responding to the OnNotifyUrlBeingTraversed event.
DomainTree property
DomainWalker's result is a tree of discovered domains obtained from the object's DomainTree property. The tree is actually a nested Hashtable, where each collection of child nodes is stored in a new Hashtable.
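A nested Hashtable of this kind can be rendered as an indented tree with a short recursive helper. The sketch below is only one possible approach: the article does not spell out the exact key and value layout, so it assumes each key is a domain name and each value is either null or a Hashtable of that domain's children. The class `DomainTreePrinter` is hypothetical and not part of DomainWalker.

```csharp
using System;
using System.Collections;
using System.Text;

// Hypothetical helper; assumes key = domain name, value = child Hashtable or null.
public static class DomainTreePrinter
{
    // Returns the tree as text, one domain per line, indented two spaces per level.
    public static string Render(Hashtable tree)
    {
        StringBuilder sb = new StringBuilder();
        Append(tree, 0, sb);
        return sb.ToString();
    }

    private static void Append(Hashtable level, int depth, StringBuilder sb)
    {
        foreach (DictionaryEntry entry in level)
        {
            sb.Append(' ', depth * 2).Append(entry.Key).Append('\n');

            // Recurse only when the value really is a nested Hashtable.
            Hashtable children = entry.Value as Hashtable;
            if (children != null) Append(children, depth + 1, sb);
        }
    }
}
```

Note that a Hashtable does not preserve insertion order, so sibling domains may appear in any order; sort the keys first if a stable listing matters.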

OnNotifyUrlBeingTraversed event
It may be more convenient to get at DomainWalker's results by being notified every time a new URL is discovered. This is done by subscribing to the object's OnNotifyUrlBeingTraversed event and is the approach taken by the demo app. Domain discovery notifications are received by registering an OnNotifyUrlBeingTraversed delegate, which has the following signature:

    /// <summary>
    /// Notifies an observer when a url is about to be traversed.
    /// </summary>
    /// <param name="strParentUrl">The parent url (may be null).</param>
    /// <param name="strUrlBeingTraversed">The url being traversed.</param>
    /// <param name="nCurrentDepth">Current traversal depth.</param>
    /// <param name="nDomains">Number of domains discovered so far.</param>
    /// <param name="tsElapsed">Time elapsed since start of crawl.</param>
    public delegate void OnNotifyUrlBeingTraversed
        (string strParentUrl, string strUrlBeingTraversed, int nCurrentDepth,
         int nDomains, TimeSpan tsElapsed);

The demo app responds to the OnNotifyUrlBeingTraversed event by adding strUrlBeingTraversed to a list box. The string is indented by a number of spaces proportional to nCurrentDepth. Other useful information, such as the elapsed walk time (tsElapsed), is displayed in a label control.
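A handler along these lines might look like the following sketch. It writes to the console rather than a list box, and the class `TraversalDisplay`, its members, and the two-spaces-per-level indent are assumptions made for illustration, not code from the demo app. The delegate is redeclared only so the block compiles on its own; in a real project you would use the one that ships with DomainWalker.

```csharp
using System;

// Redeclared here with the signature given in the article so the sketch is self-contained.
public delegate void OnNotifyUrlBeingTraversed(string strParentUrl,
    string strUrlBeingTraversed, int nCurrentDepth, int nDomains, TimeSpan tsElapsed);

// Hypothetical observer; not part of DomainWalker or its demo app.
public static class TraversalDisplay
{
    // Builds the text for one entry: two spaces of indent per depth level (an assumption).
    public static string FormatEntry(string url, int depth)
    {
        return new string(' ', depth * 2) + url;
    }

    // Matches the delegate signature, so it can be registered as a handler.
    public static void PrintEntry(string strParentUrl, string strUrlBeingTraversed,
                                  int nCurrentDepth, int nDomains, TimeSpan tsElapsed)
    {
        Console.WriteLine(FormatEntry(strUrlBeingTraversed, nCurrentDepth));
    }

    // A ready-made delegate instance wrapping PrintEntry.
    public static readonly OnNotifyUrlBeingTraversed Handler = PrintEntry;
}
```

In a Windows Forms app the handler would add the formatted string to the list box instead of printing it, and it would need to do so on the GUI thread.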
OnNotifyWalkCompleted event
DomainWalker also fires the OnNotifyWalkCompleted event at the end of a walk. The OnNotifyWalkCompleted delegate has the following signature:

    /// <summary>
    /// Notifies an observer when the walk has completed.
    /// </summary>
    /// <param name="nDomains">Number of domains discovered.</param>
    /// <param name="tsElapsed">Time taken to complete crawl.</param>
    public delegate void OnNotifyWalkCompleted
        (int nDomains, TimeSpan tsElapsed);

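As a small illustration, a completion handler could turn the two arguments into a status message. The class `WalkSummary` and the message wording are hypothetical; the delegate is redeclared from the signature above only so the block stands alone.

```csharp
using System;

// Redeclared here with the signature given in the article so the sketch is self-contained.
public delegate void OnNotifyWalkCompleted(int nDomains, TimeSpan tsElapsed);

// Hypothetical observer; not part of DomainWalker or its demo app.
public static class WalkSummary
{
    // Builds a one-line summary; TimeSpan's default format is hh:mm:ss for whole seconds.
    public static string Describe(int nDomains, TimeSpan tsElapsed)
    {
        return "Walk completed: " + nDomains + " domains in " + tsElapsed;
    }

    // Matches the delegate signature, so it can be registered as a handler.
    public static void Report(int nDomains, TimeSpan tsElapsed)
    {
        Console.WriteLine(Describe(nDomains, tsElapsed));
    }
}
```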
Revision History
- 22 Jan 2006
  - Corrected DomainWalkerForm delegates to ensure controls are accessed from the GUI thread. (Thanks, Birgir K!)
  - Added missing .resx file to project.
  - Upgraded project to VS2005.
- 15 Jan 2006
  - Initial version.