First Posted 8 Feb 2003

WinSpider - The Windows WebCrawler Application

By noushadkc | 3 Feb 2005 | Article
A web leeching utility developed in C#. WinSpider is a front end that uses "wget" in the back end for the crawling operation. It implements a simple, parallel method of interprocess communication.

 

Sample Image - cp_ws.gif

Introduction

This application can be used to leech the contents of a URL and, optionally, its subdirectories.

It works from behind a firewall (through a proxy server) and can be minimized to the system tray. Progress is shown in the yellow status window.

URL input goes through a combo box that keeps a history of previously entered addresses.
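
The history behaviour of such a combo box can be sketched roughly as below. This is only an illustration, not the code shipped with WinSpider: the UrlHistory class, its capacity limit and its case-insensitive matching are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WinSpider: keeps a most-recent-first
// list of URLs that a combo box could show as its drop-down items.
public class UrlHistory
{
    private readonly List<string> items = new List<string>();
    private readonly int capacity;

    public UrlHistory(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    // Read-only view of the history, newest entry first.
    public IList<string> Items { get { return items.AsReadOnly(); } }

    public void Add(string url)
    {
        if (url == null) return;
        url = url.Trim();
        if (url.Length == 0) return;

        // Remove an earlier copy (ignoring case) so the URL moves to the top.
        items.RemoveAll(u => string.Equals(u, url, StringComparison.OrdinalIgnoreCase));
        items.Insert(0, url);

        // Drop the oldest entries beyond the capacity.
        if (items.Count > capacity)
            items.RemoveRange(capacity, items.Count - capacity);
    }
}
```

In a form, the Items list could be copied into the combo box each time a download is started.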

The back end of this utility is wget, an open source project. You can get the latest version from http://www.wget.org
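
To give an idea of what driving wget from C# involves, here is a minimal sketch that builds a wget command line and starts the process directly, without an intermediate batch file. The WgetLauncher class and its method names are hypothetical, and the quoting helper assumes values contain no embedded double quotes. The wget options used (-r, -np, -N, -x, --passive-ftp, -o, -P) are the ones that appear in the StartLeach code in this article; proxy options are left out for brevity.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper, not part of WinSpider.
public static class WgetLauncher
{
    // Wraps a value in double quotes so that spaces survive command-line
    // parsing. Trailing backslashes are doubled so they do not escape the
    // closing quote. Assumes the value contains no embedded quotes.
    public static string Quote(string value)
    {
        int trailing = value.Length - value.TrimEnd('\\').Length;
        return "\"" + value + new string('\\', trailing) + "\"";
    }

    // Builds the argument string for one download.
    public static string BuildArguments(string url, string outputDir,
                                        bool recursive, bool noParent)
    {
        var args = new StringBuilder();
        if (recursive)
        {
            args.Append("-r ");                  // recursive retrieval
            if (noParent) args.Append("-np ");   // never ascend to the parent
        }
        args.Append("-N -x --passive-ftp ");     // time-stamping, force dirs
        args.Append("-o out.cap ");              // write the log to out.cap
        args.Append("-P ").Append(Quote(outputDir)).Append(' ');
        args.Append(Quote(url));
        return args.ToString();
    }

    // Starts wget.exe (assumed to be on the PATH or in the working
    // directory) without showing a console window.
    public static Process Start(string arguments)
    {
        var info = new ProcessStartInfo("wget.exe", arguments);
        info.UseShellExecute = false;
        info.CreateNoWindow = true;
        return Process.Start(info);
    }
}
```

Passing the arguments straight to the process avoids writing the proxy password to a batch file on disk, at the cost of having to get the quoting right yourself.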

 

Open Issues

Someone commented that the leeched directories are deleted from the current folder. That folder is the temporary directory where the files are leeched first; they are then copied to the specified directory. (You can watch them appear in the temporary directory while leeching is in progress.)

If you want to keep both copies, remove the section of code that deletes the temporary directory. Later updates of the same URL will then run faster, because wget's -N (time-stamping) option skips files that have not changed.
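
The copy step itself is not listed in this article, so the following is only a sketch of how a leeched tree could be copied to the target folder while leaving the temporary directory in place. DirectoryCopier and CopyTree are hypothetical names.

```csharp
using System.IO;

// Hypothetical helper, not part of WinSpider.
public static class DirectoryCopier
{
    // Recursively copies sourceDir into destDir, overwriting files that
    // already exist there. The source tree is not modified or deleted.
    public static void CopyTree(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir);

        foreach (string file in Directory.GetFiles(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(file));
            File.Copy(file, target, true);
        }

        foreach (string dir in Directory.GetDirectories(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(dir));
            CopyTree(dir, target);
        }
    }
}
```

Because nothing is removed from the source, a later wget run against the same temporary directory can still compare time stamps with the files it downloaded earlier.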

The leech code looks like this.

It uses a parallel approach to interprocess communication ;-)

  void StartLeach()
  {
   if(urlComboAddress.Text.ToLower() == "http://"
    || urlComboAddress.Text.ToLower() == "ftp://")
   {
    MessageBox.Show("Please specify an http:// or ftp:// site location.", "Error");
    return;
   }
   
   if(cCheckEnableProxy.Checked)     
   {
    if( cTextServer.Text == ""
     ||cTextUser.Text == ""
     || cTextPass.Text == ""
     || cTextPort.Text == "" )
    {
     MessageBox.Show("Please specify correct proxy server, port, username and password.", "Error");
     return;
    }
   }

 
   if(! Directory.Exists(cOutFolder.Text))
   {
    MessageBox.Show("Directory does not exist.");
    return;
   }
 
   MenuStart.Enabled = false;
   MenuCancel.Enabled = true;
   strOutPath = cOutFolder.Text;
   cTextOut.Clear();
 
   String strBatch = "wget.exe ";
   if(cCheckEnableProxy.Checked)
   {
    strBatch += " --proxy-user="
     + cTextUser.Text
     + " --proxy-pass=" + cTextPass.Text
     + " -e http_proxy=" + cTextServer.Text
     + ":"+ cTextPort.Text;
   }
 
    
   if(cCheckRecursive.Checked)
   {
    strBatch += " -r ";
 
    if(cCheckChildOnly.Checked)
    {
     strBatch += " -np ";
     if(cCheckSiblings.Checked)
     {
      strBatch+= " -l 1 "; // one level
     }
     else
     {
      strBatch+= " -l 0 "; //infinite levels
     }
 
    }
   }
 
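   // -o out.cap    : write wget's log to out.cap
   // -N            : time-stamping (skip files that have not changed)
   // --passive-ftp : use passive-mode FTP
   // -x            : force creation of directories
   // -P            : directory prefix (the download folder)
   strBatch
    += " -o out.cap -N --passive-ftp -x -P"
    + strOutPath
    + " "
    + urlComboAddress.Text ;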
    
   strBatch+= " ";
 
   String filename="wget.bat";
   if(File.Exists(filename))
   {
    File.Delete(filename);
   }
 
   StreamWriter file = File.CreateText(filename);
   file.WriteLine(strBatch);
   file.Close();
   
   myProcess = new Process();
   myProcess.StartInfo.FileName = filename;
   myProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
   myProcess.StartInfo.RedirectStandardOutput = false;
   myProcess.StartInfo.UseShellExecute = true;
   myProcess.StartInfo.CreateNoWindow = true;
   
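   try
   {
    cButtonLeach.Enabled = false;
    cTimerUpdate.Enabled = true;
    myProcess.Start();
   }
   catch(Exception eProc)
   {
    // Starting wget failed: restore the UI so the user can try again.
    cTimerUpdate.Enabled = false;
    cButtonLeach.Enabled = true;
    MenuStart.Enabled = true;
    MenuCancel.Enabled = false;
    MessageBox.Show(eProc.Message);
   }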
  }
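
StartLeach tells wget to write its log to out.cap and enables cTimerUpdate, so a reasonable guess is that the timer reads that file while wget is still writing it and shows the new text in the status window. The timer handler is not listed here, so the sketch below is an assumption about how such polling could work, not the author's code. The LogTail class is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of WinSpider: returns whatever has been
// appended to a log file since the previous call.
public class LogTail
{
    private readonly string path;
    private long position;   // offset of the first byte not yet returned

    public LogTail(string path)
    {
        this.path = path;
    }

    // Returns the text added since the last call, or "" if there is none.
    public string ReadNew()
    {
        if (!File.Exists(path)) return "";

        // FileShare.ReadWrite lets us open the file while another process
        // still has it open for writing.
        using (var stream = new FileStream(path, FileMode.Open,
                                           FileAccess.Read, FileShare.ReadWrite))
        {
            // If the file shrank, it was recreated: start from the top.
            if (stream.Length < position) position = 0;

            stream.Seek(position, SeekOrigin.Begin);
            var buffer = new byte[(int)(stream.Length - position)];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
            position += read;

            // Simplification: a multi-byte character split across two
            // calls would be decoded incorrectly.
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```

A timer Tick handler could then do something like cTextOut.AppendText(tail.ReadNew()), where tail is a LogTail created for out.cap when the download starts.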

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

noushadkc

Web Developer

India


Now working with NeST Technologies, a major software firm with a global presence (CMM Level 5 and working towards Six Sigma).


Comments and Discussions

 
[General] You dumb f-ck | notadotyet | 6:48 7 Jan '05
[General] Re: You dumb f-ck | noushadkc | 20:55 7 Jan '05
[General] Re: You dumb f-ck | stephan johnson | 1:40 18 Oct '05
i know this comes a bit late, since these comments were more than 6 months ago,
 
however i cannot help notice how many people are dissing this particular article.
 
i must say, is this community not for sharing of code, any code?
 
we are all supposed to be here to help each other out a bit, this is exactly why code project is as successful as what it is.
 
notadotyet: you say that projects like this are a waste of space on this website, i don't see that you have posted any (let me repeat myself Articles Submitted 0) on this website. why don't you post an article and email me so that i can put a comment there describing what i think of your coding abilities.
 
marc cliff as well, except that he has posted 82 (some very useful) articles on this site.
 
i have come across many articles that are barren in terms of the body, but the code is still interesting and sometimes (be it seldom) useful.
 
I think we must all revise our attitudes towards others sharing their ideas or else this whole website might just as well be turned off.
 
i am in the least highly unimpressed with the remarks in this article.
 
dude, whoever wrote the article, good job, even though it uses wget, i agree that you should try to look at the webclient classes. however having tried this myself, knowing that 99% of all web pages are non-well formed, i realise that why try to re-invent the wheel if you could just make another hub-cap for it.?
 
stephan johnson
 
think before you type
[General] Any Updates | wrussell | 13:59 1 Feb '04
[General] Re: Any Updates | Anonymous | 17:36 1 Feb '04
[General] Re: Any Updates | wrussell | 3:25 2 Feb '04
[General] Re: Any Updates | wrussell | 8:40 2 Feb '04
[Question] And the changes were??? | fifi | 12:26 29 Apr '03
[General] another non-creative non-sense article | programmer2003++ | 9:29 29 Apr '03
[General] Re: another non-creative non-sense article | OmegaSupreme | 10:21 29 Apr '03
[General] Re: another non-creative non-sense article | a reader | 17:24 29 Apr '03
[General] Re: another non-creative non-sense article | OmegaSupreme | 17:29 29 Apr '03
[General] Re: another non-creative non-sense article | Ed Din ar Qadiyyeh | 21:03 20 May '03
[General] Such a nifty title... | Marc Clifton | 6:55 10 Feb '03
[General] www.wget.org | noushadkc | 1:16 10 Feb '03
[General] www.wget.com | leppie | 23:21 8 Feb '03
[General] Re: www.wget.com | Jeff J | 8:20 9 Feb '03
[Question] Robots.txt? | Jörgen Sigvardsson | 22:52 8 Feb '03
[Answer] Re: Robots.txt? | leppie | 23:25 8 Feb '03
[General] Re: Robots.txt? | Jörgen Sigvardsson | 0:51 9 Feb '03
[Answer] Re: Robots.txt? | noushadkc | 15:46 10 Feb '03
[General] WGET?!?! | Daniel Turini | 21:48 8 Feb '03

Last Updated 4 Feb 2005
Article Copyright 2003 by noushadkc
Everything else Copyright © CodeProject, 1999-2012