First Posted 8 Feb 2003

WinSpider - The Windows WebCrawler Application

By noushadkc | 3 Feb 2005 | Article
A web leeching utility developed in C#. WinSpider is a front end that uses "wget" as its back end for the crawling operation. It implements a simple, parallel method of interprocess communication.

 

Sample Image - cp_ws.gif

Introduction

This application can be used to leech the contents of a URL and, optionally, its subdirectories.

It works from behind a firewall and can be minimized to the system tray. Progress is shown in the (yellow) status window.

For URL input I am using a URL combo box featuring history.

The back end of this utility is wget (an open source project). You can get the latest version from http://www.wget.org

 

Open Issues

Someone commented that the leeched directories are getting deleted from the current folder. This is the temporary directory where the files are leeched first; they are then copied to the specified directory. (You can see them in the temporary directory while leeching is in progress.)

Please remove the directory-removal section of the code to keep both copies, so that later on you can get an updated version of the URL faster.
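
As a rough illustration of the change suggested above, the copy step could look something like the following. This is only a sketch: the article does not show the copy or delete code, so `LeechCopy` and `CopyDirectory` are hypothetical names, not part of WinSpider. It copies the temporary download directory into the chosen output folder and leaves the temporary directory untouched.

```csharp
using System;
using System.IO;

// Hypothetical helper, not taken from the WinSpider source.
static class LeechCopy
{
    // Recursively copies sourceDir into destDir, overwriting older copies.
    // The source directory is deliberately left in place.
    public static void CopyDirectory(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir);

        foreach (string file in Directory.GetFiles(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(file));
            File.Copy(file, target, true); // true = overwrite an existing file
        }

        foreach (string dir in Directory.GetDirectories(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(dir));
            CopyDirectory(dir, target);
        }
    }
}
```

A call such as `LeechCopy.CopyDirectory(tempDir, strOutPath)`, with `tempDir` standing in for wherever the temporary files live, would then take the place of the copy-and-delete step. Keeping the temporary copy should let wget's -N time-stamping skip unchanged files on a later run.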

The leech code looks like this

This uses a parallel method of interprocess communication ;-)

  void StartLeach()
  {
   if(urlComboAddress.Text.ToLower() == "http://"
    || urlComboAddress.Text.ToLower() == "ftp://")
   {
    MessageBox.Show("Please specify an http:// or ftp:// site location.", "Error");
    return;
   }
   
   if(cCheckEnableProxy.Checked)     
   {
    if( cTextServer.Text == ""
     ||cTextUser.Text == ""
     || cTextPass.Text == ""
     || cTextPort.Text == "" )
    {
     MessageBox.Show("Please specify correct proxy server, port, username and password.", "Error");
     return;
    }
   }

 
   if(! Directory.Exists(cOutFolder.Text))
   {
    MessageBox.Show("Directory does not exist.");
    return;
   }
 
   MenuStart.Enabled = false;
   MenuCancel.Enabled = true;
   strOutPath = cOutFolder.Text;
   cTextOut.Clear();
 
   String strBatch = "wget.exe ";
   if(cCheckEnableProxy.Checked)
   {
    strBatch += " --proxy-user="
     + cTextUser.Text
     + " --proxy-pass=" + cTextPass.Text
     + " -e http_proxy=" + cTextServer.Text
     + ":"+ cTextPort.Text;
   }
 
    
   if(cCheckRecursive.Checked)
   {
    strBatch += " -r ";
 
    if(cCheckChildOnly.Checked)
    {
     strBatch += " -np ";
     if(cCheckSiblings.Checked)
     {
      strBatch+= " -l 1 "; // one level
     }
     else
     {
      strBatch+= " -l 0 "; //infinite levels
     }
 
    }
   }
 
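   // -o out.cap    : write wget's log to out.cap
   // -N            : time-stamping, skip files that are not newer than the local copy
   // --passive-ftp : use passive mode for FTP transfers
   // -x            : force creation of the directory hierarchy
   // -P            : directory prefix, i.e. the output folder
   strBatch
    += " -o out.cap -N --passive-ftp -x -P"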
    + strOutPath
    + " "
    + urlComboAddress.Text ;
    
   strBatch+= " ";
 
   String filename="wget.bat";
   if(File.Exists(filename))
   {
    File.Delete(filename);
   }
 
   StreamWriter file = File.CreateText(filename);
   file.WriteLine(strBatch);
   file.Close();
   
   myProcess = new Process();
   myProcess.StartInfo.FileName = filename;
   myProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
   myProcess.StartInfo.RedirectStandardOutput = false;
   myProcess.StartInfo.UseShellExecute = true;
   myProcess.StartInfo.CreateNoWindow = true;
   
   try
   {
    cButtonLeach.Enabled = false;
    cTimerUpdate.Enabled = true;
    myProcess.Start();
   }
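   catch(Exception eProc)
   {
    // Start failed: restore the UI so the user can try again.
    cTimerUpdate.Enabled = false;
    cButtonLeach.Enabled = true;
    MenuStart.Enabled = true;
    MenuCancel.Enabled = false;
    MessageBox.Show(eProc.Message);
   }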
  }
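
The interprocess communication here appears to work through the log file: wget is started with -o out.cap, and cTimerUpdate is enabled just before the process starts, presumably so that the form can poll that file and show new text in the status window. The timer handler is not shown in the article, so the following is only a sketch of how such polling might be done; `LogTail` and `ReadNew` are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper: remembers how far into the log file it has read.
class LogTail
{
    private readonly string path;
    private long offset; // number of bytes already consumed

    public LogTail(string path)
    {
        this.path = path;
    }

    // Returns the text appended since the previous call, or "" if there is none.
    public string ReadNew()
    {
        if (!File.Exists(path))
            return "";

        // FileShare.ReadWrite allows reading while another process is still writing.
        using (FileStream fs = new FileStream(path, FileMode.Open,
                                              FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < offset)
                offset = 0; // the log was recreated, so start from the beginning

            fs.Seek(offset, SeekOrigin.Begin);
            using (StreamReader reader = new StreamReader(fs))
            {
                string text = reader.ReadToEnd();
                offset = fs.Position; // everything up to here has been read
                return text;
            }
        }
    }
}
```

A Tick handler for cTimerUpdate could then keep one `LogTail` for out.cap and call `cTextOut.AppendText(tail.ReadNew())` on each tick. The sketch assumes the log is plain ASCII and that wget opens it in a way that allows other readers, which is worth verifying on the target system.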

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

noushadkc

Web Developer

India

Now working with NeST technologies, a major software firm with a global presence (CMM5 and working towards Six Sigma).

Last Updated 4 Feb 2005
Article Copyright 2003 by noushadkc