WinSpider - The Windows WebCrawler Application

By noushadkc

A web leeching utility developed in C#. WinSpider is a front end that uses "wget" as its back end for the crawling operation. It implements a simple, parallel method of interprocess communication.
C#, Windows, .NET1.0, Dev, QA
Posted: 8 Feb 2003
Updated: 3 Feb 2005
Sample Image - cp_ws.gif

Introduction

This application can be used to leech the contents of a URL and, optionally, its subdirectories.

It works from behind a firewall and can be minimized to the system tray. Progress is shown in the (yellow) status window.

URLs are entered through a combo box that keeps a history.

The back end of this utility is wget, an open source project. You can get the latest version from http://www.wget.org
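
Because wget does the actual work, the front end has to learn about progress from wget itself. The article does not show how the status window is fed, so the following is only a sketch of one way it could be done: poll the log file that wget writes and append whatever is new. The class name LogTail and its members are hypothetical, and the sketch assumes the writing process permits shared reads of its log file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the article's download): returns the text
// appended to a log file since the previous call.
public class LogTail
{
    private readonly string path;
    private long offset;   // number of bytes already consumed

    public LogTail(string path)
    {
        this.path = path;
    }

    // Returns the text appended since the last call, or "" if there is none.
    public string ReadNew()
    {
        if (!File.Exists(path))
            return "";   // the writer has not created the log yet

        // FileShare.ReadWrite lets us read while another process still has
        // the file open for writing.
        using (FileStream fs = new FileStream(path, FileMode.Open,
                                              FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < offset)
                offset = 0;   // file was truncated or recreated: start over

            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buffer = new byte[fs.Length - offset];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;   // end of file reached
                read += n;
            }
            offset += read;
            return Encoding.Default.GetString(buffer, 0, read);
        }
    }
}
```

A timer Tick handler could then call something like cTextOut.AppendText(tail.ReadNew()) on each tick. Polling a file is crude compared with redirecting standard output, but it keeps the two processes fully independent.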

 

Open Issues

Someone commented that the leeched directories are deleted from the current folder. That folder is the temporary directory into which the files are first leeched.

They are then copied to the specified directory. (You can watch them arrive in the temporary directory while leeching is in progress.)

To keep both copies, remove the section of code that deletes the directory; later runs can then fetch an updated version of the URL faster.
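
The copy step itself is not shown in this article, so the following is only a sketch of what "copy but keep the temporary directory" might look like. The names DirCopy and CopyTree are hypothetical. The point is simply that the source tree is never deleted, so a later run with wget's time-stamping option (-N) can skip files that have not changed.

```csharp
using System.IO;

// Hypothetical helper (not taken from the article's download).
public static class DirCopy
{
    // Recursively copies sourceDir into destDir, overwriting existing files.
    // The source tree is left untouched.
    public static void CopyTree(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir);   // no-op if it already exists

        foreach (string file in Directory.GetFiles(sourceDir))
        {
            string target = Path.Combine(destDir, Path.GetFileName(file));
            File.Copy(file, target, true);    // true = overwrite
        }

        foreach (string sub in Directory.GetDirectories(sourceDir))
        {
            // Recreate each subdirectory under the destination and recurse.
            CopyTree(sub, Path.Combine(destDir, Path.GetFileName(sub)));
        }
    }
}
```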

The leech code looks like this:

It uses a parallel approach to interprocess communication ;-)

  void StartLeach()
  {
   if(urlComboAddress.Text.ToLower() == "http://"
    || urlComboAddress.Text.ToLower() == "ftp://")
   {
    MessageBox.Show("Please specify an http:// or ftp:// site location.", "Error");
    return;
   }
   
   if(cCheckEnableProxy.Checked)     
   {
    if( cTextServer.Text == ""
     ||cTextUser.Text == ""
     || cTextPass.Text == ""
     || cTextPort.Text == "" )
    {
     MessageBox.Show("Please specify correct proxy server, port, username and password.", "Error");
     return;
    }
   }

 
   if(! Directory.Exists(cOutFolder.Text))
   {
    MessageBox.Show("Directory does not exist.", "Error");
    return;
   }
 
   MenuStart.Enabled = false;
   MenuCancel.Enabled = true;
   strOutPath = cOutFolder.Text;
   cTextOut.Clear();
 
   String strBatch = "wget.exe ";
   if(cCheckEnableProxy.Checked)
   {
    strBatch += " --proxy-user="
     + cTextUser.Text
     + " --proxy-pass=" + cTextPass.Text
     + " -e http_proxy=" + cTextServer.Text
     + ":"+ cTextPort.Text;
   }
 
    
   if(cCheckRecursive.Checked)
   {
    strBatch += " -r ";
 
    if(cCheckChildOnly.Checked)
    {
     strBatch += " -np ";
     if(cCheckSiblings.Checked)
     {
      strBatch+= " -l 1 "; // one level
     }
     else
     {
      strBatch+= " -l 0 "; //infinite levels
     }
 
    }
   }
 
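   // -o out.cap    : write wget's log to out.cap
   // -N            : time-stamping, skip files that are not newer than the local copy
   // --passive-ftp : use passive mode for FTP transfers
   // -x            : force creation of directories
   // -P            : directory prefix under which the files are saved
   strBatch
    += " -o out.cap -N --passive-ftp -x -P"
    + strOutPath
    + " "
    + urlComboAddress.Text ;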
    
   strBatch+= " ";
 
   String filename="wget.bat";
   if(File.Exists(filename))
   {
    File.Delete(filename);
   }
 
   StreamWriter file = File.CreateText(filename);
   file.WriteLine(strBatch);
   file.Close();
   
   myProcess = new Process();
   myProcess.StartInfo.FileName = filename;
   myProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
   myProcess.StartInfo.RedirectStandardOutput = false;
   myProcess.StartInfo.UseShellExecute = true;
   myProcess.StartInfo.CreateNoWindow = true;
   
   try
   {
    cButtonLeach.Enabled = false;
    cTimerUpdate.Enabled = true;
    myProcess.Start();
   }
   catch(Exception eProc)
   {
    MessageBox.Show(eProc.Message);
   }
  }
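
As an alternative to writing wget.bat, the same command line could be assembled by a small function that takes no UI controls and handed straight to the process. This is a different technique from the one used above, not the author's method, and is only a sketch: LeachOptions, WgetArgs and their members are hypothetical names, while the wget switches are exactly the ones used in StartLeach().

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical option bag; the field names are illustrative only.
public class LeachOptions
{
    public string Url = "";
    public string OutputFolder = "";
    public bool Recursive;
    public bool ChildOnly;
    public bool SiblingsOnly;
    public bool UseProxy;
    public string ProxyServer = "";
    public string ProxyPort = "";
    public string ProxyUser = "";
    public string ProxyPass = "";
}

public static class WgetArgs
{
    // Builds the wget argument string with the same switches as StartLeach().
    public static string Build(LeachOptions o)
    {
        StringBuilder sb = new StringBuilder();
        if (o.UseProxy)
        {
            sb.Append("--proxy-user=").Append(o.ProxyUser)
              .Append(" --proxy-pass=").Append(o.ProxyPass)
              .Append(" -e http_proxy=").Append(o.ProxyServer)
              .Append(':').Append(o.ProxyPort).Append(' ');
        }
        if (o.Recursive)
        {
            sb.Append("-r ");
            if (o.ChildOnly)
            {
                sb.Append("-np ");
                // -l 1 = one level, -l 0 = infinite levels
                sb.Append(o.SiblingsOnly ? "-l 1 " : "-l 0 ");
            }
        }
        sb.Append("-o out.cap -N --passive-ftp -x -P")
          .Append(o.OutputFolder).Append(' ').Append(o.Url);
        return sb.ToString();
    }

    // Describes a direct launch of wget.exe, without an intermediate batch file.
    public static ProcessStartInfo CreateStartInfo(LeachOptions o)
    {
        ProcessStartInfo psi = new ProcessStartInfo("wget.exe", Build(o));
        psi.UseShellExecute = false;  // CreateNoWindow is ignored when this is true
        psi.CreateNoWindow = true;
        return psi;
    }
}
```

Process.Start(WgetArgs.CreateStartInfo(options)) would then start the download. Like the original, this sketch does not quote its arguments, so an output folder or URL containing spaces would need extra handling.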

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

noushadkc

Now working with NeST Technologies, a major software firm with a global presence (CMM5 and moving towards Six Sigma).
Occupation: Web Developer
Location: India

Copyright 2003 by noushadkc