WinSpider - The Windows WebCrawler Application

Feb 9, 2003

Web leeching utility developed in C#. WinSpider is a front end that uses "wget" as its back end for the "crawling" operation. It implements a simple, parallel method of interprocess communication.

 

Sample Image - cp_ws.gif

Introduction

This application can be used to leech the contents of a URL and, optionally, its subdirectories.

It works behind a firewall (through a proxy server) and can be minimized to the system tray. Progress is shown in the (yellow) status window.
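StartLeach (listed later in this article) starts wget with `-o out.cap`, so wget writes its progress to a log file. The timer handler is not shown in the article. Assuming `cTimerUpdate` reads that log, here is a minimal sketch of returning only the text appended since the previous tick. `LogTail` is a hypothetical name, and the sketch assumes the log is ASCII/UTF-8 text.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns only the text appended to a log file since the
// previous call, so each timer tick can append it to the status window.
public sealed class LogTail
{
    private readonly string path;
    private long offset; // number of bytes already consumed

    public LogTail(string path)
    {
        this.path = path;
    }

    public string ReadNew()
    {
        if (!File.Exists(path))
            return ""; // wget has not created the log yet

        // FileShare.ReadWrite lets wget keep the file open for writing while we read.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (stream.Length < offset)
                offset = 0; // file was truncated or recreated: start over

            stream.Seek(offset, SeekOrigin.Begin);
            var buffer = new byte[stream.Length - offset];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break; // end of file reached early
                read += n;
            }
            offset += read;

            // Assumption: UTF-8/ASCII log text. A multi-byte character split
            // across two reads would be garbled.
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}
```

In a Tick handler, a call such as `cTextOut.AppendText(tail.ReadNew());` would add the new lines to a text box. `cTextOut` is the box that StartLeach clears, but whether it is the yellow status window is an assumption.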

URL input uses a combo box that keeps a history of previously entered URLs.
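Here is a minimal sketch of the history logic behind such a combo box. It assumes the history is kept most-recent-first, contains no duplicates, and is capped at a fixed size. `UrlHistory` is a hypothetical class, not the article's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model behind a URL combo box with history:
// most recent first, no duplicates, capped at a fixed size.
public sealed class UrlHistory
{
    private readonly List<string> items = new List<string>();
    private readonly int capacity;

    public UrlHistory(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public IReadOnlyList<string> Items => items;

    public void Add(string url)
    {
        url = url.Trim();
        if (url.Length == 0) return; // ignore blank input

        // Move an existing entry to the top instead of duplicating it.
        // Comparison is exact (ordinal), because URL paths can be case-sensitive.
        items.Remove(url);
        items.Insert(0, url);

        // Drop the oldest entries beyond the cap.
        if (items.Count > capacity)
            items.RemoveRange(capacity, items.Count - capacity);
    }
}
```

After a download starts, the combo box could be refilled by calling `urlComboAddress.Items.Clear()` and then adding each entry of `Items` in order.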

The back end of this utility is wget, an open-source project. You can get the latest version from http://www.wget.org
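StartLeach runs `wget.exe` by its bare name from a batch file. By default, cmd.exe therefore looks for it first in the current directory and then on the PATH. The following is a minimal sketch of checking for it before starting a download. `ExecutableLocator` is a hypothetical helper, not part of the article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for checking that wget.exe can be found before launching it.
public static class ExecutableLocator
{
    // Returns the full path of fileName in the first directory that contains it, or null.
    public static string Find(string fileName, IEnumerable<string> directories)
    {
        foreach (string dir in directories)
        {
            if (string.IsNullOrWhiteSpace(dir)) continue;   // skip empty PATH entries
            string clean = dir.Trim().Trim('"');             // PATH entries may be quoted
            string candidate = Path.Combine(clean, fileName);
            if (File.Exists(candidate))
                return Path.GetFullPath(candidate);
        }
        return null;
    }

    // Mirrors cmd.exe's default order: current directory first, then each PATH entry.
    public static IEnumerable<string> DefaultSearchDirectories()
    {
        yield return Directory.GetCurrentDirectory();
        string path = Environment.GetEnvironmentVariable("PATH") ?? "";
        foreach (string dir in path.Split(Path.PathSeparator))
            yield return dir;
    }
}
```

For example, StartLeach could bail out early with `if (ExecutableLocator.Find("wget.exe", ExecutableLocator.DefaultSearchDirectories()) == null) { MessageBox.Show("wget.exe was not found.", "Error"); return; }`.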

 

Open Issues

Someone commented that the leeched directories are deleted from the current folder. That folder is a temporary directory: the files are leeched into it first and then copied to the specified directory (you can watch them appear in the temporary directory while a leech is running).

To keep both copies, remove the section that deletes the temporary directory. That way, you can later get an updated version of the same URL faster.
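The copy step described above is not part of the listing in this article. Here is a minimal sketch, assuming the files are leeched into a temporary directory and then copied to the output folder. `keepTempCopy = true` corresponds to removing the directory-deleting section. `TreeCopy`, `CopyTree`, and `Publish` are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helpers for moving a leeched tree from the temporary directory
// to the output folder.
public static class TreeCopy
{
    // Recursively copies every file under sourceDir into destDir, creating
    // subdirectories as needed and overwriting files that already exist.
    public static void CopyTree(string sourceDir, string destDir)
    {
        Directory.CreateDirectory(destDir);
        foreach (string file in Directory.GetFiles(sourceDir))
            File.Copy(file, Path.Combine(destDir, Path.GetFileName(file)), true);
        foreach (string sub in Directory.GetDirectories(sourceDir))
            CopyTree(sub, Path.Combine(destDir, Path.GetFileName(sub)));
    }

    // Copies the leeched tree, then deletes the temporary copy only when asked to.
    public static void Publish(string tempDir, string outDir, bool keepTempCopy)
    {
        CopyTree(tempDir, outDir);
        if (!keepTempCopy)
            Directory.Delete(tempDir, true);
    }
}
```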

The leech code looks like this:

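It uses a "parallel" way of interprocess communication ;-) The options are written to a batch file (wget.bat). wget then runs in its own hidden process alongside the UI and writes its progress to a log file (out.cap).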

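  void StartLeach()
  {
   string url = urlComboAddress.Text.Trim();
   string urlLower = url.ToLower();

   // Reject an empty box or a bare scheme with no host.
   if(url == ""
    || urlLower == "http://"
    || urlLower == "ftp://")
   {
    MessageBox.Show("Please specify an http:// or ftp:// site location.", "Error");
    return;
   }
   
   if(cCheckEnableProxy.Checked)
   {
    if( cTextServer.Text == ""
     || cTextUser.Text == ""
     || cTextPass.Text == ""
     || cTextPort.Text == "" )
    {
     MessageBox.Show("Please specify correct proxy server, port, username and password.", "Error");
     return;
    }
   }

   if(! Directory.Exists(cOutFolder.Text))
   {
    MessageBox.Show("Directory does not exist.", "Error");
    return;
   }
 
   MenuStart.Enabled = false;
   MenuCancel.Enabled = true;
   strOutPath = cOutFolder.Text;
   cTextOut.Clear();
 
   String strBatch = "wget.exe ";
   if(cCheckEnableProxy.Checked)
   {
    // Note: the proxy password is stored in plain text in wget.bat.
    strBatch += " --proxy-user="
     + cTextUser.Text
     + " --proxy-pass=" + cTextPass.Text
     + " -e http_proxy=" + cTextServer.Text
     + ":"+ cTextPort.Text;
   }
 
   if(cCheckRecursive.Checked)
   {
    strBatch += " -r ";      // recursive retrieval
 
    if(cCheckChildOnly.Checked)
    {
     strBatch += " -np ";    // never ascend to the parent directory
     if(cCheckSiblings.Checked)
     {
      strBatch+= " -l 1 "; // one level
     }
     else
     {
      strBatch+= " -l 0 "; // infinite levels
     }
    }
   }
 
   // -o out.cap     write wget's log (progress) to out.cap
   // -N             time-stamping: skip files that are not newer than the local copy
   // --passive-ftp  use passive FTP mode (works better behind firewalls)
   // -x             force creation of the directory hierarchy
   // -P             directory prefix (where the files are saved)
   // The path and URL are quoted so that spaces and characters such as & survive
   // cmd.exe; % is doubled because a .bat file would treat %2 etc. as parameters.
   strBatch
    += " -o out.cap -N --passive-ftp -x -P \""
    + strOutPath.Replace("%", "%%")
    + "\" \""
    + url.Replace("%", "%%")
    + "\" ";
 
   String filename="wget.bat";

   // File.CreateText overwrites an existing file, and "using" closes it even on error.
   using(StreamWriter file = File.CreateText(filename))
   {
    file.WriteLine(strBatch);
   }
   
   myProcess = new Process();
   myProcess.StartInfo.FileName = filename;
   myProcess.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
   myProcess.StartInfo.RedirectStandardOutput = false;
   // With UseShellExecute = true, CreateNoWindow is ignored;
   // WindowStyle.Hidden is what hides the console window.
   myProcess.StartInfo.UseShellExecute = true;
   myProcess.StartInfo.CreateNoWindow = true;
   
   try
   {
    cButtonLeach.Enabled = false;
    cTimerUpdate.Enabled = true;   // start the status-update timer
    myProcess.Start();
   }
   catch(Exception eProc)
   {
    // Restore the UI so that the user can try again.
    cTimerUpdate.Enabled = false;
    cButtonLeach.Enabled = true;
    MenuStart.Enabled = true;
    MenuCancel.Enabled = false;
    MessageBox.Show(eProc.Message, "Error");
   }
  }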
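The command line is easier to check when a separate function builds it. The sketch below produces the same wget options as StartLeach from a hypothetical `WgetOptions` object. It quotes the output path and the URL for the batch file and doubles `%`. The flags are copied from the listing above, so check them against the `--help` output of your wget version. `WgetOptions` and `WgetCommand` are illustrative names, not the article's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical options object mirroring the check boxes used by StartLeach.
public sealed class WgetOptions
{
    public string Url { get; set; } = "";
    public string OutPath { get; set; } = "";
    public bool Recursive { get; set; }   // cCheckRecursive
    public bool NoParent { get; set; }    // cCheckChildOnly
    public bool OneLevel { get; set; }    // cCheckSiblings
    public string ProxyServer { get; set; } // null or empty means no proxy
    public string ProxyPort { get; set; }
    public string ProxyUser { get; set; }
    public string ProxyPass { get; set; }
}

public static class WgetCommand
{
    // Quotes a value for a .bat line: double quotes protect spaces and & | < >,
    // and % is doubled so cmd does not read it as a parameter or variable.
    public static string BatchQuote(string value)
    {
        return "\"" + value.Replace("%", "%%") + "\"";
    }

    // Builds the wget command line with the same flags as StartLeach.
    public static string Build(WgetOptions o)
    {
        var parts = new List<string> { "wget.exe" };
        if (!string.IsNullOrEmpty(o.ProxyServer))
        {
            parts.Add("--proxy-user=" + o.ProxyUser);
            parts.Add("--proxy-pass=" + o.ProxyPass);
            parts.Add("-e http_proxy=" + o.ProxyServer + ":" + o.ProxyPort);
        }
        if (o.Recursive)
        {
            parts.Add("-r");
            if (o.NoParent)
            {
                parts.Add("-np");
                parts.Add(o.OneLevel ? "-l 1" : "-l 0"); // -l 0 means unlimited depth
            }
        }
        parts.Add("-o out.cap -N --passive-ftp -x");
        parts.Add("-P " + BatchQuote(o.OutPath));
        parts.Add(BatchQuote(o.Url));
        return string.Join(" ", parts);
    }
}
```

StartLeach could then write `WgetCommand.Build(options)` to wget.bat instead of concatenating `strBatch` inline. This also lets the command line be tested without starting a process.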