Ripping Data on the Web

5 Mar 2003, 46 min read
How to recover and repackage information on the World Wide Web.
<html>
<head>
<title>WebScraper User Manual</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF" text="#000000">
<h1>WebScraper User Manual</h1>
<p><img src="ScreenShot.gif" width="306" height="338"></p>
<p>This program downloads financial data from Yahoo and places it in an Access 
  database. You can then use this database as a source of data for evaluating 
  and tracking your investments, and for screening new investment candidates. 
  You can use standard Microsoft Office tools to move data between the database, 
  Excel spreadsheets, Word documents, and even PowerPoint presentations, should 
  you need to.</p>
<p>The program is designed to run in the background, which allows you to use your 
  computer normally while the scraping process is continuing, and to easily pause 
  the process to do other work on your machine. Even on broadband connections, 
  collection of a full database can take several hours, and this ensures that 
  your machine is usable through the process. </p>
<p>To use the program, you must be online so that the program can retrieve the web 
  pages. Check off the data elements you wish to collect at the 
  top of the form, then press the <b>Start</b> button. While the program is 
  running, you must leave your mouse inside the form for the collection to continue. 
  If you move the mouse away to do something else, the program pauses automatically 
  and resumes once you return the mouse to the inside of the form.</p>
<p>When the program is running, it will update the information shown in the lower 
  half of the display to give you an idea of what's going on and how long it will 
  be until it finishes processing the information you have selected. </p>
<p>The first row of data shows you how many records have been processed and how 
  many remain. </p>
<p>The second row shows you the number of records that were <i>successfully</i> 
  recovered, and what percentage of the total number of records that represents. </p>
<p>The third row shows you how long WebScraper has been running and an estimate 
  of how long it will take to finish scraping the current data.</p>
<p>The fourth row tells you how long it took to get the last record.</p>
<p>The fifth row shows you the ticker symbol of the current record and a progress 
  bar that shows how far along the system is in scraping the current data. The 
  background of the ticker symbol will be red if the scraper could not get any 
  data, and will be green if it succeeded in getting data.</p>
<p>There is a text display immediately above the buttons that shows a scrolling 
  label identifying which type of data is currently being scraped.</p>
<p>The buttons at the bottom allow you to start, pause, and stop the scraping 
  operations and exit the program. The start button automatically changes 
  its label between &quot;Start&quot; and &quot;Pause&quot; depending on whether 
  the scraper is running. Remember, you must leave your mouse inside the 
  form for the scraper to keep running. If you need to do something else, go 
  ahead: the scraper will pause automatically and wait for you to come back.</p>
<p>Once the scraper has completed its run, your CorporateData.mdb database will 
  contain all of the information scraped from the site. Do with it what you will, 
  but remember, the data is for your personal use only.</p>
</body>
</html>
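
The status rows described in the manual (records processed and remaining, success percentage, elapsed time, and estimated time to finish) can all be derived from a few counters. The sketch below shows one plausible way to compute them. It is written in Python for brevity and is not taken from WebScraper itself: the class `ScrapeProgress`, its field names, and the "average time per record so far" estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeProgress:
    """Hypothetical counters behind the manual's status rows (not WebScraper's code)."""
    total: int           # records selected for scraping
    processed: int = 0   # records attempted so far
    succeeded: int = 0   # records that actually returned data

    def record(self, ok: bool) -> None:
        """Register one attempted record and whether it returned data."""
        self.processed += 1
        if ok:
            self.succeeded += 1

    @property
    def remaining(self) -> int:
        """First row: how many records are still to be processed."""
        return self.total - self.processed

    @property
    def success_percent(self) -> float:
        """Second row: successes as a percentage of the total record count."""
        return 100.0 * self.succeeded / self.total if self.total else 0.0

    def estimate_remaining_seconds(self, elapsed_seconds: float) -> Optional[float]:
        """Third row: assumes each remaining record takes the average time so far."""
        if self.processed == 0:
            return None  # nothing measured yet, so no estimate is possible
        return elapsed_seconds / self.processed * self.remaining


if __name__ == "__main__":
    progress = ScrapeProgress(total=10)
    for ok in (True, False, True, True):
        progress.record(ok)
    print(progress.processed, progress.remaining)
    print(progress.success_percent)
    print(progress.estimate_remaining_seconds(elapsed_seconds=8.0))
```

A simple average is the easiest estimate to reason about, but it reacts slowly when page response times change. A real tool might weight recent records more heavily.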

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.