Click here to Skip to main content
Click here to Skip to main content
Technical Blog

Tagged as

Scraping dynamic data

, 15 Jan 2013 CPOL
Rate this:
Please Sign up or sign in to vote.
I have three solutions for periodically scraping a website.

Usually my clients request for a website to be scraped into a standard format like CSV, which they can then integrate with their existing applications. However sometimes a client need a website scraped periodically because its data is continually updated. An example of the first use case is census statistics, and of the second stock prices.

I have three solutions for periodically scraping a website:

  1. I provide the client with my web scraping code, which they can then execute regularly
  2. Client pays me a small fee in future whenever they want the data rescraped
  3. I build a web application that scrapes regularly and provides the data in a useful form

The first option is not always practical if the client does not have a technical background. Additionally my solutions are developed and tested on Linux and may not work on Windows.

The second option is generally not attractive to the client because it puts them in a weak position where they are dependent on me being contactable and cooperative in future.
Also it involves ongoing costs for them.

So usually I end up building a basic web application that consists of a CRON job to do the scraping, an interface to the scraped data, and some administration settings. If the scraping jobs are not too big I am happy to host the application on my own server, however most clients prefer the security of hosting it on their own server in case the app breaks down.

Unfortunately I find hosting on their server does not work well because the client will often have different versions of libraries or use a platform I am not familiar with. Additionally I prefer to build my web applications in Python (using web2py), and though Python is great for development it cannot compare to PHP for ease of deployment. I can usually figure this all out but it takes time and also trust from the client to give me root privilege on their server. And given that these web applications are generally low cost (~ $1000) the ease of deployment is important.

All this is far from ideal. The solution? - see the next post.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Richard Penman

Australia Australia
No Biography provided

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.1411023.1 | Last Updated 15 Jan 2013
Article Copyright 2013 by Richard Penman
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid