WebResourceProvider






A framework to allow public web services to be used as objects in your application.
Introduction
This article describes WebResourceProvider, a simple yet powerful framework for retrieving useful information from publicly available web services. I use the term "web service" in a generic, non-Microsoft sense, to mean information providers such as:
- search engines like Google and Altavista
- online stock quote providers such as Yahoo, CNN and Quicken
- weather information providers such as Weather.com, AccuWeather and CNN
- online translation services like Google, Babelfish and FreeTranslation
- etc.
The demo application included with this article shows how you can easily create objects to get:
- stock quotes
- the weather for a US zip code
- the list of locations served by a US zip code
- the translation of a piece of text
- the list of broken links on an HTML page
- the list of top posters at CodeProject
A Word of Caution
Before you use WebResourceProvider to write the next killer app, be aware that there are legal and ethical issues regarding the use of information obtained from other sources. In particular, the terms of service (TOS) of content providers such as Yahoo, CNN, etc. clearly state what you can and cannot do with information retrieved from their sites. Even if you write a web resource provider for personal use only, you should consider any undue stress that your object may put on a web server. The CodeProjectTopPosters example in the demo won't let you get at more than the top 40 CodeProject posters. Further, it pauses between multiple accesses to the CodeProject server so as not to overload it.
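The "cap the number of requests and pause between them" idea can be sketched as follows. This is only an illustration, not the code used by the CodeProjectTopPosters sample: the function name fetchPolitely, its parameters, and the caller-supplied fetch callback are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: fetches at most `maxRequests` URLs and sleeps for
// `pause` between consecutive requests. `fetch` is supplied by the caller;
// this sketch performs no networking itself.
std::size_t fetchPolitely(const std::vector<std::string>& urls,
                          std::size_t maxRequests,
                          std::chrono::milliseconds pause,
                          const std::function<void(const std::string&)>& fetch) {
    assert(pause.count() >= 0);
    std::size_t made = 0;
    for (const std::string& url : urls) {
        if (made == maxRequests) {
            break;  // hard cap on the total number of requests
        }
        if (made > 0) {
            std::this_thread::sleep_for(pause);  // no pause before the first request
        }
        fetch(url);
        ++made;
    }
    return made;  // number of requests actually made
}
```

A suitable cap and pause length depend on the server and its terms of service; the values are deliberately left to the caller.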
How it Works
WebResourceProvider works by initializing itself, constructing a URL to be retrieved, downloading the resource, and extracting useful information from the downloaded content. The process is repeated until no more data needs to be downloaded.
You use WebResourceProvider by deriving your own resource provider class from it and overriding any of these virtual methods (shown in red in the diagram on the right):
- init()
- constructUrl()
- isPost()
- getPostData()
- parseContent()
- moreAvailable()
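The control flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual code: the class ProviderSketch, the method signatures, the use of std::string, and the download() stub (which returns canned content instead of using the network) are all made up for the example. Only the method names come from the article.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the base class; all signatures are assumptions.
class ProviderSketch {
public:
    virtual ~ProviderSketch() = default;

    // init once, then (build URL, download, parse) until nothing more is available.
    bool fetchResource() {
        init();
        do {
            std::string url;
            constructUrl(url);
            std::string postData;
            if (isPost()) {
                getPostData(postData);
            }
            if (!download(url, postData, content_)) {
                return false;  // stop on the first failed download
            }
            parseContent();
        } while (moreAvailable());
        return true;
    }

protected:
    virtual void init() {}
    virtual void constructUrl(std::string& url) = 0;
    virtual bool isPost() { return false; }
    virtual void getPostData(std::string& postData) { postData.clear(); }
    virtual void parseContent() = 0;
    virtual bool moreAvailable() { return false; }

    // Stub: a real provider would issue an HTTP GET or POST here.
    virtual bool download(const std::string& url, const std::string& postData,
                          std::string& content) {
        (void)postData;
        content = "<html>content of " + url + "</html>";
        return true;
    }

    std::string content_;  // most recently downloaded content
};

// Example: walks three pages of a hypothetical listing and keeps each page.
class PagedProvider : public ProviderSketch {
public:
    std::vector<std::string> visited;

protected:
    void init() override { page_ = 0; visited.clear(); }
    void constructUrl(std::string& url) override {
        url = "http://example.com/list?page=" + std::to_string(++page_);
    }
    void parseContent() override {
        assert(!content_.empty());
        visited.push_back(content_);
    }
    bool moreAvailable() override { return page_ < 3; }

private:
    int page_ = 0;
};
```

Calling fetchResource() on a PagedProvider runs the loop three times, once per page, because moreAvailable() keeps returning true until the third page has been parsed.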
WebResourceProvider
provides an assortment of methods to help parse downloaded content. They are:
| Method | Purpose |
| --- | --- |
| at | Checks whether current location is at a string |
| atExact | Case sensitive version of at() |
| skipTo | Advances current location to next occurrence of a string |
| skipToExact | Case sensitive version of skipTo() |
| skipBackTo | Retreats current location to previous occurrence of a string |
| skipBackToExact | Case sensitive version of skipBackTo() |
| extractTo | Extracts text from current location to the start of a string |
| extractToExact | Case sensitive version of extractTo() |
| extractToEnd | Extracts text from current location to end of content |
| getIndex | Returns current location |
| getLinks | Returns HREF and IMG links in content |
| resetIndex | Sets current location to start of content |
| replaceEvery | Replaces every occurrence of a string in content with another |
| removeComments | Removes comments from content |
| removeScripts | Removes scripts from content |
| removeEnclosingAnchorTag | Removes anchor tag enclosing a string |
| removeEnclosingQuotes | Removes quotes enclosing a string |
| removeHtml | Removes HTML from a string |
| trim | Removes leading and trailing whitespace from a string |
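To give a feel for how a "current location" style of parsing works, here is a small self-contained sketch of a few of these helpers. It is not the library's implementation: the class name ContentCursor, the signatures, and the use of std::string are assumptions. In particular, the sketch assumes that skipTo() leaves the location just past the match and that extractTo() leaves it at the start of the delimiter; the real methods may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical cursor over downloaded content (ASCII case folding only).
class ContentCursor {
public:
    explicit ContentCursor(const std::string& content)
        : content_(content), lowered_(toLower(content)) {}

    // Is the current location at `s`? (case insensitive / case sensitive)
    bool at(const std::string& s) const {
        return lowered_.compare(index_, s.size(), toLower(s)) == 0;
    }
    bool atExact(const std::string& s) const {
        return content_.compare(index_, s.size(), s) == 0;
    }

    // Assumption: on success the location ends up just past the match.
    bool skipTo(const std::string& s) { return skip(lowered_, toLower(s)); }
    bool skipToExact(const std::string& s) { return skip(content_, s); }

    // Assumption: on success the location ends up at the start of `s`.
    bool extractTo(const std::string& s, std::string& out) {
        return extract(lowered_, toLower(s), out);
    }
    bool extractToExact(const std::string& s, std::string& out) {
        return extract(content_, s, out);
    }

    std::size_t getIndex() const { return index_; }
    void resetIndex() { index_ = 0; }

private:
    static std::string toLower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return s;
    }
    bool skip(const std::string& haystack, const std::string& needle) {
        const std::size_t pos = haystack.find(needle, index_);
        if (pos == std::string::npos) return false;
        index_ = pos + needle.size();
        assert(index_ <= content_.size());
        return true;
    }
    bool extract(const std::string& haystack, const std::string& needle,
                 std::string& out) {
        const std::size_t pos = haystack.find(needle, index_);
        if (pos == std::string::npos) return false;
        out = content_.substr(index_, pos - index_);  // original casing is kept
        index_ = pos;
        return true;
    }

    std::string content_;  // original text
    std::string lowered_;  // lower-cased copy used for case-insensitive searches
    std::size_t index_ = 0;
};

// Usage: returns the text of the next table cell, or "" if there is none.
std::string nextCell(ContentCursor& cursor) {
    std::string cell;
    if (cursor.skipTo("<td>") && cursor.extractTo("</td>", cell)) return cell;
    return "";
}
```

Keeping a lower-cased copy of the content lets the case-insensitive methods reuse the same search code as the exact ones, at the cost of doubling memory use for the page.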
Sample Resource Providers
Here are screenshots of the sample resource providers in action.
- The QuoteProvider object works by posting a request to Yahoo's basic stock quote form and parsing the returned information.
- The ZipCodeDecoder object works by posting a request to the USPS zip code association form and parsing the returned information.
- The WeatherProvider object works by posting a request to CNN's weather form and parsing the returned information.
- The Translator object works by posting a request to Google's translation engine and parsing the returned information. The request includes the translation mode. The sample performs a reverse translation and presents it along with the original text for comparison purposes.
- The LinkChecker object is a thin layer above the WebResourceProvider class. It delegates the job of determining the document and image links at a URL to the base class.
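As an illustration of what "determining the document and image links" in a page involves, here is a deliberately naive sketch. It is not the base class's getLinks() code: the names tagAttributeValues, collectLinks and PageLinks are hypothetical, and the scan assumes quoted attribute values and a space after the tag name. A real HTML parser would be far more robust.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Collects the quoted values of `attribute` found inside `<tag ...>` elements.
// Tag and attribute names are matched case insensitively (ASCII only).
std::vector<std::string> tagAttributeValues(const std::string& html,
                                            const std::string& tag,
                                            const std::string& attribute) {
    assert(!tag.empty() && !attribute.empty());
    std::string lowered = html;
    std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string open = "<" + tag + " ";
    const std::string key = attribute + "=";
    std::vector<std::string> values;
    std::size_t pos = 0;
    while ((pos = lowered.find(open, pos)) != std::string::npos) {
        const std::size_t tagEnd = lowered.find('>', pos);
        if (tagEnd == std::string::npos) break;  // unterminated tag
        const std::size_t keyPos = lowered.find(key, pos);
        if (keyPos != std::string::npos && keyPos < tagEnd) {
            const std::size_t valueStart = keyPos + key.size();
            const char quote = html[valueStart];
            if (quote == '"' || quote == '\'') {  // unquoted values are skipped
                const std::size_t valueEnd = html.find(quote, valueStart + 1);
                if (valueEnd != std::string::npos && valueEnd < tagEnd) {
                    values.push_back(
                        html.substr(valueStart + 1, valueEnd - valueStart - 1));
                }
            }
        }
        pos = tagEnd + 1;  // continue after this tag
    }
    return values;
}

struct PageLinks {
    std::vector<std::string> documents;  // href values of <a> tags
    std::vector<std::string> images;     // src values of <img> tags
};

PageLinks collectLinks(const std::string& html) {
    return PageLinks{tagAttributeValues(html, "a", "href"),
                     tagAttributeValues(html, "img", "src")};
}
```

A link checker built on this would then test each collected URL for existence; that step needs network access and is left out of the sketch.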
Using WebResourceProvider
To use WebResourceProvider, do the following:
- Build the WebResourceProvider_Lib project.
- Modify your application's project to look for header files and libraries in the WebResourceProvider_Lib project area.
- Derive a class from WebResourceProvider. You'll need to #include WebResourceProvider.h in your derived class' header file.
- Override the constructUrl() method in your derived class. This method specifies the URL to be downloaded.
- Override the parse() method in your derived class. This method extracts information from the downloaded content and stores it in the derived class' member variables.
- Optionally override other WebResourceProvider virtual methods. See the source code of the sample resource providers included in the demo project for examples.
- Link your application with WebResourceProvider.lib.
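The deriving and overriding steps above might look roughly like the sketch below. So that it compiles without the library, it uses a tiny stand-in base class rather than the real one in WebResourceProvider.h. The stand-in, its signatures, the canned download, the TemperatureProvider class, and the example URL are all assumptions made for illustration; consult the header for the actual declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for the real base class; signatures are assumptions.
class WebResourceProviderStandIn {
public:
    virtual ~WebResourceProviderStandIn() = default;

    bool fetchResource() {
        std::string url;
        constructUrl(url);
        content_ = download(url);
        if (content_.empty()) return false;
        parse();
        return true;
    }

protected:
    virtual void constructUrl(std::string& url) = 0;
    virtual void parse() = 0;

    // Stub: returns canned content instead of using the network.
    virtual std::string download(const std::string& url) {
        return url.empty() ? "" : "<b>Temperature:</b> 72 F<br>";
    }

    std::string content_;  // downloaded content, available to parse()
};

// Hypothetical provider: builds a URL from a zip code and keeps the result
// of parsing in a member variable.
class TemperatureProvider : public WebResourceProviderStandIn {
public:
    explicit TemperatureProvider(const std::string& zip) : zip_(zip) {}
    const std::string& temperature() const { return temperature_; }

protected:
    void constructUrl(std::string& url) override {
        assert(!zip_.empty());
        url = "http://weather.example.com/lookup?zip=" + zip_;
    }

    void parse() override {
        const std::string marker = "</b>";
        const std::size_t start = content_.find(marker);
        if (start == std::string::npos) return;
        const std::size_t from = start + marker.size();
        const std::size_t end = content_.find("<br>", from);
        if (end == std::string::npos) return;
        std::string value = content_.substr(from, end - from);
        const std::size_t first = value.find_first_not_of(" \t");
        const std::size_t last = value.find_last_not_of(" \t");
        temperature_ = (first == std::string::npos)
                           ? ""
                           : value.substr(first, last - first + 1);
    }

private:
    std::string zip_;
    std::string temperature_;
};
```

An application would create the provider, call its fetch method, and then read the parsed value through the accessor, for example TemperatureProvider("90210") followed by fetchResource() and temperature().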
Acknowledgement
WebResourceProvider uses the following code written by others:
- WebGrab by Chris Maunder
- AmHttpUtilities by Anders Molin
- Case-Insensitive String Replace by Uwe Keim
A Call for Interesting WebResourceProviders!
This is an invitation to the CP community to come up with interesting and useful web resource providers. Let your imagination (and coding prowess) flow! Please post your cool WebResourceProvider-derived classes at CodeProject.
Revision History
- 23 Mar 2007
  - Updated parsing logic in WeatherProvider, ZipCodeProvider and Translator modules.
- 7 Oct 2006
  - Bug Fix: Updated parsing logic in Translator module.
- 7 Aug 2002
  - Bug Fix: Fixed computation error in extractToEnd().
  - Bug Fix: Added missing call to init() in fetchResource().
  - Added methods skipBackTo() and skipBackToExact().
- 8 May 2002
  - Added methods urlExists(), getLinks(), removeComments(), removeScripts(), findNoCase() and findStringInArray().
  - Modified at(), skipTo(), and extractTo() to be case insensitive. Added case sensitive analogs atExact(), skipToExact() and extractToExact().
  - Added LinkChecker sample to demo app.
  - Fixed a bug in parse() that caused the fetch status to be ignored.
  - Sped up ZipCodeDecoder sample object.
- 30 April 2002
  - Corrected control flow image.
- 29 April 2002
  - Initial version of article.