WebResourceProvider

23 Mar 2007, CPOL license
A framework to allow public web services to be used as objects in your application.

Introduction

This article describes WebResourceProvider, a simple yet powerful framework for retrieving useful information from publicly available web services. I use the term "web service" in a generic, non-Microsoft sense, to mean any online information provider.

The demo application included with this article shows how you can easily create objects to get:

  • stock quotes
  • the weather for a US zip code
  • the list of locations served by a US zip code
  • the translation of a piece of text
  • the list of broken links on an HTML page
  • the list of top posters at CodeProject

[Screenshot: CodeProjectTopPosters object in action]

A Word of Caution

Before you use WebResourceProvider to write the next killer app, be aware that there are legal and ethical issues regarding the use of information obtained from other sources. In particular, the terms of service (TOS) of content providers such as Yahoo, CNN, etc. clearly state what you can and cannot do with information retrieved from their sites. Even if you write a web resource provider for personal use only, you should take into consideration any undue stress that your object may put on a web server. The CodeProjectTopPosters example in the demo won't let you get at more than the top 40 CodeProject posters. Further, it pauses between multiple accesses to the CodeProject server in order to not overload it.
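
The throttling idea described above can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: fetchPolitely() and its parameters are hypothetical names, not part of WebResourceProvider, and the fetch callback stands in for whatever actually downloads a page.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of WebResourceProvider): downloads at most
// maxRequests of the given URLs and sleeps for `pause` between consecutive
// requests so the server is not hit in a tight loop.
std::vector<std::string> fetchPolitely(
    const std::vector<std::string>& urls,
    std::size_t maxRequests,
    std::chrono::milliseconds pause,
    const std::function<std::string(const std::string&)>& fetch)
{
    std::vector<std::string> pages;
    for (std::size_t i = 0; i < urls.size() && i < maxRequests; ++i) {
        if (i > 0) {
            std::this_thread::sleep_for(pause);  // rest between requests
        }
        pages.push_back(fetch(urls[i]));
    }
    return pages;
}
```

Passing the download step in as a callback keeps the cap and the pause in one place, whichever way the page is actually retrieved.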

How it Works

[Figure: WebResourceProvider control flow]

WebResourceProvider works by initializing itself, constructing a URL to be retrieved, downloading the resource, and extracting useful information from the downloaded content. The process is repeated until no more data needs to be downloaded.

You use WebResourceProvider by deriving your own resource provider class from it, and overriding any of these virtual methods (shown in red in the diagram on the right):

  • init()
  • constructUrl()
  • isPost()
  • getPostData()
  • parseContent()
  • moreAvailable()
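
The cycle described above might be sketched roughly like this in standard C++. Only the six method names come from the article. The class names ProviderSketch and PagedProvider, the download() stub, all signatures, and the example.invalid URL are assumptions made for illustration; the real class is MFC-based and may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the framework's base class. Only the virtual
// method names are taken from the article; signatures are assumed.
class ProviderSketch {
public:
    virtual ~ProviderSketch() = default;

    // Runs the cycle: init once, then build URL, download and parse
    // repeatedly until moreAvailable() reports that nothing is left.
    void fetchResource() {
        init();
        do {
            const std::string url = constructUrl();
            const std::string body =
                isPost() ? download(url, getPostData()) : download(url, "");
            parseContent(body);
        } while (moreAvailable());
    }

protected:
    virtual void init() {}
    virtual std::string constructUrl() = 0;
    virtual bool isPost() { return false; }
    virtual std::string getPostData() { return ""; }
    virtual void parseContent(const std::string& content) = 0;
    virtual bool moreAvailable() { return false; }

    // Stub: a real implementation would issue an HTTP GET or POST here.
    virtual std::string download(const std::string& url,
                                 const std::string& /*postData*/) {
        return "<content of " + url + ">";
    }
};

// Example derived class that walks three pages of a made-up site.
class PagedProvider : public ProviderSketch {
public:
    std::vector<std::string> pages;  // filled in by parseContent()

protected:
    void init() override { page_ = 0; pages.clear(); }
    std::string constructUrl() override {
        return "http://example.invalid/list?page=" + std::to_string(++page_);
    }
    void parseContent(const std::string& content) override {
        pages.push_back(content);
    }
    bool moreAvailable() override { return page_ < 3; }

private:
    int page_ = 0;
};
```

Calling fetchResource() on a PagedProvider visits pages 1 through 3 and leaves one entry per page in its pages member.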

WebResourceProvider provides an assortment of methods to help parse downloaded content. They are:

Method                    Purpose
at                        Checks whether current location is at a string
atExact                   Case-sensitive version of at()
skipTo                    Advances current location to next occurrence of a string
skipToExact               Case-sensitive version of skipTo()
skipBackTo                Retreats current location to previous occurrence of a string
skipBackToExact           Case-sensitive version of skipBackTo()
extractTo                 Extracts text from current location to the start of a string
extractToExact            Case-sensitive version of extractTo()
extractToEnd              Extracts text from current location to end of content
getIndex                  Returns current location
getLinks                  Returns HREF and IMG links in content
resetIndex                Sets current location to start of content
replaceEvery              Replaces every occurrence of a string in content with another
removeComments            Removes comments from content
removeScripts             Removes scripts from content
removeEnclosingAnchorTag  Removes anchor tag enclosing a string
removeEnclosingQuotes     Removes quotes enclosing a string
removeHtml                Removes HTML from a string
trim                      Removes leading and trailing whitespace from a string
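
To give a feel for this cursor style of parsing, here is a minimal sketch in standard C++. The ContentCursor class is hypothetical. Its method names mirror a few rows of the table, but the signatures and exact behavior are assumptions, in particular that skipTo() leaves the current location just past the match and that extractTo() leaves it at the start of the match.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor over downloaded text (not the framework's own code).
class ContentCursor {
public:
    explicit ContentCursor(std::string content)
        : content_(std::move(content)) {}

    // Moves just past the next case-insensitive match; false if not found.
    bool skipTo(const std::string& target) {
        const std::size_t pos = findNoCase(target);
        if (pos == std::string::npos) return false;
        index_ = pos + target.size();
        return true;
    }

    // Copies the text between the current location and the next match
    // into `out`, leaving the current location at the start of the match.
    bool extractTo(const std::string& target, std::string& out) {
        const std::size_t pos = findNoCase(target);
        if (pos == std::string::npos) return false;
        out = content_.substr(index_, pos - index_);
        index_ = pos;
        return true;
    }

    std::size_t getIndex() const { return index_; }
    void resetIndex() { index_ = 0; }

private:
    // Case-insensitive search starting at the current location.
    std::size_t findNoCase(const std::string& target) const {
        auto sameChar = [](char a, char b) {
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        };
        auto it = std::search(content_.begin() + index_, content_.end(),
                              target.begin(), target.end(), sameChar);
        return it == content_.end()
                   ? std::string::npos
                   : static_cast<std::size_t>(it - content_.begin());
    }

    std::string content_;
    std::size_t index_ = 0;
};
```

A typical extraction chains the calls: skip to a label, skip to the tag that opens the value, then extract up to the closing tag.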

Sample Resource Providers

Here are screenshots of the sample resource providers in action.

[Screenshot: QuoteProvider object in action]

The QuoteProvider object works by posting a request to Yahoo's basic stock quote form and parsing the returned information.

[Screenshot: ZipCodeDecoder object in action]

The ZipCodeDecoder object works by posting a request to the USPS zip code association form and parsing the returned information.

[Screenshot: WeatherProvider object in action]

The WeatherProvider object works by posting a request to CNN's weather form and parsing the returned information.

[Screenshot: Translator object in action]

The Translator object works by posting a request to Google's translation engine and parsing the returned information. The request includes the translation mode.

The sample performs a reverse translation and presents it along with the original text for comparison purposes.
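
Several of the samples above work by posting a form. As a rough illustration of the kind of string a getPostData() override might return, the sketch below builds an application/x-www-form-urlencoded body in standard C++. The helper names formEncode() and buildPostData() are hypothetical, and the field names in the usage note are made up rather than taken from any real site's form.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Percent-encodes one value for application/x-www-form-urlencoded data:
// letters, digits and "-_.*" pass through, space becomes '+', and every
// other byte becomes %XX.
std::string formEncode(const std::string& value) {
    std::string out;
    for (unsigned char ch : value) {
        const bool safe = (ch >= 'A' && ch <= 'Z') ||
                          (ch >= 'a' && ch <= 'z') ||
                          (ch >= '0' && ch <= '9') ||
                          ch == '-' || ch == '_' || ch == '.' || ch == '*';
        if (safe) {
            out += static_cast<char>(ch);
        } else if (ch == ' ') {
            out += '+';
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X",
                          static_cast<unsigned>(ch));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: joins name/value pairs into a POST body.
std::string buildPostData(
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty()) body += '&';
        body += formEncode(field.first) + '=' + formEncode(field.second);
    }
    return body;
}
```

For example, the made-up fields text = "good morning" and mode = "en|fr" produce the body text=good+morning&mode=en%7Cfr.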

[Screenshot: LinkChecker object in action]

The LinkChecker object is a thin layer above the WebResourceProvider class. It delegates the job of determining the document and image links at a URL to the base class.

LinkChecker::getLinks() does its best to determine the links on a page, but because of the large number of ways a link can be specified, this method may miss a few links.
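
To see why a text-based link scan can miss links, consider the deliberately naive extractor below. It is a hypothetical sketch in standard C++, not the framework's getLinks() implementation, and it only recognizes lowercase href and src attributes whose values are double-quoted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive extractor: collects the values of href="..." and src="..."
// attributes. Single-quoted, unquoted, uppercase and script-generated
// links are all missed, which is the limitation being illustrated.
std::vector<std::string> extractQuotedLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string attrs[] = {"href=\"", "src=\""};
    for (const std::string& attr : attrs) {
        std::size_t pos = 0;
        while ((pos = html.find(attr, pos)) != std::string::npos) {
            const std::size_t start = pos + attr.size();
            const std::size_t end = html.find('"', start);
            if (end == std::string::npos) break;  // unterminated attribute
            links.push_back(html.substr(start, end - start));
            pos = end + 1;
        }
    }
    return links;
}
```

Given a page with one double-quoted href, one double-quoted src, one single-quoted href and one unquoted uppercase HREF, this function finds only the first two. Each extra spelling needs its own handling, and links assembled by scripts cannot be found this way at all.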

The LinkChecker demo uses WebResourceProvider::urlExists() to check whether a link is valid.

Click here to see a screenshot of the LinkChecker demo run against the CodeProject home page. Keep up the good work, Chris!

Using WebResourceProvider

To use WebResourceProvider, do the following:
  1. Build the WebResourceProvider_Lib project.
  2. Modify your application's project to look for header files and libraries in the WebResourceProvider_Lib project area.
  3. Derive a class from WebResourceProvider. You'll need to #include WebResourceProvider.h in your derived class's header file.
  4. Override the constructUrl() method in your derived class. This method specifies the URL to be downloaded.
  5. Override the parse() method in your derived class. This method extracts information from the downloaded content and stores it in the derived class' member variables.
  6. Optionally override other WebResourceProvider virtual methods. See the source code of the sample resource providers included in the demo project for examples.
  7. Link your application with WebResourceProvider.lib.

Acknowledgement

WebResourceProvider uses the following code written by others:

A Call for Interesting WebResourceProviders!

This is an invitation to the CP community to come up with interesting and useful web resource providers. Let your imagination (and coding prowess) flow! Please post your cool WebResourceProvider derived classes at CodeProject.

Revision History

  • 23 Mar 2007
    Updated parsing logic in WeatherProvider, ZipCodeDecoder and Translator modules.
  • 7 Oct 2006
    Bug Fix: Updated parsing logic in Translator module.
  • 7 Aug 2002
    Bug Fix: Fixed computation error in extractToEnd().
    Bug Fix: Added missing call to init() in fetchResource().
    Added methods skipBackTo() and skipBackToExact().
  • 8 May 2002
    Added methods urlExists(), getLinks(), removeComments(), removeScripts(), findNoCase() and findStringInArray().
    Modified at(), skipTo(), and extractTo() to be case insensitive. Added case sensitive analogs atExact(), skipToExact() and extractToExact().
    Added LinkChecker sample to demo app.
    Fixed a bug in parse() that caused the fetch status to be ignored.
    Sped up ZipCodeDecoder sample object.
  • 30 April 2002
    Corrected control flow image.
  • 29 April 2002
    Initial version of article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Technical Lead
Canada
Ravi Bhavnani is an ardent fan of Microsoft technologies who loves building Windows apps, especially PIMs, system utilities, and things that go bump on the Internet. During his career, Ravi has developed expert systems, desktop imaging apps, marketing automation software, EDA tools, a platform to help people find, analyze and understand information, trading software for institutional investors and advanced data visualization solutions. He currently works for a company that provides enterprise workforce management solutions to large clients.

His interests include the .NET framework, reasoning systems, financial analysis and algorithmic trading, NLP, HCI and UI design. Ravi holds a BS in Physics and Math and an MS in Computer Science and was a Microsoft MVP (C++ and C# in 2006 and 2007). He is also the co-inventor of 3 patents on software security and generating data visualization dashboards. His claim to fame is that he crafted CodeProject's "joke" forum post icon.

Ravi's biggest fear is that one day he might actually get a life, although the chances of that happening seem extremely remote.
