
WebResourceProvider

Ravi Bhavnani, 23 Mar 2007
A framework to allow public web services to be used as objects in your application.

Introduction

This article describes WebResourceProvider, a simple yet powerful framework for retrieving useful information from publicly available web services. I use the term "web service" in a generic, non-Microsoft sense, to mean information providers such as:

The demo application included with this article shows how you can easily create objects to get:

  • stock quotes
  • the weather for a US zip code
  • the list of locations served by a US zip code
  • the translation of a piece of text
  • the list of broken links on an HTML page
  • the list of top posters at CodeProject

CodeProjectTopPosters object in action

A Word of Caution

Before you use WebResourceProvider to write the next killer app, be aware that there are legal and ethical issues regarding the use of information obtained from other sources. In particular, the terms of service (TOS) of content providers such as Yahoo, CNN, etc. clearly state what you can and cannot do with information retrieved from their sites. Even if you write a web resource provider for personal use only, you should consider the load your object may put on a web server. The CodeProjectTopPosters example in the demo won't retrieve more than the top 40 CodeProject posters, and it pauses between successive requests to the CodeProject server to avoid overloading it.
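The two courtesies described above, a hard cap on how much is retrieved and a pause between requests, can be sketched independently of any particular site. The function below is hypothetical and is not part of WebResourceProvider: fetchPage stands in for whatever downloads and parses one page, and both the item cap and the delay are caller-supplied parameters rather than values taken from the demo's source.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of WebResourceProvider): collects items page by
// page through fetchPage, stops at maxItems, and sleeps between requests.
std::vector<std::string> fetchPolitely(
    const std::function<std::vector<std::string>(int)>& fetchPage,
    std::size_t maxItems,
    std::chrono::milliseconds pause)
{
    std::vector<std::string> items;
    for (int page = 1; items.size() < maxItems; ++page) {
        if (page > 1)
            std::this_thread::sleep_for(pause);  // wait before every request after the first
        const std::vector<std::string> batch = fetchPage(page);
        if (batch.empty())
            break;                               // the server has nothing more to offer
        for (const std::string& item : batch) {
            if (items.size() == maxItems)
                break;                           // enforce the hard cap
            items.push_back(item);
        }
    }
    return items;
}
```

With a fake fetchPage that returns 25 names per page and a cap of 40, the loop stops after two requests and keeps exactly 40 names, so the server is never asked for a third page.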

How it Works

WebResourceProvider control flow

WebResourceProvider works by initializing itself, constructing a URL to be retrieved, downloading the resource, and extracting useful information from the downloaded content. The process repeats until no more data needs to be downloaded.

You use WebResourceProvider by deriving your own resource provider class from it, and overriding any of these virtual methods (shown in red in the diagram on the right):

  • init()
  • constructUrl()
  • isPost()
  • getPostData()
  • parseContent()
  • moreAvailable()
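The control flow above maps naturally onto the template method pattern: a non-virtual driver calls the overridable steps in a fixed order. The sketch below is a hypothetical, self-contained stand-in, not the real WebResourceProvider declaration. ProviderSketch, its Downloader callback (used instead of real HTTP so the loop can run offline) and the PagedProvider example are illustrative names; only the step names init(), constructUrl(), isPost(), getPostData(), parseContent(), moreAvailable() and fetchResource() come from this article.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for WebResourceProvider's control flow. The real class
// downloads over HTTP; here the downloader is injected so the loop runs offline.
class ProviderSketch {
public:
    using Downloader = std::function<std::string(const std::string& url,
                                                 bool post,
                                                 const std::string& postData)>;

    explicit ProviderSketch(Downloader download) : download_(std::move(download)) {}
    virtual ~ProviderSketch() = default;

    // Drives the overridable steps in the order shown in the flow diagram.
    bool fetchResource() {
        init();
        do {
            const std::string url = constructUrl();
            if (url.empty())
                return false;  // nothing to fetch
            content_ = download_(url, isPost(), isPost() ? getPostData() : std::string());
            parseContent();
        } while (moreAvailable());
        return true;
    }

protected:
    virtual void init() {}
    virtual std::string constructUrl() = 0;
    virtual bool isPost() { return false; }
    virtual std::string getPostData() { return {}; }
    virtual void parseContent() = 0;
    virtual bool moreAvailable() { return false; }

    std::string content_;  // most recently downloaded content

private:
    Downloader download_;
};

// Hypothetical derived provider that walks numbered pages until it has seen three.
class PagedProvider : public ProviderSketch {
public:
    using ProviderSketch::ProviderSketch;
    int pagesSeen = 0;
    std::string lastUrl;
    std::string joined;

protected:
    void init() override { pagesSeen = 0; joined.clear(); }
    std::string constructUrl() override {
        lastUrl = "http://example.com/list?page=" + std::to_string(pagesSeen + 1);
        return lastUrl;
    }
    void parseContent() override { ++pagesSeen; joined += content_; }
    bool moreAvailable() override { return pagesSeen < 3; }
};
```

With a downloader that simply echoes the page number back as "[n]", fetchResource() runs the loop three times and PagedProvider ends up holding "[1][2][3]".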

WebResourceProvider provides an assortment of methods to help parse downloaded content. They are:

Method                    Purpose
at                        Checks whether the current location is at a string
atExact                   Case-sensitive version of at()
skipTo                    Advances the current location to the next occurrence of a string
skipToExact               Case-sensitive version of skipTo()
skipBackTo                Moves the current location back to the previous occurrence of a string
skipBackToExact           Case-sensitive version of skipBackTo()
extractTo                 Extracts text from the current location to the start of a string
extractToExact            Case-sensitive version of extractTo()
extractToEnd              Extracts text from the current location to the end of the content
getIndex                  Returns the current location
getLinks                  Returns the HREF and IMG links in the content
resetIndex                Sets the current location to the start of the content
replaceEvery              Replaces every occurrence of a string in the content with another string
removeComments            Removes comments from the content
removeScripts             Removes scripts from the content
removeEnclosingAnchorTag  Removes the anchor tag enclosing a string
removeEnclosingQuotes     Removes the quotes enclosing a string
removeHtml                Removes HTML from a string
trim                      Removes leading and trailing whitespace from a string
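To make the cursor semantics in the table concrete, here is a hypothetical ContentCursor that implements case-insensitive at(), skipTo() and extractTo() over a string. It is not the library's code. It assumes, following the descriptions above, that skipTo() leaves the location at the start of the match and that extractTo() stops at the start of the terminating string; the real methods may differ in such details.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor over downloaded content; mimics the table's descriptions.
class ContentCursor {
public:
    explicit ContentCursor(std::string content) : content_(std::move(content)) {}

    // Case-insensitive: does the text at the current location start with s?
    bool at(const std::string& s) const { return findNoCase(s, index_) == index_; }

    // Moves to the next occurrence of s; leaves the location alone if s is absent.
    bool skipTo(const std::string& s) {
        const std::size_t pos = findNoCase(s, index_);
        if (pos == std::string::npos)
            return false;
        index_ = pos;
        return true;
    }

    // Copies text from the current location up to (not including) s and
    // leaves the location at the start of s.
    bool extractTo(const std::string& s, std::string& out) {
        const std::size_t pos = findNoCase(s, index_);
        if (pos == std::string::npos)
            return false;
        out = content_.substr(index_, pos - index_);
        index_ = pos;
        return true;
    }

    std::string extractToEnd() const { return content_.substr(index_); }
    std::size_t getIndex() const { return index_; }
    void resetIndex() { index_ = 0; }

private:
    // Case-insensitive search for s starting at position from.
    std::size_t findNoCase(const std::string& s, std::size_t from) const {
        auto sameIgnoringCase = [](char a, char b) {
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        };
        auto it = std::search(content_.begin() + from, content_.end(),
                              s.begin(), s.end(), sameIgnoringCase);
        return it == content_.end() ? std::string::npos
                                    : static_cast<std::size_t>(it - content_.begin());
    }

    std::string content_;
    std::size_t index_ = 0;
};
```

On "<html><B>Price:</b> 42.10 USD</html>", skipTo("price:") matches despite the case difference, and extractTo("</B>") then yields "Price:" and parks the location on the closing tag.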

Sample Resource Providers

Here are screenshots of the sample resource providers in action.

QuoteProvider object in action

The QuoteProvider object works by posting a request to Yahoo's basic stock quote form and parsing the returned information.

ZipCodeDecoder object in action

The ZipCodeDecoder object works by posting a request to the USPS zip code association form and parsing the returned information.

WeatherProvider object in action

The WeatherProvider object works by posting a request to CNN's weather form and parsing the returned information.

Translator object in action

The Translator object works by posting a request to Google's translation engine and parsing the returned information. The request includes the translation mode.
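When a provider posts a form, its getPostData() override has to return the form fields as encoded name=value pairs. The helpers below are a hypothetical sketch of that encoding, close to the application/x-www-form-urlencoded rules: letters, digits and -._~ pass through, spaces become +, and every other byte becomes %XX. The field names and values in the example (text, mode, "en|fr") are assumptions, not the fields Google's form actually used.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: encodes one form value. Letters, digits and "-._~" are
// kept, spaces become '+', and every other byte becomes %XX (uppercase hex).
std::string formEncode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                                c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Hypothetical helper: joins encoded name=value pairs with '&', the kind of
// string a getPostData() override could return for a POST request.
std::string buildPostData(
    const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty())
            body += '&';
        body += formEncode(field.first) + '=' + formEncode(field.second);
    }
    return body;
}
```

For example, encoding the pairs text = "Hello, world!" and mode = "en|fr" produces text=Hello%2C+world%21&mode=en%7Cfr. Multi-byte UTF-8 characters are encoded byte by byte, so "café" becomes caf%C3%A9.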

The sample performs a reverse translation and presents it along with the original text for comparison purposes.

LinkChecker object in action

The LinkChecker object is a thin layer above the WebResourceProvider class. It delegates the job of determining the document and image links at a URL to the base class.

LinkChecker::getLinks() does its best to determine the links on a page, but because of the large number of ways a link can be specified, this method may miss a few links.
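Why links get missed is easy to see in even a small scanner. The hypothetical extractLinks() below is not the library's getLinks(); it collects href= and src= values in document order, matches attribute names case-insensitively, and accepts double-quoted, single-quoted or unquoted values. Links with whitespace around the =, links assembled by script, and CSS url(...) references all slip past it.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical link scanner (not the library's getLinks()). Returns href= and
// src= values in document order; attribute names match case-insensitively.
std::vector<std::string> extractLinks(const std::string& html) {
    // Lower-cased copy used only for searching; values are cut from html
    // so their original case is preserved.
    std::string lower = html;
    for (char& ch : lower)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));

    std::vector<std::string> links;
    std::size_t pos = 0;
    while (pos < html.size()) {
        const std::size_t href = lower.find("href=", pos);
        const std::size_t src = lower.find("src=", pos);
        std::size_t start = std::min(href, src);  // whichever attribute comes first
        if (start == std::string::npos)
            break;
        start += (start == href) ? 5 : 4;         // step over "href=" or "src="

        std::size_t end;
        if (start < html.size() && (html[start] == '"' || html[start] == '\'')) {
            const char quote = html[start++];     // quoted value
            end = html.find(quote, start);
        } else {
            end = html.find_first_of(" \t\r\n>", start);  // unquoted value
        }
        if (end == std::string::npos)
            end = html.size();
        if (end > start)
            links.push_back(html.substr(start, end - start));
        pos = end;
    }
    return links;
}
```

Given <A HREF="/index.htm">, <img src='logo.gif'> and <a href=faq.htm>, it returns /index.htm, logo.gif and faq.htm in that order, while <a href = "spaced.htm"> yields nothing at all.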

The LinkChecker demo uses WebResourceProvider::urlExists() to check whether a link is valid.

Click here to see a screenshot of the LinkChecker demo run against the CodeProject home page. Keep up the good work, Chris!

Using WebResourceProvider

To use WebResourceProvider do the following:
  1. Build the WebResourceProvider_Lib project.
  2. Modify your application's project to look for header files and libraries in the WebResourceProvider_Lib project area.
  3. Derive a class from WebResourceProvider. You'll need to #include "WebResourceProvider.h" in your derived class's header file.
  4. Override the constructUrl() method in your derived class. This method specifies the URL to be downloaded.
  5. Override the parseContent() method in your derived class. This method extracts information from the downloaded content and stores it in your derived class's member variables.
  6. Optionally override other WebResourceProvider virtual methods. See the source code of the sample resource providers included in the demo project for examples.
  7. Link your application with WebResourceProvider.lib.

Acknowledgement

WebResourceProvider uses the following code written by others:

A Call for Interesting WebResourceProviders!

This is an invitation to the CP community to come up with interesting and useful web resource providers. Let your imagination (and coding prowess) flow! Please post your cool WebResourceProvider-derived classes at CodeProject.

Revision History

  • 23 Mar 2007
    Updated parsing logic in WeatherProvider, ZipCodeDecoder and Translator modules.
  • 7 Oct 2006
    Bug Fix: Updated parsing logic in Translator module.
  • 7 Aug 2002
    Bug Fix: Fixed computation error in extractToEnd().
    Bug Fix: Added missing call to init() in fetchResource().
    Added methods skipBackTo() and skipBackToExact().
  • 8 May 2002
    Added methods urlExists(), getLinks(), removeComments(), removeScripts(), findNoCase() and findStringInArray().
    Modified at(), skipTo(), and extractTo() to be case-insensitive. Added case-sensitive analogs atExact(), skipToExact() and extractToExact().
    Added LinkChecker sample to demo app.
    Fixed a bug in parse() that caused the fetch status to be ignored.
    Sped up ZipCodeDecoder sample object.
  • 30 April 2002
    Corrected control flow image.
  • 29 April 2002
    Initial version of article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Ravi Bhavnani
Technical Lead
Canada
Ravi Bhavnani is an ardent fan of Microsoft technologies who loves building Windows apps, especially PIMs, system utilities, and things that go bump on the Internet. During his career, Ravi has developed expert systems, desktop imaging apps, marketing automation software, EDA tools, a platform to help people find, analyze and understand information, trading software for institutional investors and advanced data visualization solutions. He currently works for a company that provides enterprise workforce management solutions to large clients.
 
His interests include the .NET framework, reasoning systems, financial analysis and algorithmic trading, NLP, HCI and UI design. Ravi holds a BS in Physics and Math and an MS in Computer Science and was a Microsoft MVP (C++ and C# in 2006 and 2007). He is also the co-inventor of 2 patents on software security and generating data visualization dashboards. His claim to fame is that he crafted CodeProject's "joke" forum post icon.
 
Ravi's biggest fear is that one day he might actually get a life, although the chances of that happening seem extremely remote.

Comments and Discussions

 
Proxy C# .net (Doomt, 23-Jan-11 23:03)
  Re: Proxy C# .net (Ravi Bhavnani, 24-Jan-11 1:46)
  Re: Proxy C# .net (Doomt, 24-Jan-11 6:00)
Japanese support (DavidA2005, 6-Mar-09 0:33)
  Re: Japanese support (DavidA2005, 6-Mar-09 6:20)
  Re: Japanese support (Ravi Bhavnani, 6-Mar-09 8:04)
Suggested bug fix (DavidA2005, 5-Mar-09 3:40)
I found that the Translator sample failed with the message "Unable to translate text."

Suggested fix: in Translator.cpp, change line 89 from:

    if (!skipTo ("<div id=result_box dir=ltr>"))

to:

    if (!skipTo ("<div id=result_box dir=\"ltr\">"))

(The latter string is the one used in the Google Translator project.)
  Re: Suggested bug fix (Ravi Bhavnani, 5-Mar-09 4:08)
Proxy server? (DavidA2005, 4-Mar-09 4:18)
  Re: Proxy server? (Ravi Bhavnani, 4-Mar-09 5:52)
  Re: Proxy server? (DavidA2005, 4-Mar-09 6:03)
  Re: Proxy server? (DavidA2005, 4-Mar-09 22:27)
  Re: Proxy server? (DavidA2005, 5-Mar-09 2:49)
  Re: Proxy server? (Ravi Bhavnani, 5-Mar-09 4:07)
Unicode? (Hans Dietrich, 29-Apr-08 23:12)
  Re: Unicode? (Ravi Bhavnani, 30-Apr-08 2:17)
  Re: Unicode? (Hans Dietrich, 30-Apr-08 2:40)
  Re: Unicode? (Ravi Bhavnani, 30-Apr-08 2:42)
  Re: Unicode? (NeWi, 1-Sep-09 7:09)
  Re: Unicode? (Ravi Bhavnani, 1-Sep-09 7:56)
  Re: Unicode? (NeWi, 1-Sep-09 12:18)
  Re: Unicode? (Ravi Bhavnani, 1-Sep-09 12:24)
All button does not function properly (Anshar, 25-Apr-08 3:42)
  Re: All button does not function properly (Anshar, 25-Apr-08 4:13)
  Re: All button does not function properly (Ravi Bhavnani, 25-Apr-08 4:35)

Last Updated 23 Mar 2007
Article Copyright 2002 by Ravi Bhavnani
Everything else Copyright © CodeProject, 1999-2014