WebResourceProvider

By Ravi Bhavnani, 23 Mar 2007
 

Introduction

This article describes WebResourceProvider, a simple yet powerful framework for retrieving useful information from publicly available web services. I use the term "web service" in a generic, non-Microsoft sense, to mean information providers such as:

The demo application included with this article shows how you can easily create objects to get:

  • stock quotes
  • the weather for a US zip code
  • the list of locations served by a US zip code
  • the translation of a piece of text
  • the list of broken links on an HTML page
  • the list of top posters at CodeProject

CodeProjectTopPosters object in action

A Word of Caution

Before you use WebResourceProvider to write the next killer app, be aware that there are legal and ethical issues regarding the use of information obtained from other sources. In particular, the terms of service (TOS) of content providers such as Yahoo, CNN, etc. clearly state what you can and cannot do with information retrieved from their sites. Even if you write a web resource provider for personal use only, you should take into consideration any undue stress that your object may put on a web server. The CodeProjectTopPosters example in the demo won't let you get at more than the top 40 CodeProject posters. Further, it pauses between multiple accesses to the CodeProject server in order to not overload it.
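
The two safeguards mentioned above (a hard cap on how much is retrieved and a pause between requests) might be sketched as follows. This is only an illustration, not code from the demo: the function name `fetchPolitely`, the `fetchPage` callback, and the items-per-page parameter are assumptions; only the cap of 40 comes from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: requests only as many pages as are needed to cover
// min(wanted, maxItems) items, and sleeps for `pause` between consecutive
// requests so the server is not hit back to back.
// Returns the number of items the caller is allowed to use.
int fetchPolitely(int wanted, int maxItems, int itemsPerPage,
                  std::chrono::milliseconds pause,
                  const std::function<void(int page)>& fetchPage) {
    assert(itemsPerPage > 0);
    const int capped = std::max(0, std::min(wanted, maxItems));
    const int pagesNeeded = (capped + itemsPerPage - 1) / itemsPerPage;
    for (int page = 0; page < pagesNeeded; ++page) {
        if (page > 0) {
            std::this_thread::sleep_for(pause);  // pause before every request but the first
        }
        fetchPage(page);  // assumed callback that performs one download
    }
    return capped;
}
```

With a cap of 40 and an assumed page size of 10, a request for 100 items would trigger four downloads with three pauses in between.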

How it Works

WebResourceProvider control flow

WebResourceProvider works by initializing itself, constructing a URL to be retrieved, downloading the resource, and extracting useful information from the downloaded content. The process is repeated until no more data needs to be downloaded.

You use WebResourceProvider by deriving your own resource provider class from it, and overriding any of these virtual methods (shown in red in the diagram on the right):

  • init()
  • constructUrl()
  • isPost()
  • getPostData()
  • parseContent()
  • moreAvailable()
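
The control flow described above can be sketched roughly as follows. This is a stand-in written for illustration, not the library's source: the class names `SketchProvider` and `PagedProvider`, the `download()` hook, the example URL, and all signatures are assumptions. Only the names init(), constructUrl(), parseContent() and moreAvailable() come from the list above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the base class. The real WebResourceProvider
// declares its own signatures and performs a real HTTP download.
class SketchProvider {
public:
    virtual ~SketchProvider() = default;

    // Initialize once, then construct a URL, download it and parse the
    // content, repeating while the derived class reports more data.
    void fetchResource() {
        init();
        do {
            const std::string url = constructUrl();
            assert(!url.empty());  // a derived class must supply a URL
            parseContent(download(url));
        } while (moreAvailable());
    }

protected:
    virtual void init() {}
    virtual std::string constructUrl() = 0;
    virtual void parseContent(const std::string& content) = 0;
    virtual bool moreAvailable() { return false; }

    // Assumed hook: returns canned text so the sketch runs without a network.
    virtual std::string download(const std::string& url) {
        return "content of " + url;
    }
};

// Hypothetical provider that walks three pages of a made-up site.
class PagedProvider : public SketchProvider {
public:
    std::vector<std::string> pages;  // filled in by parseContent()

protected:
    void init() override { page_ = 0; pages.clear(); }
    std::string constructUrl() override {
        return "http://example.invalid/list?page=" + std::to_string(++page_);
    }
    void parseContent(const std::string& content) override {
        pages.push_back(content);
    }
    bool moreAvailable() override { return page_ < 3; }

private:
    int page_ = 0;
};
```

Calling `fetchResource()` on a `PagedProvider` runs the loop three times, because `moreAvailable()` keeps returning true until the third page has been parsed.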

WebResourceProvider provides an assortment of methods to help parse downloaded content. They are:

Method                    Purpose
at                        Checks whether current location is at a string
atExact                   Case-sensitive version of at()
skipTo                    Advances current location to next occurrence of a string
skipToExact               Case-sensitive version of skipTo()
skipBackTo                Retreats current location to previous occurrence of a string
skipBackToExact           Case-sensitive version of skipBackTo()
extractTo                 Extracts text from current location to the start of a string
extractToExact            Case-sensitive version of extractTo()
extractToEnd              Extracts text from current location to end of content
getIndex                  Returns current location
getLinks                  Returns HREF and IMG links in content
resetIndex                Sets current location to start of content
replaceEvery              Replaces every occurrence of a string in content with another
removeComments            Removes comments from content
removeScripts             Removes scripts from content
removeEnclosingAnchorTag  Removes anchor tag enclosing a string
removeEnclosingQuotes     Removes quotes enclosing a string
removeHtml                Removes HTML from a string
trim                      Removes leading and trailing whitespace from a string
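
To give a feel for how a few of these helpers combine, here is a small sketch of a cursor over downloaded content. It borrows the method names from the table, but the `ContentCursor` class, the free function `removeHtml()`, the return types, and the choice to leave the current location at the start of a match are all assumptions made for illustration. The real library may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Naive tag stripper (assumption): drops everything between '<' and '>'.
std::string removeHtml(const std::string& text) {
    std::string out;
    bool inTag = false;
    for (char ch : text) {
        if (ch == '<') inTag = true;
        else if (ch == '>') inTag = false;
        else if (!inTag) out += ch;
    }
    return out;
}

// Hypothetical cursor over downloaded content.
class ContentCursor {
public:
    explicit ContentCursor(std::string content) : content_(std::move(content)) {}

    // True if the text at the current location matches s, ignoring case.
    bool at(const std::string& s) const {
        assert(index_ <= content_.size());
        if (s.size() > content_.size() - index_) return false;
        return lower(content_.substr(index_, s.size())) == lower(s);
    }

    // Moves the current location to the start of the next occurrence of s
    // (ignoring case). Returns false and stays put if s is not found.
    bool skipTo(const std::string& s) {
        const std::size_t found = lower(content_).find(lower(s), index_);
        if (found == std::string::npos) return false;
        index_ = found;
        return true;
    }

    // Returns the text from the current location up to the start of s and
    // leaves the current location at s. Returns "" if s is not found.
    std::string extractTo(const std::string& s) {
        const std::size_t start = index_;
        if (!skipTo(s)) return std::string();
        return content_.substr(start, index_ - start);
    }

    std::size_t getIndex() const { return index_; }
    void resetIndex() { index_ = 0; }

private:
    static std::string lower(std::string text) {
        std::transform(text.begin(), text.end(), text.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return text;
    }

    std::string content_;
    std::size_t index_ = 0;  // invariant: index_ <= content_.size()
};
```

A typical sequence under these assumptions is to call `skipTo()` with a label that precedes the value of interest, `skipTo()` again with the tag that opens the value, `extractTo()` with the tag that closes the cell, and finally `removeHtml()` on the extracted text.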

Sample Resource Providers

Here are screenshots of the sample resource providers in action.

QuoteProvider object in action

The QuoteProvider object works by posting a request to Yahoo's basic stock quote form and parsing the returned information.

ZipCodeDecoder object in action

The ZipCodeDecoder object works by posting a request to the USPS zip code association form and parsing the returned information.

WeatherProvider object in action

The WeatherProvider object works by posting a request to CNN's weather form and parsing the returned information.

Translator object in action

The Translator object works by posting a request to Google's translation engine and parsing the returned information. The request includes the translation mode.

The sample performs a reverse translation and presents it along with the original text for comparison purposes.
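
The providers above all work by posting form data, which is what the isPost() and getPostData() overrides are for. A minimal sketch of building such a POST body is shown below. It is not taken from the samples: the `TranslationRequest` type, the helper functions, and the field names "text" and "langpair" are placeholders chosen for illustration, and the actual form fields expected by any given site are not stated in this article.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Encodes one value for an application/x-www-form-urlencoded body:
// letters, digits, '-', '_' and '.' pass through, a space becomes '+',
// and every other byte is written as %XX.
std::string formEncode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Joins name/value pairs into a POST body, preserving their order.
std::string buildPostData(
        const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty()) body += '&';
        body += formEncode(field.first) + '=' + formEncode(field.second);
    }
    return body;
}

// Hypothetical request object mirroring the isPost()/getPostData() pair.
struct TranslationRequest {
    std::string text;
    std::string mode;  // translation mode; the "en|fr" style is an assumption

    bool isPost() const { return true; }
    std::string getPostData() const {
        assert(!mode.empty());
        return buildPostData({{"text", text}, {"langpair", mode}});
    }
};
```

Under these assumptions, a reverse translation is simply a second request with the source and target languages in the mode swapped and the first result as the text.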

LinkChecker object in action

The LinkChecker object is a thin layer above the WebResourceProvider class. It delegates the job of determining the document and image links at a URL to the base class.

LinkChecker::getLinks() does its best to determine the links on a page, but because of the large number of ways a link can be specified, this method may miss a few links.

The LinkChecker demo uses WebResourceProvider::urlExists() to check whether a link is valid.
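
The general idea of collecting links and then testing each one can be sketched as below. This is a deliberately naive illustration, not the library's getLinks() or urlExists(): the function names `extractLinks` and `findBrokenLinks` are hypothetical, only quoted href= and src= attribute values are recognized, and the existence check is passed in as a callback so the sketch runs without network access.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Collects quoted href= and src= attribute values (all hrefs first, then
// all srcs). Unquoted values and links built by scripts are missed.
std::vector<std::string> extractLinks(const std::string& html) {
    std::string lowerHtml = html;
    std::transform(lowerHtml.begin(), lowerHtml.end(), lowerHtml.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::vector<std::string> attributes = {"href=", "src="};
    std::vector<std::string> links;
    for (const std::string& attribute : attributes) {
        std::size_t pos = 0;
        while ((pos = lowerHtml.find(attribute, pos)) != std::string::npos) {
            pos += attribute.size();
            if (pos >= html.size()) break;
            const char quote = html[pos];
            if (quote != '"' && quote != '\'') continue;  // skip unquoted values
            const std::size_t end = html.find(quote, pos + 1);
            if (end == std::string::npos) break;
            links.push_back(html.substr(pos + 1, end - pos - 1));
            pos = end + 1;
        }
    }
    return links;
}

// Returns the links for which the supplied check reports "does not exist".
// The callback stands in for a real check such as urlExists().
std::vector<std::string> findBrokenLinks(
        const std::vector<std::string>& links,
        const std::function<bool(const std::string&)>& urlExists) {
    assert(urlExists != nullptr);
    std::vector<std::string> broken;
    for (const std::string& link : links) {
        if (!urlExists(link)) broken.push_back(link);
    }
    return broken;
}
```

Because the scan only understands one way of writing a link, it also shows why a link extractor of this kind can miss some links on a real page.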

Click here to see a screenshot of the LinkChecker demo run against the CodeProject home page. Keep up the good work, Chris!

Using WebResourceProvider

To use WebResourceProvider, do the following:
  1. Build the WebResourceProvider_Lib project.
  2. Modify your application's project to look for header files and libraries in the WebResourceProvider_Lib project area.
  3. Derive an object from WebResourceProvider. You'll need to #include WebResourceProvider.h in your derived class' header file.
  4. Override the constructUrl() method in your derived class. This method specifies the URL to be downloaded.
  5. Override the parse() method in your derived class. This method extracts information from the downloaded content and stores it in the derived class' member variables.
  6. Optionally override other WebResourceProvider virtual methods. See the source code of the sample resource providers included in the demo project for examples.
  7. Link your application with WebResourceProvider.lib.

Acknowledgement

WebResourceProvider uses the following code written by others:

A Call for Interesting WebResourceProviders!

This is an invitation to the CP community to come up with interesting and useful web resource providers. Let your imagination (and coding prowess) flow! Please post your cool WebResourceProvider derived classes at CodeProject.

Revision History

  • 23 Mar 2007
    Updated parsing logic in WeatherProvider, ZipCodeDecoder and Translator modules.
  • 7 Oct 2006
    Bug Fix: Updated parsing logic in Translator module.
  • 7 Aug 2002
    Bug Fix: Fixed computation error in extractToEnd().
    Bug Fix: Added missing call to init() in fetchResource().
    Added methods skipBackTo() and skipBackToExact().
  • 8 May 2002
    Added methods urlExists(), getLinks(), removeComments(), removeScripts(), findNoCase() and findStringInArray().
    Modified at(), skipTo(), and extractTo() to be case-insensitive. Added case-sensitive analogs atExact(), skipToExact() and extractToExact().
    Added LinkChecker sample to demo app.
    Fixed a bug in parse() that caused the fetch status to be ignored.
    Sped up ZipCodeDecoder sample object.
  • 30 April 2002
    Corrected control flow image.
  • 29 April 2002
    Initial version of article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Ravi Bhavnani
Technical Lead
Canada
Ravi Bhavnani is an ardent fan of Microsoft technologies who loves building Windows apps, especially PIMs, system utilities, and things that go bump on the Internet. During his career, Ravi has developed expert systems, desktop imaging apps, marketing automation software, EDA tools, a platform to help people find, analyze and understand information, trading software for institutional investors and advanced data visualization solutions. He currently works for a company that provides enterprise workforce management solutions to large clients.
 
His interests include the .NET framework, reasoning systems, financial analysis and algorithmic trading, NLP, CHI and UI design. Ravi holds a BS in Physics and Math and an MS in Computer Science and was a Microsoft MVP (C++ and C# in 2006 and 2007). He is also the co-inventor of 2 patents on software security and generating data visualization dashboards. His claim to fame is that he crafted CodeProject's "joke" forum post icon.
 
Ravi's biggest fear is that one day he might actually get a life, although the chances of that happening seem extremely remote.


Comments and Discussions

 
Question: Proxy server? (DavidA2005, 4 Mar '09 - 4:18)
Hi
 
Thanks for providing this code. I want to use the demo code for accessing Google Translator.
 
I'm sitting behind a proxy server. Your GoogleTranslator project works fine (I don't know how it knows about the proxy). However, the WebResourceProvider Demo Translate function just times out.
 
I prefer to use the WebResourceProvider Demo as a basis for my project because I know C++ not C#. I can see that CWebGrab has a proxy setting but I think that CWebGrab is not used by Translate. Please can you suggest how I can modify WebResourceProvider Demo so that it knows about the proxy for the case of Translate?
 
David

Answer: Re: Proxy server? (Ravi Bhavnani, 4 Mar '09 - 5:52)
You hit the nail on the head! The C++ version is proxy disadvantaged - unfortunately I haven't had the cycles to make it mimic the .NET flavor. :(
 
My GoogleTranslator article uses the .NET version which works thru a proxy server. As a worst case scenario, your C++ app could slave a .NET app to do the translation, although I agree it's prolly less work to make the C++ version proxy happy.
 
/ravi
 
My new year resolution: 2048 x 1536
ravib(at)ravib(dot)com

General: Re: Proxy server? (DavidA2005, 4 Mar '09 - 6:03)
Thanks for your reply.
 
> As a worst case scenario, your C++ app could slave a .NET app to do the translation
 
Briefly, how would I do that?
 
Or I guess I could replace GoogleTranslator.cs with a C++ version. But how would I reference the DLL?
 
Sorry, I only know C++ and MFC, not C# and .NET.
 
I would appreciate any guidance you can give as I am very keen to use this code.
 
David

General: Re: Proxy server? (DavidA2005, 4 Mar '09 - 22:27)
Ravi, please will you give me a hint as to how to modify this C++ version to use a proxy? Does it use an underlying MFC class that I can instruct to use a proxy server?

General: Re: Proxy server? (DavidA2005, 5 Mar '09 - 2:49)
I enabled proxy server compatibility by changing line 69 of AmHttpSocket.cpp from:
 
hIO = InternetOpen(m_strAgentName, INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
 
to
 
hIO = InternetOpen(m_strAgentName, INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
 
works for me under Win XP SP2.
 
See: http://msdn.microsoft.com/en-us/library/aa383996(VS.85).aspx

General: Re: Proxy server? (Ravi Bhavnani, 5 Mar '09 - 4:07)
Thanks, David! I'll update the article.
 
/ravi
 

Last Updated 23 Mar 2007
Article Copyright 2002 by Ravi Bhavnani