
Aggregating Web Content in C# (Stock Ratings Sample)

thund3rstruck, 21 Mar 2007
A demonstration of aggregation and pattern matching in web content.

Screenshot - stockScout.jpg

Introduction

As technologies continue to evolve and new content distribution methods are developed, there remains a need to handle, parse, manipulate, and aggregate data in ways the original content distributor may never have anticipated. With RSS, XML, web browsers, etc., consuming data has never been easier; however, refining the data delivered by these means to include only the content that one desires can be quite a challenge.

For example, suppose we want to aggregate the local weather for 10 business locations from a website we visit regularly. To do this by hand, we would have to open a web browser, connect to the website, and run 10 queries through the site's frontend. However, using a few .NET classes and regular expressions, the same task can be completed with very little code and significantly faster.

Alternatively, we could also apply these techniques in order to aggregate data from one source such as a website and format it to be consumed on multiple devices such as handhelds or media center applications. There have been some fascinating implementations of this concept in various platforms such as the XBOX Media Center and MythTV which have applied these principles to aggregating music videos, movie show times, RSS feeds, and web radio streams.

Legality

Of course, there are legal and moral questions that arise from consuming content which you may not own and altering it to suit your needs. This article assumes that you own the content you want to aggregate, or you have permission from the content owner to consume the data outside of the manner in which it is intended.

The Sample Application - Stock Ratings Aggregator

The Problem

John Q. Public has an aggressive investment strategy where he follows the same process every two weeks after receiving his paycheck from the ACME corporation:

  1. He logs into MSN's Money website and checks his portfolio.
  2. He already has a pool of 15 companies that he invests in regularly, so he glances over their performance.
  3. His broker only allows him to trade once a week, so he changes his stock every week (from his list of 15).
  4. Since he wants to make a wise decision in selecting which stock to invest in this week, he logs into MSN's MoneyCentral website, and goes through his list one at a time, entering in each stock symbol in order to see the much respected Stock Scouter rating.
  5. After being burned on SIRI (Sirius Satellite Radio), John decides to not invest in any stock with a rating below 8 (scale of 10).

John longs for a day when MSN will offer him some kind of RSS feed that will aggregate his stock ratings for him, but since that's not available, he accepts the fact that he will waste up to an hour every 2 weeks (on ACME time) doing this manually. If only John Q. Public were a .NET programmer, he'd know how easy tasks such as this are to automate!

The Solution

John is a beginner, so we want to keep this easy for him and avoid hard-coding anything in the source. (For the example, we are going to break the OOP commandment to always program to an interface and not an implementation, but John doesn't know any better!) To hold the variables, we'll use an XML file. The XML file has too many tags to display here, but you can see it in the project files. It stores all of John's stock symbols, the Regular Expression he's going to use to parse the web stream, and the URL of the site he wants to parse.

We only need three classes and the main entry point (very simple). The classes are:

  1. Stock - Parses the HTML to find the stock rating.
  2. HTTPParser - Fetches the stream (the raw HTML).
  3. Settings - Encapsulates the values in the XML settings file.

The Application Entry Point

static void Main(string[] args)
{
    Program stockScouter = new Program();
    stockScouter.init();
    Console.ReadLine();
}
private void init()
{
    //Load the Application settings from the xml file
    Settings xmlConfiguration = new Settings();
    //Enumerate the stock symbols
    foreach (string stockSymbol in xmlConfiguration.GetStocks)
    {
        //Parse the searchUrl and display the rating
        string stockSearchUrl = xmlConfiguration.BaseUrl + stockSymbol;
        Stock getStock = new Stock(stockSearchUrl, 
                         xmlConfiguration.Pattern);
        Console.WriteLine(stockSymbol + " " + getStock.GetRating);
    }
}
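
John's rule from step 5 (skip any stock rated below 8) is not applied by the loop above, which only prints each rating. The following is a minimal sketch of how such a filter might look, assuming GetRating returns the rating as a numeric string such as "8". The RatingFilter class, the MeetsThreshold helper, and the sample symbols are hypothetical and not part of the posted source.

```csharp
using System;
using System.Collections.Generic;

static class RatingFilter
{
    // Hypothetical helper: true when the scraped rating parses as an
    // integer and is at least the minimum. Unparseable text (for example
    // an empty string when the pattern did not match) is rejected.
    public static bool MeetsThreshold(string rating, int minimum)
    {
        int value;
        return int.TryParse(rating, out value) && value >= minimum;
    }

    static void Main()
    {
        // Made-up data standing in for the symbol/rating pairs printed by init()
        var ratings = new Dictionary<string, string>
        {
            { "AAA", "9" }, { "BBB", "4" }, { "CCC", "" }
        };
        foreach (KeyValuePair<string, string> entry in ratings)
        {
            // Only stocks at or above John's cutoff of 8 are shown
            if (MeetsThreshold(entry.Value, 8))
                Console.WriteLine(entry.Key + " " + entry.Value);
        }
    }
}
```

Using int.TryParse rather than int.Parse means a page that failed to match simply drops out of the list instead of stopping the whole run.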

The HTTP Parser (StreamReader)

There are a few properties in this class, but I'll cover only the core method. This class needs the System.Net, System.IO, and System.Text namespaces (the last one for Encoding). It's quite simple: the constructor takes a search URL and stores it in a local field exposed through a public property. The Parse method creates a new web request and saves the entire page's HTML code into a local field/public property.

public HTTPParser(string url)
{
    this.url = url;
}

public void Parse()
{
    HttpWebRequest myWebRequest = (HttpWebRequest)WebRequest.Create(Url);
    // The using blocks release the response, stream, and reader
    // even if reading the page throws an exception
    using (HttpWebResponse siteResponse = (HttpWebResponse)myWebRequest.GetResponse())
    using (Stream streamResponse = siteResponse.GetResponseStream())
    using (StreamReader reader = new StreamReader(streamResponse, Encoding.UTF8))
    {
        // Cache the entire page's HTML
        Html = reader.ReadToEnd();
    }
}
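
The read-to-end step in Parse() can be tried without a network connection. The sketch below assumes nothing about the MSN site: a MemoryStream holding made-up HTML stands in for the response stream, and the StreamDemo class and ReadAll helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamDemo
{
    // Same read-to-end step as Parse(), but taking any Stream so it can
    // be exercised without a live HTTP response.
    public static string ReadAll(Stream source)
    {
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // A MemoryStream stands in for siteResponse.GetResponseStream()
        byte[] fakePage = Encoding.UTF8.GetBytes("<html><body>rating</body></html>");
        using (MemoryStream stream = new MemoryStream(fakePage))
        {
            Console.WriteLine(ReadAll(stream));
        }
    }
}
```

Disposing the StreamReader also closes the stream it wraps, so no separate Close call on the stream is needed.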

The Stock Class

Ideally, this class would hold the stock's name, symbol, price, rating, etc., but we're keeping this simple, so this class exists just for a physical representation of the stock. Its purpose is to extract the stock's rating from the string (which is stored in the HTTP parser).

class Stock
{
    HTTPParser msnStockStream;
    private string cachedMsnStream = string.Empty;
    private string rating = string.Empty;
    private string RegExPattern = string.Empty;
    
    public string GetRating
    {
        get{ return rating; }
    }

    public Stock(string url, string RegExPattern)
    {
        this.RegExPattern = RegExPattern;
        //Create the instance of the Parser
        msnStockStream = new HTTPParser(url);
        //kick off the HTTPParser to get the page
        msnStockStream.Parse();
        //Save the HTML in a local variable
        cachedMsnStream = msnStockStream.Html;

        getRating();
    }

    private void getRating()
    {
        //Match our Pattern
        rating = Regex.Match(cachedMsnStream, 
                             RegExPattern).Value;
        rating = cleanUnwantedChars(rating);
    }
    private string cleanUnwantedChars(string matchedString)
    {
        //custom function to remove the extra quotes
        return matchedString.Replace("\"", "");
    }
}
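
The real pattern lives in settings.xml and is not shown in the article, so the following is only a sketch of the matching step. It assumes a made-up HTML fragment and a made-up pattern, and it uses a capture group to pull out just the number, which is one way to avoid stripping quotes afterwards as cleanUnwantedChars does. The RatingExtractor class, the sample markup, and the pattern are all hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RatingExtractor
{
    // Hypothetical pattern: the text rating="N" where N is 1 to 10.
    // The parentheses capture only the number, without the quotes.
    public const string SamplePattern = "rating=\"(10|[1-9])\"";

    // Returns the captured rating, or an empty string when nothing matches.
    public static string Extract(string html, string pattern)
    {
        Match match = Regex.Match(html, pattern);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        // Made-up markup; the real page layout is not shown in the article
        string html = "<div class=\"scouter\" rating=\"8\">StockScouter</div>";
        Console.WriteLine(Extract(html, SamplePattern));
    }
}
```

Checking match.Success before reading the group makes a failed match explicit, which is useful because page layouts change and a scraper's pattern can silently stop matching.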

The Settings Class

The Settings class encapsulates our settings.xml file, so System.Xml is used in only this one place and the rest of the application reads its settings through this class. The class has only one method; the rest of the code (in the posted source) exists only to provide friendly getters.

private void Read()
{
    symbols = new ArrayList();
    string xmlFilePath = System.IO.Path.Combine(
                         System.IO.Directory.GetCurrentDirectory(), "settings.xml");
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(xmlFilePath);
    XmlNode stockSettings = xmlDoc.SelectSingleNode("//Stocks");

    for (int i = 0; i < stockSettings.ChildNodes.Count; i++)
      symbols.Add(stockSettings.ChildNodes[i].InnerText);
    
    XmlNode RegExSettings = xmlDoc.SelectSingleNode("//RegEx");
    pattern = RegExSettings.ChildNodes[0].InnerText;
    baseUri = RegExSettings.ChildNodes[1].InnerText;
}
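
Since the settings file itself is only available in the project files, here is a sketch of one layout that the traversal in Read() would accept. Only the Stocks and RegEx element names come from the code above; the Settings, Symbol, Pattern, and BaseUrl element names, the symbols, the pattern, and the example.com URL are invented for illustration, and the SettingsSketch class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class SettingsSketch
{
    // Hypothetical layout. Only the Stocks and RegEx element names come
    // from Read(); every other name and value here is made up.
    public const string SampleXml =
        "<Settings>" +
        "<Stocks><Symbol>AAA</Symbol><Symbol>BBB</Symbol></Stocks>" +
        "<RegEx>" +
        "<Pattern>rating=\"(10|[1-9])\"</Pattern>" +
        "<BaseUrl>http://example.com/quote?symbol=</BaseUrl>" +
        "</RegEx>" +
        "</Settings>";

    // Mirrors the loop in Read(): every child of Stocks is one symbol.
    public static List<string> ReadSymbols(XmlDocument doc)
    {
        var symbols = new List<string>();
        foreach (XmlNode child in doc.SelectSingleNode("//Stocks").ChildNodes)
            symbols.Add(child.InnerText);
        return symbols;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        XmlNode regEx = doc.SelectSingleNode("//RegEx");
        Console.WriteLine(string.Join(",", ReadSymbols(doc)));
        Console.WriteLine(regEx.ChildNodes[0].InnerText); // first child: pattern
        Console.WriteLine(regEx.ChildNodes[1].InnerText); // second child: base URL
    }
}
```

Because Read() picks the pattern and the base URL by position, the order of the two children under RegEx matters. An XML comment placed inside Stocks or RegEx would also count as a child node and shift or pollute the values.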

Conclusion

In five simple minutes or less, John Q. Public was able to write a few simple C# classes and automate a process that has eaten away many hours of his life! Because of this improvement in efficiency, he now has more time to ignore his ringing telephone at work and pretend to be busy!

The Net.HttpWebRequest, Net.HttpWebResponse, IO.Stream, IO.StreamReader, and RegularExpressions.Regex classes seem like a match made in heaven when used in conjunction with one another.

Code responsibly!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

thund3rstruck
Software Developer
United States
I'm a typical 30 year old generation X guy that likes video games, NFL football, and comic style art. I have an insatiable passion for programming and doing whatever it takes to become a better programmer.

Article Copyright 2007 by thund3rstruck