Intermediate. License: The Code Project Open License (CPOL)

RSS Feed Aggregator and Blogging Smart Client

By Omar Al Zabir

RSS Feed aggregator and blogging Smart Client which uses Enterprise Library, Updater Application Block, lots of XML hacks and desktop tricks. A comprehensive guide to real life hurdles of Smart Client development.
C#, VB 6, VB, Windows, .NET, .NET 1.1, ADO.NET, VS.NET2003, Visual Studio, Dev

Posted: 23 Jul 2005
Updated: 16 Aug 2005


Introduction

RSS Feeder.NET is a free, open source desktop RSS feed aggregator that downloads feeds from web sources and stores them locally for offline viewing, searching and processing. It is also a rich blogging tool that you can use to post to a variety of blog engines, including WordPress, B2Evolution, .Text and Community Server. You can use it entirely through MS Outlook®, run it fully standalone, or use both at the same time, whichever you find more comfortable. It does not increase Outlook's load time, slow Outlook down, or prevent it from closing properly. It is a Smart Client that makes the best use of both local resources and distributed web information sources.

Update History

  • August 9, 2005 - Source code updated, Web Log Manager has several fixes.
  • August 8, 2005 - When you first load and build the project, you will get an error in WebLogManager.resx. The workaround is to open the form in the designer, make any change so that it becomes dirty, and save it.

Features

  • Newspaper mode. You can read feeds in a more readable newspaper mode called “Blogpaper”.
  • Auto Discovery. Drag any hyperlink onto RSS Feeder and it will find out whether that page contains an RSS feed.
  • Outlook integration. You can store feeds in Outlook folders.
  • Blogging. It provides a convenient Outlook 2003 style workspace for managing your blog accounts and writing rich posts.
  • Blog from Outlook. You can specify an Outlook folder for a weblog account. All the posts in that folder are automatically posted to the weblog during synchronization. You can write posts as HTML using the Word editor. Post content (HTML markup) is rigorously cleaned before it is posted to the weblog.
  • Outlook View. It uses a customized view to present a more readable list of feeds in Outlook Folders. The standard Post view is not easy to browse through quickly. The view puts the subject first in bold and an excerpt of the post under the subject.
  • Optimized Startup. You can safely put RSS Feeder in your startup programs and it won’t make Windows® start slower. A clever lazy loading process puts no load on Windows® during startup; instead, it starts the app once Windows® has finally regained its strength after the long boot-up struggle.
  • Newsgator Import. Newsgator users can use RSS Feeder to import all the subscriptions and seamlessly replace Newsgator without modifying the Outlook folder locations.

Feature walkthrough

Let’s take a short walkthrough of the features of RSS Feeder .NET:

Outlook style view

The three-pane view of Outlook is a marvel of user interface engineering. It makes browsing through information fast and easy, and RSS Feeder follows the same concept. The leftmost pane is the channel list. When you click a channel, you see the list of its feeds in the middle pane. You can instantly search through feeds using the search box at the bottom of the middle pane. When you click a feed, its content is displayed in the viewer on the right.

You can edit channel properties directly in the property grid at the bottom left. Click the “Properties” title to show the property grid. .NET’s built-in PropertyGrid makes it very convenient to expose objects and let users modify them. I have implemented several extensions to the property grid, which I will explain later on.
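As a rough illustration of how an object is exposed to a property grid, here is a minimal sketch. The ChannelSettings class and its properties are hypothetical, not RSS Feeder's actual channel class; the attributes are the standard System.ComponentModel ones that PropertyGrid reads (through TypeDescriptor) to group, describe and hide properties.

```csharp
using System.ComponentModel;

// Hypothetical settings object; RSS Feeder's real channel class differs.
// PropertyGrid discovers properties through TypeDescriptor and uses these
// attributes for grouping (Category), help text (Description) and visibility.
public class ChannelSettings
{
    [Category("General"), Description("Title shown in the channel list")]
    public string Title { get; set; } = "";

    [Category("General"), Description("URL of the RSS, Atom or RDF feed")]
    public string FeedUrl { get; set; } = "";

    [Category("Download"), Description("Minutes between feed downloads")]
    [DefaultValue(60)]
    public int RefreshMinutes { get; set; } = 60;

    // Internal identifier; hidden from the grid
    [Browsable(false)]
    public int Id { get; set; }
}

// In a WinForms form (System.Windows.Forms), the grid would be bound with:
//     propertyGrid.SelectedObject = new ChannelSettings();
```

Because the grid only reads metadata, the same class can be shown, edited and validated without any UI code of its own.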

Daily Blogpaper – Newspaper style reading

The most convenient reading view for blogs is a newspaper style view, which I call “The Daily Blogpaper”. Just open the Blogpaper, read through all the latest posts collected from all the channels (you can optionally exclude some channels) and click “Mark as Read”. It’s as simple as that.

The Outlook style view is useful for working with feeds and channels, but it is not a convenient reading environment. The Blogpaper gives you a truly comfortable environment for everyday reading.

Rich blogging feature

RSS Feeder is not just an RSS Aggregator; it’s also an equally powerful feature rich blogging tool.

The Outlook 2003 style view is created using the amazing free UI components from Divelements Ltd. On the left side, you can see your weblog accounts and the posts in each account. The right side is the editing environment.

RSS Feeder currently supports blog engines that implement the MetaWeblog API, like WordPress, B2Evolution and Drupal, and also some XML web service based blog engines like .Text and Community Server. We will see how these are implemented in detail later on.
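To give an idea of what supporting the MetaWeblog API means on the wire, here is a minimal sketch that builds an XML-RPC request for the standard metaWeblog.newPost(blogid, username, password, struct, publish) method with System.Xml.Linq. It is not RSS Feeder's implementation; the MetaWeblogRequest class is hypothetical, and only the title and description members of the post struct are filled in.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper that builds a metaWeblog.newPost XML-RPC call.
public static class MetaWeblogRequest
{
    public static XDocument NewPost(string blogId, string user, string password,
                                    string title, string html, bool publish)
    {
        // <param><value>...</value></param>
        XElement Param(XElement value) =>
            new XElement("param", new XElement("value", value));

        // <member><name>..</name><value><string>..</string></value></member>
        XElement Member(string name, string text) =>
            new XElement("member",
                new XElement("name", name),
                new XElement("value", new XElement("string", text)));

        return new XDocument(
            new XElement("methodCall",
                new XElement("methodName", "metaWeblog.newPost"),
                new XElement("params",
                    Param(new XElement("string", blogId)),
                    Param(new XElement("string", user)),
                    Param(new XElement("string", password)),
                    Param(new XElement("struct",
                        Member("title", title),
                        Member("description", html))),   // HTML is escaped by XElement
                    Param(new XElement("boolean", publish ? "1" : "0")))));
    }
}
```

The resulting document would be POSTed with content type text/xml to the blog's XML-RPC endpoint (for WordPress, its xmlrpc.php); on success the server replies with the ID of the new post.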

Outlook integration

If you don’t like my app, you can remain an MS Outlook user. When you install the application, it asks whether you need Outlook integration. You just need to specify a base folder that will contain all the child channel folders, and RSS Feeder will feed Outlook with all the feeds.

The convenient feed list view makes it very easy to skim through lots of posts rapidly. The auto preview also gives a glimpse of each post's content, so you can decide whether to keep it or throw it away without opening it.

Blogging from Outlook

This is my favorite feature. You can create posts in an Outlook folder and then create a weblog account that maps to that folder. All the posts in that folder are delivered to the weblog. You can drag posts from some other place or write new posts in that folder. Whenever RSS Feeder runs its periodic synchronization (every 5 minutes), it reads the posts from the folder and sends them to the corresponding blog site.

Informative Progress Meter

The Send/Receive window of RSS Feeder gives you detailed statistics about feed synchronization. You can view the errors received from the server, the speed of your internet connection and the number of feeds sent to Outlook.

While posting blogs, it also shows you the post ID generated by the server, or the detailed error message received while posting.

Newsgator import

RSS Feeder automatically imports Newsgator settings, including all the subscriptions and Outlook folders, at first run. This makes migrating from Newsgator to RSS Feeder effortless.

Smart Client – definition and requirements according to MSDN

To qualify as a smart client, an application needs to fulfill the following requirements, according to the definition of Smart Client at MSDN.

Local resources and user experience

All smart client applications share an ability to exploit local resources such as hardware for storage, processing or data capture such as compact flash memory, CPUs and scanners for example. Smart client solutions offer hi-fidelity end-user experiences by taking full advantage of all that the Microsoft® Windows® platform has to offer. Examples of well known smart client applications are Word, Excel, MS Money, and even PC games such as Half-Life 2. Unlike "browser-based" applications such as Amazon.Com or eBay.com, smart client applications live on your PC, laptop, Tablet PC, or smart device.

Connected

Smart client applications are able to readily connect to and exchange data with systems across the enterprise or the internet. Web services allow smart client solutions to utilize industry standard protocols such as XML, HTTP and SOAP to exchange information with any type of remote system.

Offline capable

Smart client applications work whether they are connected to the Internet or not. Microsoft® Money and Microsoft® Outlook are two great examples. Smart clients can take advantage of local caching and processing to enable operation during periods of no network connectivity or intermittent network connectivity. Offline capabilities are not only used in mobile scenarios but also by desktop solutions where they can take advantage of offline architecture to update backend systems on background threads, thus keeping the user interface responsive and improving the overall end-user experience. This architecture can also provide cost and performance benefits since the user interface need not be shuttled to the smart client from a server. Since smart clients can exchange just the data needed with other systems in the background, reductions in the volume of data exchanged with other systems are realized (even on hard-wired client systems this bandwidth reduction can realize huge benefits). This in turn increases the responsiveness of the user interface (UI) since the UI is not rendered by a remote system.

Intelligent deployment and update

In the past traditional client applications were difficult to deploy and update. It was not uncommon to install one application only to have it break another. Issues such as "DLL Hell" made installing and maintaining client applications difficult and frustrating. The Updater Application Block for .NET from the patterns and practices team provides prescriptive guidance to those who wish to create self-updating .NET Framework-based applications that are deployed across multiple desktops. The release of Visual Studio 2005 and the .NET Framework 2.0 will beckon a new era of simplified smart client deployment and updating with the release of a new deploy and update technology known as ClickOnce.

(The above text is copied and shortened from MSDN site.)

How RSS Feeder is a smart client

Let's look at how the RSS Feeder .NET is a smart client application:

Local resources and user experience

After you download the application, it runs locally and stores all the feeds in an MS Access database and your personal blog posts in an XML store. It also gives you a rich Outlook 2003 style user interface to work with. As a result, you get the full benefits of desktop convenience even though all the information you are working with is produced by web sources.

Connected

The application connects to RSS feed sources using HTTP and downloads the feeds to the local store. It uses XML-RPC to post weblogs to XML-RPC enabled blog engines; some of the well-known ones are WordPress, B2Evolution and Drupal. It also uses web services to communicate with blog engines like .Text and Community Server.

Offline capable

The application is fully offline capable. When it is not connected, you read the already downloaded feeds from the local store. When it gets connected, it automatically downloads recent feeds in the background and updates the view seamlessly.

Intelligent deployment and update

The application uses Updater Application Block 2.0 to provide auto update. Whenever I release a new version or deploy bug fixes to a central server, all users of the application automatically get the update behind the scenes. This saves every user from visiting the website and downloading the new version each time I release something new. It also lets me deliver bug fixes to everyone within a very short time.

Error reporting

This is my personal requirement for a smart client. Because a smart client runs on a distant computer, unlike a web site, you do not know whether users are facing problems or whether the application is generating exceptions. We need some kind of error reporting feature that automatically captures errors and transmits them to a central server so that they can be fixed. You have seen this kind of error reporting in Windows XP and Office 2003 applications: whenever there is an error, they send an error report to Microsoft. Similarly, this application traps errors behind the scenes and sends the exception trace to a tracking system at SourceForge.

Multithreaded

Another favorite requirement of mine for making a client really smart is to make it fully multithreaded. A smart client should always be responsive; it must not freeze while it is downloading or uploading data. Users should keep working without noticing that something big is going on in the background. For example, while a user is writing a blog post, the application should download feeds in the background without hindering the authoring environment at all.

Crash proof

A smart client becomes a dumb client when it crashes in front of the user, showing the dreaded “Continue” or “Quit” dialog box. To make your app truly smart, you need to catch every unhandled error and publish it safely. In this article, I will show you how this has been done in RSS Feeder.
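A minimal sketch of the catch-everything-then-report idea, assuming a hypothetical CrashGuard class and a pluggable Report delegate standing in for whatever actually ships the report (RSS Feeder sends it to a SourceForge tracker, which is not shown). It hooks AppDomain.UnhandledException, which fires for exceptions that no code caught on any thread; the WinForms-specific hooks are shown only as comments.

```csharp
using System;

// Hypothetical last-chance error handler; names are illustrative.
public static class CrashGuard
{
    // Where reports go. A real client would log and upload them instead.
    public static Action<string> Report = Console.Error.WriteLine;

    public static void Install()
    {
        // Raised for exceptions nobody caught, on any thread. For non-UI threads
        // the process still terminates afterwards, so this is the chance to log.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Report(FormatReport(e.ExceptionObject as Exception));

        // A WinForms app would also do this before Application.Run:
        //   Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);
        //   Application.ThreadException += (s, e) => Report(FormatReport(e.Exception));
        // With a ThreadException handler attached, UI-thread exceptions no longer
        // show the "Continue"/"Quit" dialog.
    }

    public static string FormatReport(Exception ex)
    {
        if (ex == null) return "Unknown error (non-Exception object thrown)";
        // ToString() includes the type, message, stack trace and inner exceptions
        return $"[{DateTime.UtcNow:u}] {ex.GetType().FullName}: {ex.Message}{Environment.NewLine}{ex}";
    }
}
```

Install() would be called once, as the first statement of Main, so that even startup failures are reported.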

Topics covered

RSS Feeder is a complete application that deals with XML, XSLT, HTML, HTTP GET/POST, configuration management, cryptography, logging, auto update, rich UI, RSS/RDF/Atom feed processing, XML-RPC, web services, blogging, Outlook automation, multithreading and a lot more. Covering all of these in detail would take a whole book, so in this article I will just cover the tips and tricks I have used in each area. By the end of this article, you will be equipped with real life experience of building a rich, connected Smart Client application using best practices and neat tricks collected from all over the web.

Enterprise Library

Enterprise Library is a major new release of the Microsoft patterns and practices application blocks. Application blocks are reusable software components designed to assist developers with common enterprise development challenges. Enterprise Library brings together new releases of the most widely used application blocks into a single integrated download.

The overall goals of the Enterprise Library are the following:

  • Consistency: All Enterprise Library application blocks feature consistent design patterns and implementation approaches.
  • Extensibility: All application blocks include defined extensibility points that allow developers to customize the behavior of the application blocks by adding their own code.
  • Ease of use: Enterprise Library offers numerous usability improvements, including a graphical configuration tool, a simpler installation procedure, and clearer, more complete documentation and samples.
  • Integration: Enterprise Library application blocks are designed to work together and are tested to make sure that they do. It is also possible to use the application blocks individually (except in cases where the blocks depend on each other, such as on the Configuration Application Block).

Application blocks help address the common problems that developers face in every project. They have been designed to encapsulate the Microsoft recommended best practices for .NET applications. They can be added into .NET applications quickly and easily. For example, the Data Access Application Block provides access to the most frequently used features of ADO.NET in simple-to-use classes, boosting developer productivity. It also addresses scenarios not directly supported by the underlying class libraries. (Different applications have different requirements, and you will find that not every application block is useful in every application that you build. Before using an application block, you should have a good understanding of your application requirements and of the scenarios that this application block is designed to address.)

RSS Feeder uses the following application blocks:

  • Caching Application Block. Caches frequently accessed data, such as the XSLT file that renders HTML from an RSS feed every time you click on a feed.
  • Configuration Application Block. Handles both static and dynamic configuration, for example global settings such as whether you want Outlook integration and the feed download interval.
  • Cryptography Application Block. Your weblog account password is encrypted using the TripleDES algorithm.
  • Exception Handling Application Block. The entire application uses the exception handling block to handle all handled and unhandled exceptions.
  • Logging and Instrumentation Application Block. Information, warning, debug and error level log entries are written to a text file using this block.
  • Updater Application Block. Provides auto update from a central web server.

Connectivity

RSS Feeder connects to the outside world in three ways:

  • Download latest updates from update server.
  • Download feeds from feed source.
  • Post to blog engines.

Auto update using Updater Application Block 2.0

The Updater Application Block is a real pain to implement. After trying for weeks, I finally gave up on its default BITSDownloader, which does not work at all in my case. With BITSDownloader, I always get an error saying that the response header does not contain Content-Length, yet when I capture the traffic, I can clearly see that Content-Length is present in the HTTP response header. So, I have moved to the HTTPDownloader created by Kent Boogaart for Katersoft. HTTPDownloader uses nothing but the built-in HttpWebRequest to download files from the server. It works both synchronously and asynchronously, which suits all update scenarios.

The updater works this way:

  • First, it gets the URL of the manifest XML. A manifest, prepared and stored on a server, describes the files that need to be downloaded.
  • After downloading the manifest, it works out which files need to be downloaded according to the manifest. Then it starts downloading them.
  • Once the download is complete, it spawns a console application that waits until RSS Feeder closes.
  • When RSS Feeder closes, the console application applies the updates and quits.
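The last two steps can be sketched roughly as follows. This is not the Updater Application Block's code: the PendingUpdate class, the staging folder layout and passing the main process ID to the console application are all assumptions made for the illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper run by the update console application.
public static class PendingUpdate
{
    // Waits for the main application to exit, then copies every staged file
    // over the installed copy. Returns the number of files copied.
    public static int Apply(int appProcessId, string stagingDir, string installDir)
    {
        try
        {
            using (Process app = Process.GetProcessById(appProcessId))
            {
                app.WaitForExit(); // blocks until the main app closes
            }
        }
        catch (ArgumentException)
        {
            // No process with that id is running: the app has already exited.
        }
        catch (InvalidOperationException)
        {
            // The process exited between the lookup and the wait.
        }

        int copied = 0;
        foreach (string source in Directory.GetFiles(stagingDir, "*", SearchOption.AllDirectories))
        {
            // Keep the staged folder layout (e.g. xsl/feed.xsl) under the install folder
            string target = Path.Combine(installDir, Path.GetRelativePath(stagingDir, source));
            Directory.CreateDirectory(Path.GetDirectoryName(target));
            File.Copy(source, target, overwrite: true);
            copied++;
        }
        return copied;
    }
}
```

Waiting on the process instead of polling for locked files keeps the console application idle until the main app is really gone. A production updater would also verify the downloaded files before copying them.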

One problem I faced while downloading the manifest is that proxies cache it. Internet service providers normally use proxies to cache web content, and since an XML file is ordinary web content, proxy servers nowadays cache XML files too. As a result, even if I frequently update the manifest XML on the server, the proxies do not refresh their caches and keep returning an old version of the manifest file. This prevents auto update.

To solve this, I append the current tick count to the manifest URI as a query string. This always produces a unique URL and thus prevents proxies from returning stale content.

// Get the default configuration provider for the updater block
Microsoft.ApplicationBlocks.Updater.
  Configuration.UpdaterConfigurationView view =
          new Microsoft.ApplicationBlocks.Updater.
              Configuration.UpdaterConfigurationView();

// Get the default manifest URI in order to add a unique timestamp
// at its end, so that proxies cannot return a cached manifest file
Uri manifestUri = view.DefaultManifestUriLocation;
string uri = manifestUri.ToString();
uri += "?" + DateTime.Now.Ticks.ToString();

manifests = updater.CheckForUpdates(new Uri( uri ));

Deciding whether to overwrite old files with newer files

When you implement auto update in your application, you will come across a dilemma: whether or not to overwrite an older file with a newer one. For example, RSS Feeder has several XSLT files that render RSS feeds to HTML, and users are free to change those files. So, if I release a new version of a file, I cannot overwrite it without knowing whether the user has changed it. But I do need to update it, because I may have fixed some problems in it. Now you may think, duh! That’s easy: just check whether the file dates are equal to the EXE’s date. If they are, the user has not changed the file. This does not hold, because I release updated EXEs frequently, so I cannot compare against the EXE’s date. The only way to make sure we do not overwrite files accidentally is to preserve the MD5 hash of each file and, before copying, check that the hash of the installed file still matches. I am not sure whether this is built into the Updater Application Block, but it would be nice to have a feature that automatically checks whether the MD5 hash of a file matches before overwriting it.
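Here is a minimal sketch of that check, assuming the MD5 hash of each shipped file is recorded somewhere the updater can read (for example alongside the manifest). SafeOverwrite and its method names are hypothetical, not part of the Updater Application Block.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: only overwrite files the user has not modified.
public static class SafeOverwrite
{
    // MD5 of a file as uppercase hex (32 characters)
    public static string Md5Hex(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    // shippedHash: the MD5 recorded when this file was originally installed.
    // If the installed file still has that hash, the user has not changed it.
    public static bool CanOverwrite(string path, string shippedHash)
    {
        if (!File.Exists(path)) return true; // nothing the user could lose
        return string.Equals(Md5Hex(path), shippedHash, StringComparison.OrdinalIgnoreCase);
    }
}
```

When CanOverwrite returns false, the updater could, for instance, save the new version next to the user's file instead of replacing it.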

Using Enterprise Library the easy way

Enterprise Library is a great piece of work. It can take a lot of framework development load off any type of application you develop. Normally, when we start a new project, we begin with a framework that provides configuration management, security, cryptography, database access and so on. All of these take time to integrate and test. Enterprise Library saves you from developing such a framework because it provides best practices collected from years of experience with successful projects.

However, using Enterprise Library classes directly from all layers of your application is problematic. Very soon, you will feel the need for a wrapper class. Also, as it is a generic library, you first need to customize it a bit before getting started. Here I will present some extra tweaking that you can use to get started with Enterprise Library.

Let’s look at a typical configuration file which contains Caching, Cryptography, Exception and Logging application blocks:

Now look at the encircled sections. These are the things that you have to remember while using EL classes. For example, if you want to use the logging block, you have to provide the Category name:

Logger.Write( "Log message", "Debug" );

Again, if you want to use the Cache Manager, you will have to remember the name:

CacheManager manager = CacheFactory.GetCacheManager("XSL Cache");

If you want to use Security Block classes, you need to specify the security provider name e.g. MD5CryptoServiceProvider:

Cryptographer.CreateHash( "MD5CryptoServiceProvider", "hash me" );

So, very soon, your code will be full of EL classes, and whenever a class’s signature changes or a new version is released (as happened when the patterns and practices Application Blocks were replaced by the new Enterprise Library), you will have to search and replace throughout your project to upgrade your code. This is a real pain. So, I have made an Enterprise Library helper class named EntLibHelper, which you can use in the following way:

EntLibHelper.Info( "Information logging" ); // Logging block

// Cryptography block
string hashedValue = EntLibHelper.Hash( "Hash me" );

// Exception block
EntLibHelper.Exception( "Some error occurred", x );

So, the next time Microsoft releases something called Universal Library, all you need to do is change the code inside EntLibHelper. Moreover, EntLibHelper handles all the trouble of initializing Enterprise Library blocks, ensures proper usage, and provides a convenient interface that frees developers from remembering EL interfaces.

Check the source code for EntLibHelper. Remember, it is compatible with RSS Feeder's app.config. If you change any of the names, for example by renaming a logging category or switching from MD5 to SHA hashing, you will have to change the names specified in the constants at the top of the class.

private const string EXCEPTION_POLICY_NAME = "General Policy";
private const string SYMMETRIC_INSTANCE = "DPAPI";

private const string HASH_INSTANCE = "MD5CryptoServiceProvider";
public const string XSL_CACHE_MANAGER = "XSL Cache";
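To show the shape of such a wrapper without depending on Enterprise Library, here is a hypothetical AppHelper sketch. It is not the EntLibHelper source: it keeps the names callers would otherwise have to remember in private constants and delegates to BCL types (System.Diagnostics.Trace and MD5) instead of EL blocks.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

// Hypothetical facade in the spirit of EntLibHelper. Only this class knows
// which library does the work; callers never see category or provider names.
public static class AppHelper
{
    private const string INFO_CATEGORY = "Debug";
    private const string ERROR_CATEGORY = "Error";

    public static void Info(string message) => Write(message, INFO_CATEGORY);

    public static void Exception(string message, System.Exception ex) =>
        Write(message + Environment.NewLine + ex, ERROR_CATEGORY);

    // MD5 of the UTF-8 bytes, as uppercase hex
    public static string Hash(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Swap this line for a logging library call without touching any caller
    private static void Write(string message, string category) =>
        Trace.WriteLine(message, category);
}
```

Callers write AppHelper.Info("...") or AppHelper.Hash("..."), exactly like the EntLibHelper calls above, so replacing the underlying library is a change to one file.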

RSS Feeder object model

RSS Feeder has a tiny object model as shown below:

Channels is a collection of Channel objects, each representing a source of RSS feeds. A Channel is a collection of RSSFeed objects. An RSSFeed object contains some basic information about a post, like its title, publish date, and the GUID that uniquely identifies the item. The entire XML received for each item is stored in the XML field.

Some of the values are extracted from the original XML into the RSSFeed object for faster access. For example, when we render the list of feeds in a ListView, we need to show the title and the publish date, and we need to sort on the publish date. This is why some values from the actual XML are duplicated as fields of the RSSFeed object.
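A minimal sketch of this object model, in modern C# rather than the original .NET 1.1 ArrayList style. RssChannel, Feeds and RssTypeEnum.Atom appear in the parser code below; the RssFeed class name, the other enum members and the remaining property names are assumptions based on the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum RssTypeEnum { Rss, Rdf, Atom }

// One post (an RSS <item> or an Atom <entry>)
public class RssFeed
{
    public string Title { get; set; }
    public DateTime PublishDate { get; set; } // copied out of the XML so lists can sort on it
    public string Guid { get; set; }          // unique id: <guid>, <link> or Atom <id>
    public string Xml { get; set; }           // the complete original XML of the item
}

// One feed source; a List<RssChannel> plays the role of Channels
public class RssChannel
{
    public string Title { get; set; }
    public string Link { get; set; }
    public string Description { get; set; }
    public RssTypeEnum Type { get; set; }
    public List<RssFeed> Feeds { get; set; } = new List<RssFeed>();

    // Sorting uses the duplicated PublishDate field, so no XML is parsed here
    public IEnumerable<RssFeed> NewestFirst() =>
        Feeds.OrderByDescending(f => f.PublishDate);
}
```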

Generally both RSS and Atom XML contain the title of the post, the author’s name, a unique ID, link to the post, publish date and a detail body.

RSS 2.0 XML format

Let’s look at a sample XML of RSS 2.0 format:

<?xml version="1.0"?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">

<channel>
   <title>.NET Community Blog</title>
   <link>http://localhost/b2evolution/index.php</link>
   <description>.NET Community Blog</description>

   <language>en-US</language>
   <docs>http://backend.userland.com/rss</docs>
   <ttl>60</ttl>

   <item>
      <title>Important information</title>
      <link>http://localhost/b2evolution/index.php?...</link>
      <pubDate>Fri, 17 Jun 2005 10:05:52 +0000</pubDate>

      <category domain="external">Announcements [A]</category>
      <category domain="alt">Announcements </category>

      <category domain="main">b2evolution Tips</category>
      <guid isPermaLink="false">21@http://localhost/b2evolution</guid>

      <description>Blog B contains a few posts in the
      'b2evolution Tips' category. </description>
      <content:encoded>
      <![CDATA[ <p>Blog B contains a few posts in the 'b2evolution Tips'
      category.</p>]]>

      </content:encoded>
      <comments>http://localhost/b2evolution/...</comments>
   </item>
</channel>
</rss>

The root node is <rss>, which contains a single <channel> node. Each <item> node inside the channel represents one post. The body of the post is available in the <description> node. Some RSS feed generators write both HTML and plain text content inside <description>. However, some advanced generators write a plain text version inside <description> and the actual HTML version, with all the formatting, inside the <content:encoded> node.
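A minimal sketch of choosing the post body, assuming the item has already been loaded into an XElement (System.Xml.Linq, rather than the XmlReader that RSS Feeder uses): prefer <content:encoded> and fall back to <description>. RssItemBody is a hypothetical name; the namespace URI is the one declared for the content prefix in the sample above.

```csharp
using System.Xml.Linq;

public static class RssItemBody
{
    private static readonly XNamespace Content = "http://purl.org/rss/1.0/modules/content/";

    // Prefer the full HTML in <content:encoded>; fall back to <description>
    public static string GetBody(XElement item)
    {
        string encoded = (string)item.Element(Content + "encoded");
        if (!string.IsNullOrWhiteSpace(encoded)) return encoded.Trim();

        return ((string)item.Element("description") ?? "").Trim();
    }
}
```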

Uniquely identifying an item is troublesome because not all sites generate the guid node. For example, CodeProject RSS feeds contain no guid node. As a result, the only options for uniquely identifying an item are to generate an MD5/SHA hash of the entire content and use that hash value as the identifier, or to use the link node. RSS Feeder first looks for a guid node; if there is none, it uses the link node as the unique identifier. (To CodeProject: If this is not correct, let me know.)

Not all sites follow the RSS 2.0 format; some still use the RSS 0.9 format. Even sites that do follow it do not always generate all the nodes properly, as the missing guid node in the CodeProject feeds shows. You need to be careful while parsing RSS.
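The identification rule described above can be sketched with System.Xml.Linq as follows: use <guid>, else <link>, else a hash of the whole item (the hashing fallback mentioned above). The RssItemId class is hypothetical, and SHA-1 is an arbitrary choice of hash.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

public static class RssItemId
{
    // guid first, then link, then a hash of the whole item as a last resort
    public static string GetItemId(XElement item)
    {
        string guid = (string)item.Element("guid");
        if (!string.IsNullOrWhiteSpace(guid)) return guid.Trim();

        string link = (string)item.Element("link");
        if (!string.IsNullOrWhiteSpace(link)) return link.Trim();

        using (SHA1 sha = SHA1.Create())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(item.ToString(SaveOptions.DisableFormatting));
            return BitConverter.ToString(sha.ComputeHash(bytes)).Replace("-", "");
        }
    }
}
```

Note that the hash fallback changes whenever the publisher edits the item, so an edited post would appear as a new one.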

Atom 0.3 format

<?xml version="1.0" encoding="utf-8"?>

<feed version="0.3" xml:lang="en-US"
    xmlns="http://purl.org/atom/ns#">
<title>.NET Community Blog</title>

<link rel="alternate" type="text/html"
    href="http://localhost/b2evolution/index.php" />

<tagline>.NET Community Blog</tagline>
<generator url="http://b2evolution.net/"
    version="0.9.0.10">b2evolution</generator>

<modified>1970-01-01T00:00:00Z</modified>
<entry>
  <title type="text/plain"
      mode="xml">Important information</title>

  <link rel="alternate" type="text/html"
  href="http://localhost/b2evolution/index.php..." />

  <author>
  <name>admin</name>
  </author>
  <id>http://localhost/b2evolution/index.php?...</id>

  <issued>2005-06-17T10:05:52Z</issued>
  <modified>1970-01-01T00:00:00Z</modified>
  <content type="text/html" mode="escaped">

  <![CDATA[ <p>Blog B contains a few posts in the
  'b2evolution Tips' category.</p>
  <p>All these entries are designed to help you so,
  as EdB would say:
  "<em>read them all before you start
  hacking away!</em>" ...]]></content>

</entry>
</feed>

The Atom format is similar, but it looks a bit better to me than RSS 2.0. First, the content node has a type attribute that defines the type of the content and a mode attribute that tells you whether the content needs decoding. Atom also has an id node that uniquely identifies each entry. The node names are obvious, and best of all, in my experience Atom generators produce consistent output, whereas several versions of RSS are still in wide use.
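As a sketch of how the mode attribute drives decoding: Atom 0.3 defines the values xml (the default), escaped and base64. The AtomContent helper is hypothetical, and treating base64-decoded bytes as UTF-8 text is an assumption.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class AtomContent
{
    // Returns the HTML carried by an Atom 0.3 <content> element
    public static string GetHtml(XElement content)
    {
        // Atom 0.3: mode is "xml" (default), "escaped" or "base64"
        string mode = (string)content.Attribute("mode") ?? "xml";
        switch (mode)
        {
            case "escaped":
                // The XML parser has already undone the entity/CDATA escaping,
                // so the text value is the HTML markup itself
                return content.Value;
            case "base64":
                // Assumes the decoded bytes are UTF-8 text
                return Encoding.UTF8.GetString(Convert.FromBase64String(content.Value.Trim()));
            default:
                // Inline XML (e.g. XHTML): serialize the child nodes as they are
                return string.Concat(content.Nodes());
        }
    }
}
```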

Date handling

When you build an RSS aggregator, you will soon realize that people do not follow a consistent date format. Some use .NET’s DateTime format, some use PHP’s date format, some use Java’s date format and some even use the RFC822 or RFC1123 date formats. There are so many different date formats in use that you cannot find any code that parses them all. After much struggle, I have settled on this function, which tries to digest several possible date formats:

private DateTime FormatDate( string date )
{
    string RFC822 = "ddd, dd MMM yyyy HH:mm:ss zzz";
    //string RFC1123 = "yyyyMMddTHHmmss";
    //string RFCUnknown = "yyyy-MM-ddTHH:mm:ssZ";

    // Strip a trailing "+hhmm" style offset (e.g. "... 10:05:52 +0000").
    // Cutting at the '+' itself keeps the seconds intact even when
    // there is no space before the offset.
    int indexOfPlus = date.LastIndexOf('+');
    if( indexOfPlus > 0 )
        date = date.Substring( 0, indexOfPlus ).TrimEnd();

    // "r" = RFC1123, "s" = sortable (ISO 8601), "U" = universal full
    string [] formats = new string[] { "r", "s", "U" };
    try
    {
        // Parse the date using the invariant culture,
        // adjusting the result to universal time
        return DateTime.Parse(date,
             CultureInfo.InvariantCulture,
             DateTimeStyles.AdjustToUniversal);
    }
    catch
    {
        try
        {
            // Standard parsing failed, try the "r", "s"
            // and "U" formats
            return DateTime.ParseExact( date, formats,
                      DateTimeFormatInfo.InvariantInfo,
                      DateTimeStyles.AdjustToUniversal);
        }
        catch
        {
            try
            {
                // All the standard formats have failed,
                // try the dreaded RFC822 format
                return DateTime.ParseExact( date, RFC822,
                        DateTimeFormatInfo.InvariantInfo,
                        DateTimeStyles.AdjustToUniversal);
            }
            catch
            {
                // All failed! The RSS feed source
                // should be sued
                return DateTime.Now;
            }
        }
    }
}
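For comparison, here is a sketch of a different approach (not the function above): list the explicit formats you expect and parse them with DateTimeOffset.TryParseExact, which keeps the time zone offset instead of discarding it. The FeedDateParser class and its format list are assumptions covering the samples in this article (RFC 822 with a numeric offset or GMT, and ISO 8601 as used by Atom); named zones such as EST and fractional seconds are not handled.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FeedDateParser
{
    // Tried in order with the invariant culture. "zzz" expects +hh:mm.
    private static readonly string[] Formats =
    {
        "ddd, dd MMM yyyy HH:mm:ss zzz",   // RFC 822 with numeric offset
        "ddd, d MMM yyyy HH:mm:ss zzz",    // same, single-digit day
        "dd MMM yyyy HH:mm:ss zzz",        // RFC 822 without the day name
        "ddd, dd MMM yyyy HH:mm:ss 'GMT'", // RFC 1123 style
        "yyyy-MM-dd'T'HH:mm:sszzz",        // ISO 8601 with offset
        "yyyy-MM-dd'T'HH:mm:ss'Z'"         // ISO 8601 in UTC (Atom)
    };

    // Returns null instead of guessing when no format matches
    public static DateTimeOffset? Parse(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return null;

        // RFC 822 writes offsets as +hhmm; turn a trailing +hhmm into +hh:mm for "zzz"
        string s = Regex.Replace(value.Trim(), @"([+-]\d{2})(\d{2})$", "$1:$2");

        // AssumeUniversal only applies to inputs without an explicit offset (GMT, Z)
        if (DateTimeOffset.TryParseExact(s, Formats, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal, out DateTimeOffset result))
        {
            return result;
        }
        return null;
    }
}
```

Returning null rather than the current time lets the caller decide how to treat an unparseable date, for example by using the time the feed was downloaded.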

Creating a generic parser for Atom, RDF and RSS feeds

A variety of widely accepted formats make developers’ lives difficult because they have to support all of the widely used specifications in their application. A feed aggregator needs to support RSS, Atom and RDF all together because all of these are widely used. This results in design complexity because you need to make your object model and parsing process generic for parsing and storing feeds of three different formats.

After much searching, I have finally realized that everyone produces different object models for RSS and Atom feeds. However, having different object model means you need to create different table structures in the database which is difficult to maintain. You also need to write code in all places which first checks whether it is dealing with Atom or RSS feed. This makes your application complicated. Such a problem is solved by XML. In XML, you can store different types of data, yet fully structured. Using XSL, you can easily render different XML structures the same way. So, it does not matter whether you have Atom or RSS content inside your XML, an XSL can easily check the type and produce same HTML output for viewing.
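To make the XSL idea concrete, here is a minimal sketch with one stylesheet that renders the titles of both RSS <item> and Atom 0.3 <entry> nodes as the same HTML. The stylesheet and the FeedRenderer class are toys written for this illustration, not RSS Feeder's XSLT files; the sketch relies on RSS 2.0 elements having no namespace while Atom 0.3 elements live in http://purl.org/atom/ns#.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class FeedRenderer
{
    // One stylesheet for both formats: the union patterns match RSS and Atom nodes
    private const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:atom='http://purl.org/atom/ns#'>
  <xsl:output method='html'/>
  <xsl:template match='/'>
    <xsl:for-each select='//item | //atom:entry'>
      <h2><xsl:value-of select='title | atom:title'/></h2>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToHtml(string feedXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Xslt)))
        {
            xslt.Load(xslReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(feedXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Compiling a stylesheet is relatively expensive, so a real application would load the XslCompiledTransform once and reuse it, which is why RSS Feeder caches its XSLT files with the Caching Application Block.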

First let’s see the generic feed parser that I have made. The FeedProcessor.cs is the generic feed parser which can parse RSS/Atom/RDF the same way and produce Channel and RSSFeed objects in one generic format.

While parsing, it first checks whether the XML contains RSS, Atom or RDF:

public IList Parse( XmlReader reader )
{
    IList channels = new ArrayList();

    while( reader.Read() )
    {
       if( reader.NodeType == XmlNodeType.Element )
       {
           string name = reader.Name.ToLower();

           switch( name )
           {
               case "atom:feed":      // We have Atom Feed
               case "feed":           // We have Atom Feed
                      channels.Add( this.ProcessAtomFeed(reader));
                      break;
               case "rdf:rdf":        // We have rdf feed
               case "rdf":            // We have rdf feed
               case "rss:rss":        // We have rss feed
               case "rss":            // We have rss feed
                      channels.Add( this.ProcessRssFeed(reader));
                      break;
           }
       }
    }

    return channels;
}
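The switch above compares qualified names after lower-casing them. A feed whose root element uses some other prefix, such as a:feed, would therefore not be recognised. A minimal sketch, assuming you only need to classify the document before parsing it, is to look at the root element's local name instead. FeedFormat and FeedSniffer are hypothetical names, and the code targets a current .NET runtime rather than .NET 1.1:

```csharp
using System.IO;
using System.Xml;

// Hypothetical result type for this sketch
public enum FeedFormat { Unknown, Rss, Atom, Rdf }

public static class FeedSniffer
{
    // Classifies a feed by the local name of its root element,
    // ignoring whatever namespace prefix the publisher used.
    public static FeedFormat Detect(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore; // RSS 0.91 feeds may carry a DOCTYPE

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // Skips the XML declaration, comments and white space
            // and stops on the root element
            reader.MoveToContent();

            switch (reader.LocalName.ToLowerInvariant())
            {
                case "rss":  return FeedFormat.Rss;
                case "feed": return FeedFormat.Atom;
                case "rdf":  return FeedFormat.Rdf;   // <rdf:RDF>
                default:     return FeedFormat.Unknown;
            }
        }
    }
}
```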

For an Atom feed, Parse calls ProcessAtomFeed. That function reads the channel properties itself and hands each <entry> element to ProcessAtomEntry:

private RssChannel ProcessAtomFeed( XmlReader reader )
{
    RssChannel channel = new RssChannel();
    channel.Type = RssTypeEnum.Atom;

    channel.Feeds = new ArrayList();
    while( reader.Read() )
    {
       if( reader.NodeType == XmlNodeType.Element )
       {
           string name = reader.Name;

           switch( name )
           {
               case "title":  // title for channel
                   channel.Title = ReadString( reader );
                   break;
               case "link":   // link to website, kept in the "href" attribute
                   string href = reader.GetAttribute( "href" );
                   if( href != null )
                   {
                           channel.Link = href;
                   }
                   break;
               case "tagline":     // description of the channel
               case "description": // Same
                   channel.Description = ReadString( reader );
                   break;
               case "entry": // Aha! an entry
                   channel.Feeds.Add(this.ProcessAtomEntry(reader));
                   break;
           }
       }
       else if( reader.NodeType == XmlNodeType.EndElement )
       {
               if( reader.Name == "feed" )
                       break;
       }

    }

    return channel;
}

ProcessRssFeed does the same job for RSS and RDF feeds.

The complex part is parsing the <entry> or <item> node, which holds the actual content. We need to do two things with it:

  • Extract some essential properties, such as the publish date, title and GUID. Their node names depend on whether we are parsing Atom or RSS.
  • Store everything we read in a buffer. XmlReader is a forward-only reader and cannot go back once it has moved past a node.

So, besides the XmlReader that reads the content, we need an XmlWriter that writes the same content into a temporary buffer as we read it.

The next design problem is writing one generic function that parses <entry> and <item> nodes the same way. We could write two separate functions, but about 90% of their code would be duplicated. The only differences are the names of some nodes and a few structural details. So, here is the function that parses them all:

private RssFeed ProcessFeedNode( XmlReader reader,
    string itemNodeName, string titleNodeName,
    string guidNodeName, string linkNodeName,
    string pubDateNodeName )
{
        RssFeed feed = new RssFeed();

        // Build a buffer which stores the entire XML content of the entry
        StringBuilder buffer = new StringBuilder(1024);
        XmlTextWriter writer =
            new XmlTextWriter(new StringWriter(buffer));
        writer.Namespaces = false;  // copy prefixed names such as dc:date as-is
        writer.Indentation = 1;
        writer.IndentChar = '\t';
        writer.Formatting = Formatting.Indented;

        writer.WriteStartElement(itemNodeName);

        // True when the reader already sits on the next unprocessed node.
        // In that case the loop must not call Read(), or it would skip
        // that node (for example a second sibling with the same name)
        bool positioned = false;
        while( positioned || reader.Read() )
        {
               positioned = false;

               if( reader.NodeType == XmlNodeType.EndElement
                   && reader.Name == itemNodeName )
                       break;

               if( reader.NodeType != XmlNodeType.Element )
                       continue;

               string name = reader.Name;
               bool isEmpty = reader.IsEmptyElement;
               bool readText = false;  // did ReadString consume the content?

               writer.WriteStartElement( name );
               writer.WriteAttributes( reader, true );
               if( name == titleNodeName )
               {
                       feed.Title = ReadString( reader );
                       writer.WriteString(feed.Title);
                       readText = true;
               }
               else if( name == guidNodeName )
               {
                       feed.Guid = ReadString( reader );
                       writer.WriteString(feed.Guid);
                       readText = true;
               }
               else if( name == linkNodeName )
               {
                       // Atom feed contains the link as "href" attribute
                       string link = reader.GetAttribute("href", "");
                       if( null == link )
                       {
                              // but Rss feed has the link as value
                              link = ReadString( reader );
                              writer.WriteString( link );
                              readText = true;
                       }

                       if( feed.Guid == null )
                       {
                              feed.Guid = link;
                       }
               }
               else if( name == pubDateNodeName )
               {
                       string date = ReadString( reader );
                       feed.PublishDate = this.FormatDate( date );
                       writer.WriteString(date);
                       readText = true;
               }
               else
               {
                       // Copy any other element verbatim. ReadInnerXml leaves
                       // the reader on the node after the element's end tag
                       writer.WriteRaw( reader.ReadInnerXml() );
                       positioned = true;
               }

               // Close the element started
               writer.WriteEndElement();

               // For a non-empty element, ReadString leaves the reader on the
               // next node, normally the end tag, which we consume here. An
               // empty element has no end tag (ReadEndElement would fail), so
               // the reader is still on it and the loop must call Read().
               if( readText && !isEmpty )
               {
                       if( reader.NodeType == XmlNodeType.EndElement )
                               reader.ReadEndElement();
                       positioned = true;
               }
        }

        writer.WriteEndElement();
        writer.Close();

        feed.XML = buffer.ToString();

        return feed;
}

This function is not optimal and could be optimized in many ways, but it does the job well. It parses a 200 KB feed in a fraction of a second without using even 5% of the CPU.

Tip 1. ReadString( reader ) or reader.ReadString()

You will notice that the code above uses a custom function called ReadString instead of XmlReader's own ReadString method. According to the documentation, XmlReader.ReadString reads the text content of an element and is not supposed to move past the end tag. In practice, however, I found that it does move past the end tag and stops at the next start tag. So, if you are reading the <title> node and call ReadString, the next node you get is <pubDate>, not </title>. But we need to know when a tag is closed so that we can close the same tag in the XmlWriter. This is why I made the custom ReadString method:

private string ReadString( XmlReader reader )
{
    // Reuse the class-level StringBuilder field "buffer" in
    // order to prevent frequent StringBuilder allocation
    buffer.Length = 0;

    // Empty elements have no content
    if( reader.IsEmptyElement ) return string.Empty;

    // Skip the begin tag and all white space before
    // the first character of content is found
    while( !reader.EOF
         && ( reader.NodeType == XmlNodeType.Element
         || reader.NodeType == XmlNodeType.Whitespace ) )
       reader.Read();

    // Read and store in buffer while we are getting text
    // and CDATA sections, but stop immediately at any other
    // node, such as the end element
    while( reader.NodeType == XmlNodeType.CDATA
        || reader.NodeType == XmlNodeType.Text )
    {
        buffer.Append( reader.Value );
        reader.Read();
    }

    // Now the reader is pointing to the EndElement. Return
    // the content of the buffer we have prepared for this node
    return buffer.ToString();
}
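If simplicity matters more than streaming, a different technique avoids keeping a reader and a writer in step and avoids the ReadString quirk altogether. XmlReader.ReadOuterXml copies the whole <item> or <entry> into a string and advances the reader past it. XmlDocument can then query that string. This is a sketch of that alternative, not the code RSS Feeder uses. ParsedItem and ItemParser are hypothetical names. Loading every item into an XmlDocument also costs more memory and time than the single-pass ProcessFeedNode:

```csharp
using System.Xml;

// Hypothetical result type for this sketch
public sealed class ParsedItem
{
    public string Title, Guid, Link, PubDate, Xml;
}

public static class ItemParser
{
    // Parses one <item> (RSS) or <entry> (Atom) element.
    // The reader must be positioned on that element's start tag.
    public static ParsedItem Parse(XmlReader reader, string titleName,
        string guidName, string linkName, string dateName)
    {
        ParsedItem item = new ParsedItem();

        // Copies the whole element (markup included) and moves the
        // reader past its end tag, so the caller can keep streaming.
        item.Xml = reader.ReadOuterXml();

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(item.Xml);
        XmlElement root = doc.DocumentElement;

        item.Title = ChildText(root, titleName);
        item.Guid = ChildText(root, guidName);
        item.PubDate = ChildText(root, dateName);

        XmlElement link = Child(root, linkName);
        if (link != null)
        {
            // Atom keeps the URL in href; RSS keeps it as element text
            item.Link = link.HasAttribute("href")
                ? link.GetAttribute("href")
                : link.InnerText;
        }
        if (item.Guid == null) item.Guid = item.Link;
        return item;
    }

    // First child element with the given qualified name, or null
    static XmlElement Child(XmlElement parent, string name)
    {
        foreach (XmlNode node in parent.ChildNodes)
        {
            XmlElement element = node as XmlElement;
            if (element != null && element.Name == name) return element;
        }
        return null;
    }

    static string ChildText(XmlElement parent, string name)
    {
        XmlElement element = Child(parent, name);
        return element == null ? null : element.InnerText;
    }
}
```

For an RSS item you would pass "title", "guid", "link" and "pubDate". For an Atom 0.3 entry you would pass "title", "id", "link" and "issued".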

Converting Atom 0.3 to RSS 2.0

The simplest approach, which lets the application think only about RSS, is to convert Atom XML to RSS XML right after downloading it from the web source. The whole application then deals only with RSS and need not worry about any other format. If another format becomes popular in the future, all I need to do is write another converter from that format to RSS. The rest of the application needs no significant changes.

In the source code, you will find atomtorss2.xslt which converts Atom 0.3 XML to RSS 2.0 XML. Here is a little excerpt of the XSLT which does the conversion:

<xsl:template name="items">
<xsl:for-each select="atom:entry">
  <item>
        <title><xsl:value-of select="atom:title"/></title>
        <link>
          <xsl:value-of select="atom:link[@rel='alternate']/@href"/>
        </link>
        <guid><xsl:value-of select="atom:id" /></guid>
        <description>
               <xsl:value-of select="atom:content" />
        </description>
        <pubDate>
               <xsl:choose>
               <xsl:when test='atom:issued'>
               <xsl:value-of select="date:format-date(atom:issued,'EEE, dd MMM yyyy HH:mm:ss z')"/>
               </xsl:when>
               <xsl:when test='atom:modified'>
               <xsl:value-of select="date:format-date(atom:modified,'EEE, dd MMM yyyy HH:mm:ss z')"/>
               </xsl:when>
               </xsl:choose>
        </pubDate>
  </item>
</xsl:for-each>
</xsl:template>

Notice the format-date function inside the select attribute. It is not part of XSLT 1.0, so the standard .NET XSLT processor does not provide it. So, how do we call it?

Introducing the EXSLT project

EXSLT takes XSLT processing to the next level. It provides a rich collection of extension functions that you can use in XSLT stylesheets, which makes XSL a truly powerful language for real-world XML transformation. The beauty of .NET's XslTransform class is that it lets you write plain .NET functions that are invoked whenever an XSLT stylesheet calls them. Using this feature, you can build complicated XML transformations that use the full power of the .NET platform. You can even write functions that query a database and put the dynamic values into the resulting XML.

Using EXSLT is very easy. The following code converts an Atom 0.3 document into RSS 2.0:

void EXSLT()
{
        ExsltTransform xslt = new ExsltTransform();
        xslt.SupportedFunctions = ExsltFunctionNamespace.All;
        xslt.MultiOutput = false;
        xslt.Load("atomtorss2.xslt");
        xslt.Transform("atom.xml", "rss.xml");
}
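The .NET extension-object mechanism mentioned above can be tried without the EXSLT library. The following sketch uses XslCompiledTransform, the .NET 2.0+ successor to XslTransform, together with XsltArgumentList.AddExtensionObject. It exposes a hypothetical DateExtensions class under an invented namespace URI, urn:rssfeeder-dates. It only shows how a stylesheet calls into .NET code and is not a replacement for EXSLT's date:format-date:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object: its public instance methods become
// callable from XSLT under the namespace URI it is registered with.
public class DateExtensions
{
    // Converts an ISO 8601 timestamp (the style Atom 0.3 uses) into
    // an RFC 822 style string (the style RSS 2.0 uses), in GMT.
    public string ToRfc822(string isoDate)
    {
        DateTime utc = XmlConvert.ToDateTime(isoDate, XmlDateTimeSerializationMode.Utc);
        return utc.ToString("ddd, dd MMM yyyy HH:mm:ss", CultureInfo.InvariantCulture) + " GMT";
    }
}

public static class ExtensionDemo
{
    const string Stylesheet =
        "<xsl:stylesheet version='1.0'" +
        " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'" +
        " xmlns:ext='urn:rssfeeder-dates'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:value-of select='ext:ToRfc822(string(/entry/issued))'/>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Run(string xml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(XmlReader.Create(new StringReader(Stylesheet)));

        // Bind the namespace URI behind the "ext" prefix to the .NET object
        XsltArgumentList args = new XsltArgumentList();
        args.AddExtensionObject("urn:rssfeeder-dates", new DateExtensions());

        StringWriter output = new StringWriter();
        xslt.Transform(XmlReader.Create(new StringReader(xml)), args, output);
        return output.ToString();
    }
}
```

For example, ExtensionDemo.Run("<entry><issued>2005-07-23T10:15:00Z</issued></entry>") returns "Sat, 23 Jul 2005 10:15:00 GMT".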

XML serialization/deserialization

Although we all know how to do it, I always forget the right way to serialize an ArrayList that contains custom objects: the XmlSerializer has to be told the type of the objects inside the list. So, I made a convenient SerializationHelper class that exposes several variants of Serialize and Deserialize functions you can use in your own applications:

// The caller must Flush() or Close() the returned writer;
// otherwise buffered output may never reach the stream
public static XmlWriter Serialize( Stream stream, object o )
{
        XmlTextWriter writer = new XmlTextWriter( stream,
                               System.Text.Encoding.UTF8 );
        XmlSerializer serializer =
                          new XmlSerializer( o.GetType() );
        serializer.Serialize( writer, o );
        return writer;
}

// Serializes an ArrayList of custom objects; "type" tells
// XmlSerializer which element type to expect inside the list
public static XmlWriter Serialize( Stream stream,
                         ArrayList array, Type type )
{
        XmlTextWriter writer = new XmlTextWriter( stream,
                                System.Text.Encoding.UTF8 );
        XmlSerializer serializer = new
           XmlSerializer(typeof(ArrayList), new Type[] {type});
        serializer.Serialize( writer, array );
        return writer;
}


public static object Deserialize( Stream stream, Type t )
{
        XmlTextReader reader = new XmlTextReader( stream );
        XmlSerializer serializer = new XmlSerializer( t );
        object o =  serializer.Deserialize( reader );
        return o;
}


public static ArrayList DeserializeArraylist(Stream stream,
                                                    Type t)
{
        XmlTextReader reader = new XmlTextReader( stream );
        XmlSerializer serializer = new
            XmlSerializer( typeof( ArrayList ), new Type [] { t } );
        ArrayList list =
            (ArrayList) serializer.Deserialize( reader );
        return list;
}

To serialize an ArrayList that contains RssFeed objects, call:

Serialize( stream, arrayList, typeof( RssFeed ) );
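The ArrayList overloads need the extra Type argument because an ArrayList is untyped. XmlSerializer refuses to serialize an object whose type it was not told about in advance. Here is a minimal round-trip sketch, assuming a hypothetical FeedItem class in place of RssFeed:

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for RssFeed: XmlSerializer needs a public
// type with a public parameterless constructor.
public class FeedItem
{
    public string Title;
    public string Link;
}

public static class ArrayListRoundTrip
{
    public static ArrayList RoundTrip(ArrayList items)
    {
        // The extra type tells XmlSerializer which concrete type it may
        // meet inside the untyped ArrayList; without it, Serialize
        // throws InvalidOperationException.
        XmlSerializer serializer = new XmlSerializer(
            typeof(ArrayList), new Type[] { typeof(FeedItem) });

        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, items);
            stream.Position = 0; // rewind before reading it back
            return (ArrayList)serializer.Deserialize(stream);
        }
    }
}
```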

Tip 2. Serializing an object that has an array property

If you have a property of type ArrayList, the serialized XML looks ugly and is not what you would call strongly typed, if that term applies to XML. To customize how an ArrayList is serialized, use the XmlArray and XmlArrayItem attributes:

private ArrayList _WebLogs = new ArrayList();

[XmlArray("weblogs"), XmlArrayItem("weblog", typeof(WebLog))]
public ArrayList WebLogs
{
        get { return _WebLogs; }
        set { _WebLogs = value; }
}

This way you get clean, readable XML output.
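To see what the attributes change, here is a small sketch that serializes a hypothetical Settings class carrying the WebLogs property above. WebLog is reduced to a single Name field for illustration. Each list entry comes out as a <weblog> element inside one <weblogs> element:

```csharp
using System.Collections;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, minimal WebLog type for this sketch
public class WebLog
{
    public string Name;
}

public class Settings
{
    private ArrayList _WebLogs = new ArrayList();

    // Rename the list element and declare the item type/name
    [XmlArray("weblogs"), XmlArrayItem("weblog", typeof(WebLog))]
    public ArrayList WebLogs
    {
        get { return _WebLogs; }
        set { _WebLogs = value; }
    }
}

public static class XmlArrayDemo
{
    public static string ToXml(Settings settings)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Settings));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, settings);
        return writer.ToString();
    }
}
```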

Embedded resource

The easiest way to carry additional files with your project is not to ship them as external files but to put them inside an assembly. Visual Studio provides a file property named Build Action for this purpose.

If you set Build Action to Embedded Resource, the entire content of the file is embedded inside the assembly as a resource. As a result, you can build one assembly that contains all the files you need. You can then read each file directly from the assembly as a Stream instead of opening it from disk. Each embedded file is named according to the following convention:

NameSpace.FileName.Extension

So, for the file atomtorss2.xslt, the full name of the embedded resource is:

RSSFeederResources.atomtorss2.xslt

The best thing about embedded resources is that they do not exist in the file system as separate physical files. So, you never need to worry about file paths; you read the content directly as a Stream. However, embedded resources are read-only, and I know of no way to modify them. If you need to change the content, you have to write the embedded resource out to a file once and use that file from then on.

The SerializationHelper has some convenient functions for dealing with embedded resources. For example:

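public static Assembly GetResourceAssembly()
{
   // Resolve the DLL relative to the application folder rather than
   // the current working directory, which may differ at run time
   return Assembly.LoadFrom( Path.Combine(
           AppDomain.CurrentDomain.BaseDirectory, "RSSFeederResources.dll" ) );
}

public static Stream GetStream( string name )
{
   // RESOURCE_ASSEMBLY_PREFIX is the namespace part of the
   // resource name (e.g. "RSSFeederResources.")
   return GetResourceAssembly().GetManifestResourceStream(
                           RESOURCE_ASSEMBLY_PREFIX + name);
}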

GetStream returns an embedded resource as a Stream, or null if the assembly contains no resource with that name.

public static System.Drawing.Icon GetIcon( string name )
{
        using( Stream stream = GetStream( name ) )
        {
               return new System.Drawing.Icon( stream );
        }
}

You get an Icon object directly from an embedded icon file. This is very convenient for carrying all the icons that your application uses. You don’t need to maintain separate icon files and worry about their path.

public static void WriteEmbeddedFile( string name, string fileName )
{
        using( Stream stream = GetStream( name ) )
        {
               FileInfo file = new FileInfo( fileName );
               using( FileStream fileStream = file.Create() )
               {
                       byte [] buf = new byte[ 1024 ];
                       int size;
                       while( (size = stream.Read( buf, 0, 1024 )) > 0 )
                       {
                               fileStream.Write( buf, 0, size );
                       }
               }
        }
}

The WriteEmbeddedFile function generates a file from the content of the embedded resource.
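One practical wrinkle: GetManifestResourceStream returns null, rather than throwing, when no resource has the requested name. That usually means the naming convention above was not followed exactly. A hedged sketch of a reader that fails loudly and lists the resource names the assembly really contains (EmbeddedText is a hypothetical helper, not part of SerializationHelper):

```csharp
using System.IO;
using System.Reflection;

public static class EmbeddedText
{
    // Reads an embedded resource as text, or throws with the list of
    // resource names that actually exist in the assembly.
    public static string Read(Assembly assembly, string fullName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(fullName))
        {
            if (stream == null)
            {
                string available = string.Join(", ", assembly.GetManifestResourceNames());
                throw new FileNotFoundException(
                    "No embedded resource named '" + fullName + "'. Available: " + available);
            }
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Called with the resources assembly and "RSSFeederResources.atomtorss2.xslt", it would return the stylesheet text.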

Storing data in application data folder

The best place to store application-specific data is not the folder where your program is installed but the "Application Data" folder that Windows® creates for each user. There are actually two Application Data folders: one directly under the user's profile folder and another inside the hidden "Local Settings" folder. So, if your user name is Omar Al Zabir, the paths to these folders are:

C:\Documents and Settings\Omar Al Zabir\Application Data

And the secret one is:

C:\Documents and Settings\Omar Al Zabir\Local Settings\Application Data

The first difference between them is visibility. You can browse to the first folder using Explorer, but the second one is hidden by default. You need to turn on "Show Hidden Files" in Explorer's options to see it.

The second difference is that, when you have a roaming profile in Windows®, the first folder is synchronized to the network store. Whatever you store there is available when you log in from another computer in the domain. This makes it a better place for user files than the folder containing your application's .exe, which does not roam. The second folder, by contrast, is computer-specific and is not synchronized when you move from one computer to another.

RSS Feeder stores all the data in the second folder.

You can get the path to these special folders using the Environment.GetFolderPath function. It takes a value of the Environment.SpecialFolder enumeration:

enum System.Environment.SpecialFolder
{
  ApplicationData,        // per-user, roams with the profile
  CommonApplicationData,  // shared by all users
  LocalApplicationData,   // per-user, does not roam
...
}

The enumeration also covers many other useful folders, such as Desktop, Program Files and My Documents. Here's how you use it:

// Prepare the path where all application specific
// settings will be stored
string appDataPath =  Environment.GetFolderPath(
         Environment.SpecialFolder.LocalApplicationData);

ApplicationSettings.ApplicationDataPath =
    Path.Combine( appDataPath, "RSS Feeder" );

// Create the folder if it does not exist yet
if( !Directory.Exists(ApplicationSettings.ApplicationDataPath))
{
   Directory.CreateDirectory(ApplicationSettings.ApplicationDataPath);
}

This way you can set up your own folder for storing all the application-specific files.

OPML

OPML is an XML-based format for exchanging outline-structured information between applications running on different operating systems and environments. It is commonly used to store lists of RSS feed sources. For example, a blog site can use OPML to publish the titles and feed locations of all the blogs it hosts. blogs.msdn.com provides an OPML file with the feed URLs and titles of all the Microsoft bloggers.

<opml>
  <body>
    <outline text="Microsoft Bloggers">
      <outline title="Alex Lowe's .NET Blog"
        htmlUrl="http://blogs.msdn.com/alowe/default.aspx"
        xmlUrl="http://blogs.msdn.com/alowe/rss.aspx" />
      <outline title="Michał Cierniak"
        htmlUrl="http://blogs.msdn.com/michaljc/default.aspx"
        xmlUrl="http://blogs.msdn.com/michaljc/rss.aspx" />
...

The above XML is an excerpt of OPML from blogs.msdn.com.

RSS aggregators use OPML to exchange subscription information with each other. For example, you can export all your subscriptions from Newsgator as OPML and then import that OPML into RSS Feeder. In fact, you don't even need to do that: whenever you run RSS Feeder, it imports the Newsgator settings. Beware: Newsgator's XML uses "xmlurl", while other OPML files use "xmlUrl". The difference is the case of the "U".

Newsgator stores its own subscription information in an OPML file, which is a bit different:

<?xml version="1.0" encoding="utf-8"?>
<opml xmlns:ng="http://newsgator.com/schema/opml">
  <body>
    <outline title="NewsGator News and Updates"
      description="NewsGator News and Updates"
      lastItemMD5="eFY8dWcOAPZBbpG7Ha1l5g==,XqCFJLx/DJigD7YRFnD3OA==,..."
      xmlurl="http://www.newsgator.com/news/rss.aspx"
      htmlurl="http://www.newsgator.com"
      ng:folderName="" ng:folderType="auto"
      ng:useDefaultCredentials="false"
      ng:username="" ng:passwordenc=""
      ng:domain="" ng:useGuid="false"
      ng:interval="0" ng:nntpMostRecent="-1"
      ng:newsPage="true" ng:renderTransform=""
      ng:downloadAttachments="false" />
  </body>
</opml>

It uses additional attributes such as ng:folderName and ng:folderType to describe the Outlook folder each subscription is mapped to. RSS Feeder's Newsgator import feature uses this information to map each subscription to the same folder that Newsgator uses.

The OpmlHelper class in the source code provides OPML parsing and generation functionality.
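Because of the xmlurl/xmlUrl difference, an OPML reader that must accept both Newsgator files and other aggregators' files is easier to write if it matches attribute names case-insensitively. Here is a minimal sketch of that idea. OpmlSketch is a hypothetical class, not the OpmlHelper from the source code:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class OpmlSketch
{
    // Returns (title, feed URL) pairs for every <outline> that carries
    // a feed URL. Outlines without one (folders) are skipped.
    public static List<KeyValuePair<string, string>> ReadFeeds(string opml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(opml);

        List<KeyValuePair<string, string>> feeds = new List<KeyValuePair<string, string>>();
        foreach (XmlElement outline in doc.GetElementsByTagName("outline"))
        {
            string url = GetAttributeIgnoreCase(outline, "xmlUrl");
            if (string.IsNullOrEmpty(url)) continue;

            string title = GetAttributeIgnoreCase(outline, "title")
                        ?? GetAttributeIgnoreCase(outline, "text")
                        ?? url;
            feeds.Add(new KeyValuePair<string, string>(title, url));
        }
        return feeds;
    }

    // Matches "xmlUrl", "xmlurl", "XMLURL"... on the attribute's local name
    static string GetAttributeIgnoreCase(XmlElement element, string name)
    {
        foreach (XmlAttribute attribute in element.Attributes)
        {
            if (string.Equals(attribute.LocalName, name, StringComparison.OrdinalIgnoreCase))
                return attribute.Value;
        }
        return null;
    }
}
```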

Database optimization

Because RSS Feeder uses an MS Access database, it does not have to deal with many clients connecting to the database or with the other design issues we would need to consider when using SQL Server. However, we do need to avoid frequently opening and closing connections, because MS Access takes quite a long time to do both.

So, I open a single static connection when the application first accesses the database and use it throughout the application. When the application closes, it closes that connection. This gives a significant performance boost over opening and closing a connection on every database access.

However, a static connection object leads to multithreading issues: two threads may try to execute commands on the same connection at the same time. For example, imagine the Feed Downloader is downloading feeds in the background while you are reading feeds. Both you and the downloader are now reading and writing feeds in the database at the same time, and at least one will fail because the connection is in the Executing state instead of the Open state. To prevent this, I call Thread.Sleep whenever I see the connection in the Executing state.
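One caution, offered as a hedged note rather than a verified diagnosis of RSS Feeder: the ADO.NET documentation marks ConnectionState.Executing and ConnectionState.Fetching as reserved for future versions. OleDbConnection.State may therefore only ever report Open or Closed, in which case the Sleep loop would never trigger. Even if it did, checking the state and then using the connection leaves a race between threads. A common alternative is to make every database call take the same lock. The sketch below is generic so it runs without an Access database; SerializedAccess is a hypothetical name:

```csharp
using System;

// Hypothetical helper: funnels every use of one shared, non-thread-safe
// object (such as a single static OleDbConnection) through one lock.
public sealed class SerializedAccess<T>
{
    private readonly object _gate = new object();
    private readonly T _resource;

    public SerializedAccess(T resource)
    {
        _resource = resource;
    }

    // Runs 'work' while holding the lock. A second thread blocks here
    // until the first finishes, instead of polling the connection state.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        lock (_gate)
        {
            return work(_resource);
        }
    }
}

// Usage idea (not compiled here):
//   static SerializedAccess<OleDbConnection> Db = ...;
//   int rows = Db.Use(conn => {
//       using (OleDbCommand cmd = new OleDbCommand(sql, conn))
//           return cmd.ExecuteNonQuery();
//   });
```

Any OleDbDataReader would have to be read and closed inside the callback, because the lock is released as soon as Use returns. RSS Feeder's own Sleep-based version follows.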

private static OleDbConnection __Connection = null;

private static OleDbConnection _Connection
{
    get
    {
       if( null == __Connection )
       {
           string connectionString = string.Format
           ( "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};",
                          ApplicationSettings.DatabaseFilePath );
           __Connection = new OleDbConnection( connectionString );
           __Connection.Open();
       }
       else
       {
           while( ConnectionState.Executing == __Connection.State
               || ConnectionState.Fetching == __Connection.State )
           {
                   System.Threading.Thread.Sleep( 50 );
           }

           if( ConnectionState.Open != __Connection.State )
                   __Connection.Open();
       }

       return __Connection;
    }
}

public static void Close()
{