RSS Feeder.NET is a free, open source desktop RSS feed aggregator which downloads feeds from web sources and stores them locally for offline viewing, searching and processing. It is also a rich blogging tool which you can use to post to a variety of blog engines including WordPress, B2Evolution, .Text, Community Server etc. You can rely fully on MS Outlook®, run fully standalone, or use both at the same time, whichever you find comfortable to work with. It does not increase the Outlook load time, nor does it make Outlook slow or prevent it from closing properly. It is a Smart Client that makes the best use of both local resources and distributed web information sources.
Update History
- August 9, 2005 - Source code updated, Web Log Manager has several fixes.
- August 8, 2005 - When you first load the project and build, you will find an error in WebLogManager.resx. The solution is to simply load the form, do something to make it dirty, and save the form.
Features
- Newspaper mode. You can read feeds in a more readable newspaper mode called “Blogpaper”.
- Auto Discovery. Drag any hyperlink and RSS Feeder will find out whether there is an RSS feed on that page.
- Outlook integration. You can store feeds in Outlook folders.
- Blogging. It provides you with a convenient Outlook 2003 style workspace to manage your blog accounts and write rich posts.
- Blog from Outlook. You can specify an Outlook folder for a weblog account. All the posts from that folder are automatically posted to the weblog during synchronization. You can write posts as HTML using the Word editor. Post content (HTML markup) is cleaned rigorously before posting to the weblog.
- Outlook View. It uses a customized view to present a more readable list of feeds in Outlook Folders. The standard Post view is not easy to browse through quickly. The view puts the subject first in bold and an excerpt of the post under the subject.
- Optimized Startup. You can safely put RSS Feeder at startup and it won’t make your Windows® start slower. A clever lazy loading process puts no load on Windows® during startup; instead, it starts the app when Windows® has finally regained its strength after the long boot up struggle.
- Newsgator Import. Newsgator users can use RSS Feeder to import all the subscriptions and seamlessly replace Newsgator without modifying the Outlook folder locations.
Let’s have a small walkthrough of all the features of RSS Feeder .NET:
The three pane view of Outlook is a user interface engineering marvel. It makes browsing through information fast and easy. In RSS Feeder, I have followed the same concept. The left most pane is the channel list. When you click a channel you see the list of feeds in the middle list. You can instantly search through feeds using the search box at the bottom middle. When you click on a feed, the feed content is displayed on the right side viewer.
You can directly edit channel properties from the left bottom property grid. Click on the title “Properties” to show the property grid. .NET’s built in property grid makes it very convenient to expose objects and allow users to modify the objects conveniently. I have implemented several extensions on property grid which I will be explaining later on.
The most convenient reading view for blogs is the newspaper style view which I call “The Daily Blogpaper”. You can just open the Blogpaper, read through all the latest posts collected from all the channels (you can optionally exclude some) and click on “Mark as Read”. It’s as simple as that.
The previous Outlook style view is useful to work with feeds and channels. But it is not a convenient reading environment. The Blogpaper gives you a truly comfortable reading environment for everyday reading.
RSS Feeder is not just an RSS Aggregator; it’s also an equally powerful feature rich blogging tool.
The Outlook 2003 style view is created using the amazing free UI components which you can find at Divelements Ltd. On the left side, you can see the weblog accounts and your posts in each account. The right side is the editing environment.
RSS Feeder currently supports MetaWebLogAPI-supported blog engines like WordPress, B2Evolution, Drupal etc. and also some XML web service based blog engines like .Text and Community Server. We will see how we can implement these in detail later on.
If you don’t like my app, you can remain a MS Outlook user. When you install the application, it will ask you whether you need Outlook integration. You just need to specify a base folder which will contain all the child channel folders and the RSS Feeder will feed Outlook with all the feeds.
The convenient view of the feed list makes it very easy to skim through lots of posts rapidly. The auto preview also gives a glimpse of the post content, so we can decide whether to keep a post or throw it away without opening it.
This is my favorite: you can create posts in an Outlook folder and then create a weblog account which maps to that folder. All the posts in that folder are delivered to the weblog. You can drag posts from some other place or write new posts in that folder. Whenever RSS Feeder runs its periodic synchronization (every 5 minutes), it reads the posts from the folder and sends them to the corresponding blog site.
The Send/Receive window of RSS Feeder gives you detailed statistics about feed synchronization. You can view the errors received from the server, you can see the speed of your internet connection and the number of feeds sent to Outlook.
While posting blogs, it also gives you the post ID generated by the server or the detailed error message received while posting.
Newsgator import
RSS Feeder will automatically import Newsgator settings, including all the subscriptions and Outlook folders, at first run. This makes migrating from Newsgator to RSS Feeder a zero effort task.
In order to qualify an application as a smart client, the application needs to fulfill the following requirements according to the definition of Smart Client at MSDN.
Local resources and user experience
All smart client applications share an ability to exploit local resources such as hardware for storage, processing or data capture such as compact flash memory, CPUs and scanners for example. Smart client solutions offer hi-fidelity end-user experiences by taking full advantage of all that the Microsoft® Windows® platform has to offer. Examples of well known smart client applications are Word, Excel, MS Money, and even PC games such as Half-Life 2. Unlike "browser-based" applications such as Amazon.Com or eBay.com, smart client applications live on your PC, laptop, Tablet PC, or smart device.
Connected
Smart client applications are able to readily connect to and exchange data with systems across the enterprise or the internet. Web services allow smart client solutions to utilize industry standard protocols such as XML, HTTP and SOAP to exchange information with any type of remote system.
Offline capable
Smart client applications work whether they are connected to the Internet or not. Microsoft® Money and Microsoft® Outlook are two great examples. Smart clients can take advantage of local caching and processing to enable operation during periods of no network connectivity or intermittent network connectivity. Offline capabilities are not only used in mobile scenarios but also by desktop solutions where they can take advantage of offline architecture to update backend systems on background threads, thus keeping the user interface responsive and improving the overall end-user experience. This architecture can also provide cost and performance benefits since the user interface need not be shuttled to the smart client from a server. Since smart clients can exchange just the data needed with other systems in the background, reductions in the volume of data exchanged with other systems are realized (even on hard-wired client systems this bandwidth reduction can realize huge benefits). This in turn increases the responsiveness of the user interface (UI) since the UI is not rendered by a remote system.
Intelligent deployment and update
In the past traditional client applications were difficult to deploy and update. It was not uncommon to install one application only to have it break another. Issues such as "DLL Hell" made installing and maintaining client applications difficult and frustrating. The Updater Application Block for .NET from the patterns and practices team provides prescriptive guidance to those who wish to create self-updating .NET Framework-based applications that are deployed across multiple desktops. The release of Visual Studio 2005 and the .NET Framework 2.0 will beckon a new era of simplified smart client deployment and updating with the release of a new deploy and update technology known as ClickOnce.
(The above text is copied and shortened from MSDN site.)
Let's look at how the RSS Feeder .NET is a smart client application:
Local resources and user experience
After you download the application, it runs locally and stores all the feeds and your personal blogs in an MS Access database and an XML store respectively. It also gives you a rich user interface like Outlook 2003 to work with. As a result, you get the full benefits of desktop convenience even though all the information you are working with is produced by some web source.
Connected
The application connects to RSS Feed sources using HTTP and downloads the feeds to local store. It uses XML RPC for XMLRPC enabled blog engines in order to post web logs. Some of the famous XMLRPC supported blog engines are WordPress, B2Evolution, Drupal etc. It also uses Web Service to communicate with Blog engines like .Text and CommunityServer.
Offline capable
The application is fully offline capable. When it is not connected, you read the feeds from local store which are already downloaded. But when it gets connected, it automatically downloads recent feeds in the background and updates the view seamlessly.
Intelligent deployment and update
The application uses Updater Application Block 2.0 to provide auto update feature. Whenever I release a new version or deploy some bug fixes to a central server, all the users of that application automatically get the update behind the scene. This saves each and every user from going to the website and downloading the new version every time I release something new. It also allows me to instantly deliver bug fixes to everyone within a very short time.
Error reporting
This is my personal requirement for smart client. As the smart client is running on a distant computer, unlike web sites, you do not know whether the users are facing any problems or not and whether the applications are generating any exceptions. We need to provide some kind of error reporting feature that automatically captures the error and transmits the error to some central server so that the error can be fixed. You have seen this error reporting feature in Windows XP or Office 2003 applications. Whenever there is an error, it sends the error report to Microsoft. Similarly, this application also traps errors behind the scene and sends the exception trace to a tracking system at Sourceforge.
Multithreaded
Another favorite requirement of mine for making a client really smart is to make the application fully multithreaded. Smart clients should always be responsive. They must not get stuck while downloading or uploading data. Users should be able to keep on working without knowing that there is something big going on in the background. For example, while a user is writing a blog, the application should download the feeds in the background without hindering the user’s authoring environment at all.
Crash proof
A smart client becomes a dumb client when it crashes in front of a user showing the dreaded “Continue” or “Quit” dialog box. In order to make your app truly smart, you need to catch any unhandled error and publish the error safely. In this app, I will show you how this has been done.
RSS Feeder is a complete application which deals with XML, XSLT, HTML, HTTP GET/POST, Configuration management, Cryptography, Logging, Auto Update, Rich UI, RSS/RDF/ATOM feed processing, XML RPC, Web Service, Blogging, Outlook automation, Multithreading and a lot more. Covering all these in detail requires the volume of a book. So, in this article I will just cover the tips and tricks I have used in all these areas. By the time you reach the end of this article, you will be fully equipped with real life experience of making a rich connected Smart Client application implementing the best practices and neat tricks you can find all over the web.
Enterprise Library is a major new release of the Microsoft patterns and practices application blocks. Application blocks are reusable software components designed to assist developers with common enterprise development challenges. Enterprise Library brings together new releases of the most widely used application blocks into a single integrated download.
The overall goals of the Enterprise Library are the following:
- Consistency: All Enterprise Library application blocks feature consistent design patterns and implementation approaches.
- Extensibility: All application blocks include defined extensibility points that allow developers to customize the behavior of the application blocks by adding it in their own code.
- Ease of use: Enterprise Library offers numerous usability improvements, including a graphical configuration tool, a simpler installation procedure, and a clearer and more complete documentation and samples.
- Integration: Enterprise Library application blocks are designed to work together and are tested to make sure that they do. It is also possible to use the application blocks individually (except in cases where the blocks depend on each other, such as on the Configuration Application Block).
Application blocks help address the common problems that developers face in every project. They have been designed to encapsulate the Microsoft recommended best practices for .NET applications. They can be added into .NET applications quickly and easily. For example, the Data Access Application Block provides access to the most frequently used features of ADO.NET in simple-to-use classes, boosting developer productivity. It also addresses scenarios not directly supported by the underlying class libraries. (Different applications have different requirements and you will find that every application block is not useful in every application that you build. Before using an application block, you should have a good understanding of your application requirements and of the scenarios that this application block is designed to address.)
RSS Feeder uses the following application blocks:
- Caching Application Block. Caches frequently accessed data like the XSLT file which renders HTML from RSS feeds every time you click on an RSS feed.
- Configuration Application Block. Both static and dynamic configuration is handled by this block. For example, global configurations like whether you want Outlook integration, feed download interval etc.
- Cryptography Application Block. Your weblog account password is encrypted using TripleDES algorithm.
- Exception Handling Application Block. The entire application uses the exception handling block to handle all handled or unhandled exceptions.
- Logging and Instrumentation Application Block. Information, warning, debug and error level log entries are written to a text file using this block.
- Updater Application Block. Provides auto update from a central web server.
RSS Feeder has three-way connectivity:
- Download latest updates from update server.
- Download feeds from feed source.
- Post to blog engines.
The Updater Application Block is a real pain to implement. After trying for weeks, I finally gave up on its default BITSDownloader, which does not work at all in my case. When I use BITSDownloader I always get an error that the response header does not contain Content-Length. But if I capture the traffic, I can clearly see that Content-Length is present in the HTTP response header. So, I have moved to the HTTPDownloader created by Kent Boogaart, for Katersoft, which works without any problem. The HTTPDownloader uses nothing but the built-in HttpWebRequest to download the files from the server. It works both synchronously and asynchronously, which suits all update scenarios.
The updater works this way:
- First it gets the URL of the manifest XML. A manifest is prepared and stored in a server which describes the files that are needed to be downloaded.
- After downloading the manifest, it first works out which files need to be downloaded according to the manifest. Then it starts downloading them.
- Once the download is complete, it spawns a console application which waits until RSS Feeder closes.
- When RSS Feeder closes, the console application performs the updates and quits.
One problem I faced while downloading the manifest is that proxies cache manifests. Normally your internet service provider uses proxies to cache web content. As XML is web content, proxy servers nowadays cache XML files too. As a result, even if I frequently update the manifest XML on the server, the proxies do not update their cache accordingly and return an old version of the manifest file. This prevents auto update.
In order to solve this, I have injected a tick count at the end of the URI of the manifest. This always produces a unique URL and thus prevents proxies from returning old content.
Microsoft.ApplicationBlocks.Updater.
Configuration.UpdaterConfigurationView view =
new Microsoft.ApplicationBlocks.Updater.
Configuration.UpdaterConfigurationView();
Uri manifestUri = view.DefaultManifestUriLocation;
string uri = manifestUri.ToString();
uri += "?" + DateTime.Now.Ticks.ToString();
manifests = updater.CheckForUpdates(new Uri( uri ));
When you implement auto update in your application, you will come across a dilemma: whether or not to overwrite an older file with a newer one. For example, RSS Feeder has several XSLT files which render RSS feeds to HTML, and the user is free to change those files. So, if I release a new version of a file, I cannot overwrite the installed copy without knowing whether the user has changed it. But I do need to update it, because I may have fixed some problems in it. Now you may think, duh! It’s an easy solution. Just check whether the file date is equal to the EXE’s date; if it is, the user has not changed the file. This does not work, because I release updated EXEs frequently, so the dates drift apart even for untouched files. The only way to ensure we do not overwrite files accidentally is to record an MD5 hash of each file when it is shipped and, before copying, check that the installed file still matches that hash. I am not sure whether this is built into the Updater Application Block, but it would be nice to have a feature that automatically checks the MD5 of a file before overwriting it.
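The hash check described above can be sketched roughly as follows. This is only an illustration, not something the Updater Application Block provides: the SafeOverwrite class, its method names and the idea of passing in the hash recorded at ship time are all assumptions made for the example, and the code assumes a current .NET runtime.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the Updater Application Block.
public static class SafeOverwrite
{
    // Returns the MD5 hash of a file as a lowercase hex string.
    public static string HashFile(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Copies newFilePath over targetPath only when the target still matches
    // the hash recorded when it was shipped (the user has not edited it),
    // or when the target does not exist yet. Returns true if it copied.
    public static bool ReplaceIfUnmodified(string targetPath, string shippedHash, string newFilePath)
    {
        if (File.Exists(targetPath) &&
            !string.Equals(HashFile(targetPath), shippedHash, StringComparison.OrdinalIgnoreCase))
        {
            return false; // user-modified file: leave it alone
        }

        File.Copy(newFilePath, targetPath, true);
        return true;
    }
}
```

An updater could store the shipped hashes next to the manifest and call ReplaceIfUnmodified for every user-editable file, reporting the skipped ones so the user can merge the changes by hand.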
Enterprise Library is a great piece of work. It can really take off a lot of framework development load from any type of application you develop. Normally when we go for developing a new project, we start with a framework which provides configuration management, security, cryptography, database access etc. All these require time to integrate and test. Enterprise Library saves you from developing such framework because it provides the best practices collected from years of experience of successful projects.
However, using Enterprise Library classes directly from all layers of your application is problematic. Very soon, you will feel the need for a wrapper class. Also, as it is a generic library, you first need to customize it a bit before getting started. Here I will present some extra tweaking that you can use to get started with Enterprise Library.
Let’s look at a typical configuration file which contains Caching, Cryptography, Exception and Logging application blocks:
Now look at the encircled sections. These are the things that you have to remember while using EL classes. For example, if you want to use the logging block, you have to provide the Category name:
Logger.Write( "Log message", "Debug" );
Again, if you want to use the Cache Manager, you will have to remember the name:
CacheManager manager = CacheFactory.GetCacheManager("XSL Cache");
If you want to use Security Block classes, you need to specify the security provider name, e.g. MD5CryptoServiceProvider:
Cryptographer.CreateHash( "MD5CryptoServiceProvider", "hash me" );
So, very soon, your code will be full of EL classes, and when any class’s signature changes or a new version is released, as happened in the move from the patterns and practices Application Blocks to the new Enterprise Library, you will have to perform Search and Replace throughout your project to upgrade your code. This is a real pain. So, I have made an Enterprise Library helper class named EntLibHelper which you can use in the following way:
EntLibHelper.Info( "Information logging" );
string hashedValue = EntLibHelper.Hash( "Hash me" );
EntLibHelper.Exception( "Some error occurred", x );
So, next time, when Microsoft releases something called Universal Library, all you need to do is change the code inside EntLibHelper. Moreover, EntLibHelper handles all the troubles of initializing Enterprise Library blocks, ensures proper usage and provides a convenient interface which frees developers from remembering EL interfaces.
Check the source code for EntLibHelper. Remember, it is compatible with the app.config for RSSFeeder. If you make changes in any of the names, for example, renaming a Logging Category or changing MD5 hashing to SHA hashing, you will have to change the names specified in the constants at the top of the class.
private const string EXCEPTION_POLICY_NAME = "General Policy";
private const string SYMMETRIC_INSTANCE = "DPAPI";
private const string HASH_INSTANCE = "MD5CryptoServiceProvider";
public const string XSL_CACHE_MANAGER = "XSL Cache";
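To make the wrapper idea concrete without depending on Enterprise Library, here is a minimal sketch of the same facade pattern. It is not the real EntLibHelper: the AppServices class and its method names are hypothetical, and the framework's own Trace and MD5 classes stand in for the logging and cryptography blocks. The point is only that call sites never mention the underlying library or its instance names.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for EntLibHelper: call sites depend only on this
// class, so swapping the underlying library means editing one file.
public static class AppServices
{
    // Names that would otherwise be repeated at every call site.
    private const string InfoCategory = "Info";

    // Logs an informational message under a fixed category.
    public static void Info(string message)
    {
        Trace.WriteLine(message, InfoCategory);
    }

    // Logs an error message together with the exception details.
    public static void LogError(string message, System.Exception error)
    {
        Trace.TraceError("{0}: {1}", message, error);
    }

    // Hashes a string with MD5 and returns the hash as Base64 text.
    public static string Hash(string plainText)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(plainText));
            return Convert.ToBase64String(hash);
        }
    }
}
```

Changing the hash algorithm or the logging back end then touches only this class, which is the same benefit the constants above give EntLibHelper.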
RSS Feeder has a tiny object model as shown below:
Channels is a collection of Channel objects, each of which represents a source of RSS feeds. A Channel is a collection of RSSFeed objects. An RSSFeed object contains some basic information about a post like title, publish date, and the GUID which uniquely identifies a particular item. The entire XML received for each item is stored in the XML field.
Some of the items are extracted from the original XML into the RSSFeed object for faster access. For example, when we render the list of feeds in a ListView, we need to show the title and the publish date. Moreover, we need to sort on the publish date. This is why some of the fields in the RSSFeed object duplicate values taken from the actual XML.
Generally both RSS and Atom XML contain the title of the post, the author’s name, a unique ID, link to the post, publish date and a detail body.
Let’s look at a sample XML of RSS 2.0 format:
<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>.NET Community Blog</title>
<link>http://localhost/b2evolution/index.php</link>
<description>.NET Community Blog</description>
<language>en-US</language>
<docs>http://backend.userland.com/rss</docs>
<ttl>60</ttl>
<item>
<title>Important information</title>
<link>http://localhost/b2evolution/index.php?...</link>
<pubDate>Fri, 17 Jun 2005 10:05:52 +0000</pubDate>
<category domain="external">Announcements [A]</category>
<category domain="alt">Announcements </category>
<category domain="main">b2evolution Tips</category>
<guid isPermaLink="false">21@http://localhost/b2evolution</guid>
<description>Blog B contains a few posts in the
'b2evolution Tips' category. </description>
<content:encoded>
<![CDATA[...]]>
</content:encoded>
<comments>http://localhost/b2evolution/...</comments>
</item>
</channel>
</rss>
The root node is <rss>, which contains one or more <channel> nodes. Each <item> node represents one post. The body of the post is available in the <description> node. Some RSS feed generators write both HTML and plain text content inside the <description>. However, some advanced generators write a plain text version inside the <description> and the actual HTML version with all the formatting inside the <content:encoded> node.
Uniquely identifying an item is troublesome because not all sites generate the guid node. For example, CodeProject RSS feeds contain no guid node. As a result, the only options you have to uniquely identify a feed are either to generate an MD5/SHA hash of the entire content and use that hash value as the identifier, or to use the link node. RSS Feeder first checks whether there is a guid node; if not, it uses the link node as the unique identifier. (To CodeProject: If this is not correct, let me know.)
Not all sites follow the RSS 2.0 format. Some sites are still using the RSS 0.9 format. Even those who do follow it do not always generate all the nodes properly. An example is the CodeProject RSS feeds, where you can see that the guid node is missing. You need to be careful while parsing RSS.
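The guid-then-link fallback can be sketched as below. This is an illustration rather than RSS Feeder's actual code: the ItemIdentity class and GetItemId method are hypothetical names, XmlDocument is used for brevity instead of the XmlReader based parser shown later, and the final hash fallback is an extra safety net that the article only mentions as an option.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

// Hypothetical helper for choosing a unique identifier for an RSS <item>.
public static class ItemIdentity
{
    // Prefers <guid>, falls back to <link>, and as a last resort
    // hashes the whole item XML with SHA-1 (an assumption of this sketch).
    public static string GetItemId(XmlNode item)
    {
        string guid = TextOf(item, "guid");
        if (guid.Length > 0) return guid;

        string link = TextOf(item, "link");
        if (link.Length > 0) return link;

        using (SHA1 sha = SHA1.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(item.OuterXml));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Returns the trimmed text of a direct child node, or "" if it is missing.
    private static string TextOf(XmlNode parent, string childName)
    {
        XmlNode child = parent.SelectSingleNode(childName);
        return child == null ? string.Empty : child.InnerText.Trim();
    }
}
```

Note that a content hash changes whenever the publisher edits the post, so an edited post would show up as a new item; that is one reason to prefer guid or link when they exist.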
<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xml:lang="en-US"
xmlns="http://purl.org/atom/ns#">
<title>.NET Community Blog</title>
<link rel="alternate" type="text/html"
href="http://localhost/b2evolution/index.php" />
<tagline>.NET Community Blog</tagline>
<generator url="http://b2evolution.net/"
version="0.9.0.10">b2evolution</generator>
<modified>1970-01-01T00:00:00Z</modified>
<entry>
<title type="text/plain"
mode="xml">Important information</title>
<link rel="alternate" type="text/html"
href="http://localhost/b2evolution/index.php..." />
<author>
<name>admin</name>
</author>
<id>http://localhost/b2evolution/index.php?...</id>
<issued>2005-06-17T10:05:52Z</issued>
<modified>1970-01-01T00:00:00Z</modified>
<content type="text/html" mode="escaped">
<![CDATA[...]]></content>
</entry>
</feed>
The Atom format is similar, but looks a bit better to me than the RSS 2.0 format. The improvements I have noticed are these: the content node has a nice type attribute that defines the type of the content, and a mode attribute that helps to identify whether HTML decoding is required or not. There is also a nice id node which uniquely identifies an entry. The nodes are self-explanatory and, best of all, everyone generates consistent output when they produce an Atom feed, whereas RSS has several versions that are still widely used.
When you make an RSS aggregator, you will soon realize that people do not follow a consistent date format. Some use .NET’s DateTime format, some use PHP’s date format, some use Java’s date format and some even use RFC822 or RFC1123 date formats. There are so many different date formats in use nowadays that you cannot find any code that can parse them all. After much struggle, I have finally come down to this function which tries to digest several possible date formats:
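private DateTime FormatDate( string date )
{
    string RFC822 = "ddd, dd MMM yyyy HH:mm:ss zzz";

    // Drop a trailing "+hhmm" offset together with the space before it.
    int indexOfPlus = date.LastIndexOf('+');
    if( indexOfPlus > 0 )
        date = date.Substring( 0, indexOfPlus-1 );

    // Standard format specifiers are case sensitive and "S" is not a valid
    // one, so the RFC1123 ("r"), sortable ("s") and universal sortable ("u")
    // patterns are used here.
    string [] formats = new string[] { "r", "s", "u" };

    try
    {
        // First attempt: let the framework guess the format.
        return DateTime.Parse(date,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal);
    }
    catch
    {
        try
        {
            // Second attempt: the standard patterns listed above.
            return DateTime.ParseExact( date, formats,
                DateTimeFormatInfo.InvariantInfo,
                DateTimeStyles.AdjustToUniversal);
        }
        catch
        {
            try
            {
                // Third attempt: an explicit RFC822 style pattern.
                return DateTime.ParseExact( date, RFC822,
                    DateTimeFormatInfo.InvariantInfo,
                    DateTimeStyles.AdjustToUniversal);
            }
            catch
            {
                // Nothing matched: fall back to the current time.
                return DateTime.Now;
            }
        }
    }
}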
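On .NET 2.0 or later, the same "try several formats in order" idea can be written without nested exception handlers by using the TryParse family. The sketch below is an alternative under stated assumptions, not the function RSS Feeder ships: the FeedDates class, the TryParseFeedDate name and the particular list of patterns are hypothetical, and it returns null instead of DateTime.Now so that the caller can decide what an unparseable date should mean.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that tries general parsing first, then explicit patterns.
public static class FeedDates
{
    // Explicit patterns tried when general parsing fails (an illustrative list).
    private static readonly string[] Formats =
    {
        "ddd, dd MMM yyyy HH:mm:ss zzz",  // RFC 822 style with numeric offset
        "ddd, d MMM yyyy HH:mm:ss zzz",   // same, single-digit day
        "yyyy-MM-ddTHH:mm:sszzz",         // ISO 8601 style with numeric offset
        "yyyy-MM-dd HH:mm:ss"             // plain date and time
    };

    // Returns the parsed date, or null when no attempt succeeds.
    public static DateTime? TryParseFeedDate(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return null;
        text = text.Trim();

        const DateTimeStyles styles =
            DateTimeStyles.AdjustToUniversal | DateTimeStyles.AllowWhiteSpaces;
        DateTime parsed;

        // First attempt: general parsing with the invariant culture.
        if (DateTime.TryParse(text, CultureInfo.InvariantCulture, styles, out parsed))
            return parsed;

        // Second attempt: the explicit patterns, in order.
        if (DateTime.TryParseExact(text, Formats, CultureInfo.InvariantCulture, styles, out parsed))
            return parsed;

        return null;
    }
}
```

A caller that wants the original behavior can write `FeedDates.TryParseFeedDate(value) ?? DateTime.Now`. Because the offset is not stripped before parsing, dates that carry an offset are converted to UTC rather than being read as if they had none.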
A variety of widely accepted formats make developers’ lives difficult because they have to support all of the widely used specifications in their application. A feed aggregator needs to support RSS, Atom and RDF all together because all of these are widely used. This results in design complexity because you need to make your object model and parsing process generic for parsing and storing feeds of three different formats.
After much searching, I have finally realized that everyone produces different object models for RSS and Atom feeds. However, having different object model means you need to create different table structures in the database which is difficult to maintain. You also need to write code in all places which first checks whether it is dealing with Atom or RSS feed. This makes your application complicated. Such a problem is solved by XML. In XML, you can store different types of data, yet fully structured. Using XSL, you can easily render different XML structures the same way. So, it does not matter whether you have Atom or RSS content inside your XML, an XSL can easily check the type and produce same HTML output for viewing.
First let’s see the generic feed parser that I have made. FeedProcessor.cs is the generic feed parser which can parse RSS/Atom/RDF the same way and produce Channel and RSSFeed objects in one generic format.
While parsing, it first checks whether the XML contains RSS, Atom or RDF:
public IList Parse( XmlReader reader )
{
    IList channels = new ArrayList();
    while( reader.Read() )
    {
        if( reader.NodeType == XmlNodeType.Element )
        {
            string name = reader.Name.ToLower();
            switch( name )
            {
                case "atom:feed":
                case "feed":
                    channels.Add( this.ProcessAtomFeed(reader));
                    break;
                case "rdf:rdf":
                case "rdf":
                case "rss:rss":
                case "rss":
                    channels.Add( this.ProcessRssFeed(reader));
                    break;
            }
        }
    }
    return channels;
}
For an Atom feed, it calls ProcessAtomFeed, which parses only the channel properties:
private RssChannel ProcessAtomFeed( XmlReader reader )
{
    RssChannel channel = new RssChannel();
    channel.Type = RssTypeEnum.Atom;
    channel.Feeds = new ArrayList();
    while( reader.Read() )
    {
        if( reader.NodeType == XmlNodeType.Element )
        {
            string name = reader.Name;
            switch( name )
            {
                case "title":
                    channel.Title = ReadString( reader );
                    break;
                case "link":
                    reader.MoveToAttribute("href");
                    if( reader.ReadAttributeValue() )
                    {
                        channel.Link = reader.Value;
                    }
                    break;
                case "tagline":
                    channel.Description = ReadString( reader );
                    break;
                case "description":
                    channel.Description = ReadString( reader );
                    break;
                case "entry":
                    channel.Feeds.Add(this.ProcessAtomEntry(reader));
                    break;
            }
        }
        else if( reader.NodeType == XmlNodeType.EndElement )
        {
            if( reader.Name == "feed" )
                break;
        }
    }
    return channel;
}
Similarly, the ProcessRssFeed function does the same job for RSS.
The complex part is parsing the entry or item node which actually contains the XML content. We need to parse it in two ways:
- We need to discover some essential properties like publish date, title and GUID according to what we are parsing (Atom/RSS).
- We need to store everything we are reading in a buffer because XmlReader is a forward-only reader and we cannot go back once we have moved forward.
So, we not only need to use an XmlReader to read, but also an XmlWriter to write the same thing as we read into a temporary buffer.
The next design complexity is writing a generic function which parses the entry and item nodes in the same way. Although we can make two different functions for parsing entry and item nodes, the functions would have 90% of their code duplicated; the only differences are the names of some nodes and some structural differences. So, here is the function which parses it all:
private RssFeed ProcessFeedNode( XmlReader reader,
    string itemNodeName, string titleNodeName,
    string guidNodeName, string linkNodeName,
    string pubDateNodeName )
{
    RssFeed feed = new RssFeed();
    StringBuilder buffer = new StringBuilder(1024);
    XmlTextWriter writer =
        new XmlTextWriter(new StringWriter(buffer));
    writer.Namespaces = false;
    writer.Indentation = 1;
    writer.IndentChar = '\t';
    writer.Formatting = Formatting.Indented;
    writer.WriteStartElement(itemNodeName);
    string lastNode = reader.Name;
    while( (reader.NodeType == XmlNodeType.Element
        && lastNode != reader.Name) || reader.Read() )
    {
        if( reader.NodeType == XmlNodeType.Element )
        {
            lastNode = reader.Name;
            writer.WriteStartElement( reader.Name );
            writer.WriteAttributes( reader, true );
            if( reader.Name == titleNodeName )
            {
                feed.Title = ReadString( reader );
                writer.WriteString(feed.Title);
            }
            else if( reader.Name == guidNodeName )
            {
                feed.Guid = ReadString( reader );
                writer.WriteString(feed.Guid);
            }
            else if( reader.Name == linkNodeName )
            {
                string link = reader.GetAttribute("href", "");
                if( null == link )
                {
                    link = ReadString( reader );
                    writer.WriteString( link );
                }
                if( feed.Guid == null )
                {
                    feed.Guid = link;
                }
            }
            else if( reader.Name == pubDateNodeName )
            {
                string date = ReadString( reader );
                feed.PublishDate = this.FormatDate( date );
                writer.WriteString(date);
            }
            else
            {
                writer.WriteRaw( reader.ReadInnerXml() );
            }
            writer.WriteEndElement();
            if( reader.NodeType == XmlNodeType.EndElement )
            {
                if( reader.Name == itemNodeName ) break;
                reader.ReadEndElement();
            }
        }
        if( reader.NodeType == XmlNodeType.EndElement )
        {
            if( reader.Name == itemNodeName )
                break;
        }
    }
    writer.WriteEndElement();
    writer.Close();
    feed.XML = buffer.ToString();
    return feed;
}
This function is not optimal and could be optimized in many ways, but it does the job pretty well. It parses a 200 KB feed in a fraction of a second without even occupying 5% of the CPU.
You will see in the above code that I have used a custom function called ReadString instead of XmlReader's ReadString method. The documentation says the ReadString method reads the text content of the current element; it is not supposed to skip past the end tag. But in practice, it does go over the end tag and stops at the next begin tag. So, if you are reading the <title> node and call ReadString, the next node you get is the <pubDate> node, not </title>. But we need to know when a tag is closed so that we can close the same tag in the XmlWriter. This is why I have made the custom ReadString method:
private string ReadString( XmlReader reader )
{
buffer.Length = 0;
if( reader.IsEmptyElement ) return string.Empty;
while(!reader.EOF
&& ( reader.NodeType == XmlNodeType.Element
|| reader.NodeType == XmlNodeType.Whitespace ) )
reader.Read();
while( reader.NodeType == XmlNodeType.CDATA
|| reader.NodeType == XmlNodeType.Text
&& reader.NodeType != XmlNodeType.EndElement )
{
buffer.Append( reader.Value );
reader.Read();
}
return buffer.ToString();
}
The best way to keep this application simple, so that it only has to think about RSS, is to convert Atom XML to RSS XML just after downloading the content from the web source. This way, the whole application deals only with RSS feeds and need not worry about any other format. So, in future, if another format gets popular, all I need to do is write another converter which converts that format to RSS. The rest of the application needs no significant changes.
In the source code, you will find atomtorss2.xslt which converts Atom 0.3 XML to RSS 2.0 XML. Here is a little excerpt of the XSLT which does the conversion:
<xsl:template name="items">
<xsl:for-each select="atom:entry">
<item>
<title><xsl:value-of select="atom:title"/></title>
<link>
<xsl:value-of select="atom:link[@rel='alternate']/@href"/>
</link>
<guid><xsl:value-of select="atom:id" /></guid>
<description>
<xsl:value-of select="atom:content" />
</description>
<pubDate>
<xsl:choose>
<xsl:when test='atom:issued'>
<xsl:value-of select="date:format-date(atom:issued,'EEE, dd MMM yyyy HH:mm:ss z')"/>
</xsl:when>
<xsl:when test='atom:modified'>
<xsl:value-of select="date:format-date(atom:modified,'EEE, dd MMM yyyy HH:mm:ss z')"/>
</xsl:when>
</xsl:choose>
</pubDate>
</item>
</xsl:for-each>
</xsl:template>
Notice that there is a function, format-date, inside the select attribute. This function is not available in a standard XSLT 1.0 processor. So, how do we do this?
EXSLT takes XSLT processing to the next level. It supports a rich collection of extension functions that you can use in XSLT scripts, which makes XSLT truly powerful for real world XML transformation. The beauty of XsltTransformer is that it allows you to write pure .NET functions which are invoked whenever they are called from the XSLT script. Using this feature, you can write complicated XML transformations that use the full power of the .NET platform. You can even write functions that query a database and put the resulting values in the output XML.
Using EXSLT is very easy. The following code shows how an Atom 0.3 XML is converted to RSS 2.0 XML:
void EXSLT()
{
ExsltTransform xslt = new ExsltTransform();
xslt.SupportedFunctions = ExsltFunctionNamespace.All;
xslt.MultiOutput = false;
xslt.Load("atomtorss2.xslt");
xslt.Transform("atom.xml", "rss.xml");
}
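The idea of calling .NET functions from an XSLT script can also be sketched without EXSLT, using the framework's own extension object mechanism. The example below is an illustration under stated assumptions, not the code used in RSS Feeder: it uses XslCompiledTransform (available from .NET 2.0 onward) rather than the EXSLT library, and the DateExtensions class, the urn:demo-ext namespace and the simplified, namespace-free input are all made up for the demonstration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object: its public methods become XSLT functions.
public class DateExtensions
{
    // Converts an ISO 8601 timestamp to the RFC 1123 form used by RSS pubDate.
    public string ToRfc822(string isoDate)
    {
        DateTime utc = DateTime.Parse(isoDate, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal);
        return utc.ToString("r", CultureInfo.InvariantCulture);
    }
}

public static class ExtensionDemo
{
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
  xmlns:ext='urn:demo-ext' exclude-result-prefixes='ext'>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/entry'>
    <pubDate><xsl:value-of select='ext:ToRfc822(string(issued))'/></pubDate>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string inputXml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(sheet);

        // Bind the ext: prefix's namespace URI to an instance of our class.
        XsltArgumentList args = new XsltArgumentList();
        args.AddExtensionObject("urn:demo-ext", new DateExtensions());

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
            xslt.Transform(input, args, output);
        return output.ToString();
    }
}
```

Calling ExtensionDemo.Transform with an entry whose issued element holds 2005-08-09T10:30:00Z should produce a pubDate element containing the same instant in RFC 1123 form.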
Although we all know how, I always forget the right way to serialize an ArrayList which contains custom objects. So, I have made a convenient SerializationHelper class which exposes variants of Serialize and Deserialize functions that you can use in your applications:
public static XmlWriter Serialize( Stream stream, object o )
{
XmlTextWriter writer = new XmlTextWriter( stream,
System.Text.Encoding.UTF8 );
XmlSerializer serializer =
new XmlSerializer( o.GetType() );
serializer.Serialize( writer, o );
return writer;
}
public static XmlWriter Serialize( Stream stream,
ArrayList array, Type type )
{
XmlTextWriter writer = new XmlTextWriter( stream,
System.Text.Encoding.UTF8 );
XmlSerializer serializer = new
XmlSerializer(typeof(ArrayList), new Type[] {type});
serializer.Serialize( writer, array );
return writer;
}
public static object Deserialize( Stream stream, Type t )
{
XmlTextReader reader = new XmlTextReader( stream );
XmlSerializer serializer = new XmlSerializer( t );
object o = serializer.Deserialize( reader );
return o;
}
public static ArrayList DeserializeArraylist(Stream stream,
Type t)
{
XmlTextReader reader = new XmlTextReader( stream );
XmlSerializer serializer = new
XmlSerializer( typeof( ArrayList ), new Type [] { t } );
ArrayList list =
(ArrayList) serializer.Deserialize( reader );
return list;
}
In order to serialize an ArrayList which contains objects of type RssFeed, you issue the following command:
Serialize( stream, arrayList, typeof( RssFeed ) );
If you have a property of type ArrayList, the default serialized XML looks ugly: the items are written as generic anyType elements rather than elements named after their type, so the XML is not what we would call strongly typed, if that term applies to XML. In order to customize how an ArrayList is serialized, you can use the two attributes XmlArray and XmlArrayItem:
private ArrayList _WebLogs = new ArrayList();
[XmlArray("weblogs"), XmlArrayItem("weblog", typeof(WebLog)) ]
public ArrayList WebLogs
{
get { return _WebLogs; }
set { _WebLogs = value; }
}
This way you will get a nice XML output.
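To see the effect of the two attributes, here is a small self-contained sketch. The BlogSettings container and the single Name field on WebLog are assumptions made for the example; the real WebLog class in the source code has its own members.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.Serialization;

public class WebLog
{
    public string Name;   // illustrative field only
}

[XmlRoot("settings")]
public class BlogSettings   // hypothetical container class
{
    private ArrayList _WebLogs = new ArrayList();

    [XmlArray("weblogs"), XmlArrayItem("weblog", typeof(WebLog))]
    public ArrayList WebLogs
    {
        get { return _WebLogs; }
        set { _WebLogs = value; }
    }
}

public static class XmlArrayDemo
{
    // Serializes the settings object to an XML string.
    public static string Serialize(BlogSettings settings)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(BlogSettings));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, settings);
        return writer.ToString();
    }
}
```

With one WebLog in the list, the output contains a weblogs element wrapping a weblog element, instead of an anyType element carrying an xsi:type attribute.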
The easiest way to carry additional files with your projects is not to carry them as external files, but to put them inside an assembly. Visual Studio has a file property named Build Action for this purpose.
If you set the property to Embedded Resource, that file is linked inside the assembly. This means that the entire content of the file is embedded inside the assembly as an inline resource. As a result, you can make one assembly which contains all the files you need. You can directly read the files from the assembly as a Stream without opening them as files. Each file is embedded inside the assembly using the following naming convention:
NameSpace.FileName.Extension
So, for the file atomtorss2.xslt, the full name of the embedded resource is:
RSSFeederResources.atomtorss2.xslt
The best thing about embedded resources is that they are not in the file system as separate physical files. So, you never need to worry about the path of the files. You can directly read the content of the file as a Stream. However, the embedded resource is read-only. I do not know of any way to modify its content. In that case, you will have to create a file from the embedded resource and always use that file.
The SerializationHelper class has some convenient functions to deal with embedded resources. For example:
public static Assembly GetResourceAssembly()
{
return Assembly.LoadFrom("RSSFeederResources.dll");
}
public static Stream GetStream( string name )
{
return GetResourceAssembly().GetManifestResourceStream(
RESOURCE_ASSEMBLY_PREFIX + name);
}
GetStream returns an embedded resource as a Stream.
public static System.Drawing.Icon GetIcon( string name )
{
using( Stream stream = GetStream( name ) )
{
return new System.Drawing.Icon( stream );
}
}
You get an Icon object directly from an embedded icon file. This is very convenient for carrying all the icons that your application uses. You don’t need to maintain separate icon files and worry about their path.
public static void WriteEmbeddedFile( string name, string fileName )
{
using( Stream stream = GetStream( name ) )
{
FileInfo file = new FileInfo( fileName );
using( FileStream fileStream = file.Create() )
{
byte [] buf = new byte[ 1024 ];
int size;
while( (size = stream.Read( buf, 0, 1024 )) > 0 )
{
fileStream.Write( buf, 0, size );
}
}
}
}
The WriteEmbeddedFile function generates a file from the content of the embedded resource.
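One detail worth keeping in mind with helpers like these is that GetManifestResourceStream returns null when no resource matches the name, for example when the namespace prefix is misspelled. The following sketch, with a hypothetical ResourceReader class, shows one way to guard against that and to list the names an assembly actually contains.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceReader
{
    // Returns the text of an embedded resource,
    // or null when the assembly has no resource with that exact name.
    public static string ReadText(Assembly assembly, string fullName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(fullName))
        {
            if (stream == null) return null;
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Lists the full names (NameSpace.FileName.Extension) of all embedded resources.
    public static string[] ListNames(Assembly assembly)
    {
        return assembly.GetManifestResourceNames();
    }
}
```

Printing the result of ListNames once during development is a quick way to confirm the exact name to pass to GetStream.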
The best place to store application specific data is not in the folder where your program is installed but in the “Application Data” folder that Windows® creates for each user. There are actually two Application Data folders. One is directly under the user folder and another is inside the hidden folder "Local Settings". So, if your user name is Omar Al Zabir, the path to these folders will be:
C:\Documents and Settings\Omar Al Zabir\Application Data
And the secret one is:
C:\Documents and Settings\Omar Al Zabir\Local Settings\Application Data
The major difference between them is that the first one is visible. You can browse to that folder using Explorer. But the second one is hidden by default. You need to turn on “Show Hidden Files” from Explorer Options in order to see that folder.
Another difference is that, when you have a roaming profile in Windows®, the first visible folder is synchronized to the network store. So, whatever you store in that folder, you can easily access it when you login from another computer inside the domain. This location is better for storing files than the location where your application .exe is located because that does not roam. Remember, the second folder is computer specific and it is not synchronized when you move from one computer to another.
RSS Feeder stores all the data in the second folder.
You can get the path to these special folders using the Environment.GetFolderPath function. It takes a value of the following enumeration:
enum System.Environment.SpecialFolder
{
ApplicationData,
CommonApplicationData,
LocalApplicationData,
...
}
There are many other useful folders like the Desktop, Program Files, My Documents etc. All these are available from this enum. Here’s how you use it:
string appDataPath = Environment.GetFolderPath(
Environment.SpecialFolder.LocalApplicationData);
ApplicationSettings.ApplicationDataPath =
Path.Combine( appDataPath, "RSS Feeder" );
if( !Directory.Exists(ApplicationSettings.ApplicationDataPath))
{
Directory.CreateDirectory(ApplicationSettings.ApplicationDataPath);
}
This way you can set up your own folder for storing all the application specific files.
OPML is an XML-based format that allows exchange of outline-structured information between applications running on different operating systems and environments. OPML is used to store information about RSS Feed sources. For example, a blog site uses OPML to store all the blog titles and feed locations that it contains. If you go to blogs.msdn.com you will get the OPML for all the Microsoft bloggers’ feed URL and title.
<opml>
<body>
<outline text="Microsoft Bloggers">
<outline title="Alex Lowe's .NET Blog"
htmlUrl="http://blogs.msdn.com/alowe/default.aspx"
xmlUrl="http://blogs.msdn.com/alowe/rss.aspx" />
<outline title="Michał Cierniak"
htmlUrl="http://blogs.msdn.com/michaljc/default.aspx"
xmlUrl="http://blogs.msdn.com/michaljc/rss.aspx" />
...
...
The above XML is an excerpt of OPML from blogs.msdn.com.
RSS aggregators use OPML to exchange subscription information with other aggregators. For example, you can export all your subscriptions as OPML from Newsgator and then import the OPML into RSS Feeder. In fact, you don’t need to do even that: whenever RSS Feeder runs, it imports the Newsgator settings. Beware, Newsgator's XML contains "xmlurl" but all other OPML uses "xmlUrl". The difference is in the case of "U".
Newsgator stores its subscription information in an OPML file. Its OPML is a bit different:
<?xml version="1.0" encoding="utf-8"?>
<opml xmlns:ng="http://newsgator.com/schema/opml">
<body>
<outline title="NewsGator News and Updates"
description="NewsGator News and Updates"
lastItemMD5="eFY8dWcOAPZBbpG7Ha1l5g==,XqCFJLx/DJigD7YRFnD3OA==,..."
xmlurl="http://www.newsgator.com/news/rss.aspx"
htmlurl="http://www.newsgator.com"
ng:folderName="" ng:folderType="auto"
ng:useDefaultCredentials="false"
ng:username="" ng:passwordenc=""
ng:domain="" ng:useGuid="false"
ng:interval="0" ng:nntpMostRecent="-1"
ng:newsPage="true" ng:renderTransform=""
ng:downloadAttachments="false" />
</body>
</opml>
It uses additional properties like ng:folderName and ng:folderType to describe the Outlook folder location to which subscriptions are mapped. RSS Feeder provides a Newsgator import feature, which uses this information to map each subscription to the same folder as Newsgator does.
The OpmlHelper class in the source code provides OPML parsing and generation functionality.
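As an illustration of the parsing side, and of the xmlUrl versus xmlurl difference mentioned above, here is a minimal sketch. It is not the OpmlHelper class from the source code; the OpmlSketch name and the shape of the return value are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class OpmlSketch
{
    // Returns one { title, feedUrl } pair for every outline that points at a feed.
    public static List<string[]> ReadSubscriptions(string opml)
    {
        List<string[]> result = new List<string[]>();
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(opml);

        foreach (XmlElement outline in doc.GetElementsByTagName("outline"))
        {
            // GetAttribute returns an empty string when the attribute is absent.
            string url = outline.GetAttribute("xmlUrl");
            if (url.Length == 0) url = outline.GetAttribute("xmlurl");   // Newsgator spelling
            if (url.Length == 0) continue;   // a grouping outline, not a feed

            string title = outline.GetAttribute("title");
            if (title.Length == 0) title = outline.GetAttribute("text");
            result.Add(new string[] { title, url });
        }
        return result;
    }
}
```

Grouping outlines such as "Microsoft Bloggers" carry no feed URL, so the sketch skips them and returns only the leaf subscriptions.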
As RSS Feeder uses an MS Access database, it does not have to deal with multiple clients connecting to the database or the extra design issues that we need to consider while using SQL Server. However, we do need to minimize connection opens and closes, as MS Access takes a fairly long time to open and close connections.
So, what I have done here is open a static connection when the app starts and use that connection throughout the application. When the application closes, it closes the open connection. This gives a significant performance boost over opening and closing a connection whenever we access the database.
However, a static connection object leads to multithreading issues. Sometimes two threads can try to execute commands on the same connection at the same time. For example, imagine Feed Downloader is downloading feeds in the background and you are reading the feeds. Now both you and the feed downloader are trying to read/write feeds from the database at the same time. At least one will fail as the connection will be in the Executing state instead of the Open state. In order to prevent this, I have implemented a Thread.Sleep whenever I see the connection is in the Executing state.
private static OleDbConnection __Connection = null;
private static OleDbConnection _Connection
{
get
{
if( null == __Connection )
{
string connectionString = string.Format
( "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};",
ApplicationSettings.DatabaseFilePath );
__Connection = new OleDbConnection( connectionString );
__Connection.Open();
}
else
{
while( ConnectionState.Executing == __Connection.State
|| ConnectionState.Fetching == __Connection.State )
{
System.Threading.Thread.Sleep( 50 );
}
if( ConnectionState.Open != __Connection.State )
__Connection.Open();
}
return __Connection;
}
}
public static void Close()
{
if( null != __Connection )
if( ConnectionState.Closed != __Connection.State )
{
__Connection.Close();
__Connection.Dispose();
}
}
So, when a thread tries to get hold of the connection object while the connection is executing, the thread sleeps for 50 ms and then checks again, repeating until the connection is free. This is usually enough time to let the execution complete.
On a normal single processor computer, you will rarely face this problem. But if you have a P4 Hyper-Threading processor which simulates two processors, or your own HP6000 with six Xeon processors in your house, then you will see this problem frequently.
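Polling the connection state narrows the window for a collision but does not close it, because two threads can both observe the Open state before either starts executing. An alternative, shown here only as a sketch and not as what RSS Feeder does, is to serialize all access with a lock. The SerializedResource class is a hypothetical name; in the application the wrapped resource would be the shared OleDbConnection.

```csharp
using System;
using System.Threading;

// Wraps a shared resource so that only one thread can use it at a time.
public sealed class SerializedResource<T>
{
    private readonly object _gate = new object();
    private readonly T _resource;

    public SerializedResource(T resource)
    {
        _resource = resource;
    }

    // Runs the given work while holding the lock and returns its result.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        lock (_gate)
        {
            return work(_resource);
        }
    }
}
```

Every database call would then go through Use, so a second thread simply waits until the first has finished instead of sleeping and re-checking. The trade-off is that a long-running query blocks all other callers for its full duration.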
This is the most complicated part. The first problem you will face while automating Outlook is releasing all the object references used in your code. If you do not release the references properly, Outlook will never close even if it disappears from the screen. The second problem is making a version independent solution. If you “Add Reference...” to Outlook COM library from Visual Studio and set reference to a specific version of DLL, the interop assembly that is generated becomes a version specific assembly. As a result, it will not work properly with older versions of Outlook. The only way you can make it version independent is to develop against the oldest version of Outlook you want to support and make builds from that version.
Another solution to make a truly version independent Outlook automation is to use late binding. You can use the Activator class in the .NET framework to instantiate any COM object using its ProgID. So, you can launch Outlook using late binding in the following way:
public static void StartOutlook( ref object app, ref object name )
{
try
{
app = Marshal.GetActiveObject("Outlook.Application");
name = GetProperty( app, "Session" );
}
catch
{
app = GetObject("Outlook.Application");
name = GetProperty( app, "Session" );
}
}
public static object GetObject( string typeName )
{
Type objectType = Type.GetTypeFromProgID(typeName);
return Activator.CreateInstance( objectType );
}
public static object GetProperty( object obj,
string propertyName )
{
return obj.GetType().InvokeMember( propertyName,
BindingFlags.GetProperty, null, obj, null );
}
Although this makes coding a nightmare, you can write helper functions to deal with the objects and properties. The OutlookHelper class provided with the source code gives you many helper functions to automate Outlook. However, you still have to memorize the Outlook object model before writing code using the late bound approach. For a safe late bound approach to COM interop and a version independent yet strongly typed Outlook automation library, check my article SafeCOMWrapper. If you use it, you don’t have to write the cumbersome code and you don’t need to remember the object model.
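The same InvokeMember technique extends naturally to setting properties and calling methods. The sketch below shows what such helpers might look like; the LateBound class and its signatures are assumptions for illustration and are not copied from OutlookHelper. A plain .NET class, Greeter, stands in for the COM object so that the example runs without Outlook.

```csharp
using System;
using System.Reflection;

public static class LateBound
{
    private const BindingFlags PublicInstance =
        BindingFlags.Public | BindingFlags.Instance;

    public static object GetProperty(object obj, string name)
    {
        return obj.GetType().InvokeMember(name,
            PublicInstance | BindingFlags.GetProperty, null, obj, null);
    }

    public static void SetProperty(object obj, string name, object value)
    {
        obj.GetType().InvokeMember(name,
            PublicInstance | BindingFlags.SetProperty, null, obj,
            new object[] { value });
    }

    public static object CallMethod(object obj, string name, params object[] args)
    {
        return obj.GetType().InvokeMember(name,
            PublicInstance | BindingFlags.InvokeMethod, null, obj, args);
    }
}

// Stand-in for a late bound object, used only to exercise the helpers.
public class Greeter
{
    public string Name { get; set; }

    public string Greet(string greeting)
    {
        return greeting + ", " + Name;
    }
}
```

Member names are resolved at run time, so a misspelled name compiles fine and only fails when the call is made. That is the price of version independence.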
Here’s a late bound way to show Folder Picker dialog box from the Outlook:
public static string SelectFolderPath()
{
object app = null, name = null;
StartOutlook( ref app, ref name );
bool isOutlookInvisible = EnsureExplorer( app, name );
object folder = CallMethod( name, "PickFolder" );
string path = null;
if( null != folder )
path = GetProperty( folder, "FolderPath" ) as string;
if( null != folder )
Marshal.ReleaseComObject( folder );
folder = null;
Marshal.ReleaseComObject( name );
name = null;
if( isOutlookInvisible )
CloseOutlook(app);
Marshal.ReleaseComObject( app );
app = null;
GC.Collect();
return path;
}
Whenever the application downloads RSS feeds, it also sends them to a mapped Outlook folder. First it transforms the XML of an entry to HTML using the XSLT transformation described in this article. Then it uses a UserProperty to store the actual XML against the post. This way, you can write macros later on to work on the XML of the original entries.
foreach( RssFeed item in rssItems )
{
object post = OutlookHelper.CallMethod( folderItems,
"Add", 6 );
OutlookHelper.SetProperty( post, "Subject",
item.Title );
stream.Position = 0;
stream.SetLength(0);
XMLHelper.TransformXml( xsltFileName,
item.XML, stream );
stream.Position = 0;
string html = reader.ReadToEnd();
OutlookHelper.SetProperty( post,
"HTMLBody", html );
OutlookHelper.SetProperty( post,
"BodyFormat", 2 );
object userProperties =
OutlookHelper.GetProperty( post, "UserProperties" );
object missing = System.Reflection.Missing.Value;
object xmlProperty =
OutlookHelper.CallMethod( userProperties, "Add", "XML",
1, missing, missing );
Marshal.ReleaseComObject( userProperties );
OutlookHelper.SetProperty( xmlProperty,
"Value", item.XML );
OutlookHelper.SetProperty( post, "UnRead", 1 );
OutlookHelper.CallMethod( post, "Post" );
Marshal.ReleaseComObject( post );
itemsAdded ++;
}
RSS Feeder provides blogging directly from Outlook. First you create a “Mail and Post” type folder. Then you create posts in that folder. You can also drag posts from other folders to this folder. RSS Feeder picks up the posts available in the folder and publishes them to the weblog site.
The following code shows how this is done the hard way, which is the late bound way:
object folderItems = OutlookHelper.GetProperty( folder, "Items" );
int index = 1;
while(index <=
(int)OutlookHelper.GetProperty(folderItems,
"Count"))
{
object item =
OutlookHelper.GetItem( folderItems, index );
string messageClass =
(string)OutlookHelper.GetProperty( item,
"MessageClass" );
if( "IPM.Post" == messageClass )
{
string subject =
(string)OutlookHelper.GetProperty(item,
"Subject");
string categories =
(string)OutlookHelper.GetProperty( item,
"Categories", "" );
string html;
try
{
html =
(string)OutlookHelper.GetProperty( item,
"HTMLBody");
}
catch
{
...
MessageBox.Show( this,
"You did not allow me to read " +
"the post from Outlook. " +
"Please allow it next time.",
"Outlook Error", MessageBoxButtons.OK,
MessageBoxIcon.Exclamation );
...
...
...
continue;
}
Post p = new Post();
p.Title = subject;
p.Text = html;
p.Date = DateTime.Now;
p.Categories =
webLog.GetCategories( categories );
try
{
WebLogProvider.Instance.PostBlog(webLog, p);
OutlookHelper.CallMethod( item,
"Move", sentFolder );
}
catch( Exception x )
{
...
}
}
Marshal.ReleaseComObject( item );
}
Although there is nothing special in this area, I can show you some handy tips. For example, I have made a handy function which downloads content from a URL and gives progress updates as the download continues. As a result, you can make a progress dialog box which shows the downloaded content size and the connection speed while the download continues.
The following function handles the common download scenario with proxy support:
private static void DownloadContent( Uri uri,
string proxyName, int port,
string proxyUserName, string proxyPassword,
ProgressEventHandler progressHandler,
MemoryStream memoryStream )
{
WebRequest webRequest = HttpWebRequest.Create(uri);
if( null != proxyName && proxyName.Length > 0 )
{
webRequest.Proxy = new System.Net.WebProxy( proxyName, port );
if( proxyUserName.Length > 0 )
{
string password = RSSCommon.PropertyEditor.PasswordEditor.
PasswordProvider.Decrypt(proxyPassword);
webRequest.Proxy.Credentials =
new System.Net.NetworkCredential( proxyUserName, password );
}
}
else
webRequest.Proxy = System.Net.WebProxy.GetDefaultProxy();
DateTime startTime = DateTime.Now;
using(System.Net.WebResponse webResponse =
webRequest.GetResponse())
{
using(System.IO.Stream stream =
webResponse.GetResponseStream())
{
byte [] buffer = new byte[ 1024 * 2 ];
int size;
while( (size = stream.Read( buffer, 0,
buffer.Length) ) > 0 )
{
memoryStream.Write( buffer, 0, size );
TimeSpan duration = DateTime.Now - startTime;
double kb = (memoryStream.Length / 1024.0);
double speed = kb / duration.TotalSeconds;
string message = string.Format( "{0} KB, {1} KB/s",
kb.ToString("n2"), speed.ToString("n2") );
progressHandler( null,
new ProgressEventArgs( message , 0 ) );
}
}
}
}
The above function takes a delegate of type ProgressEventHandler, which is defined as follows:
public delegate void ProgressEventHandler( object sender, ProgressEventArgs e );
ProgressEventArgs extends the .NET framework’s EventArgs:
[Serializable]
public class ProgressEventArgs : EventArgs
{
public string Message;
public int Value;
public ProgressEventArgs(string message, int value)
{
this.Message = message;
this.Value = value;
}
}
This is the standard way to make events. I have seen people making custom delegates that take all the required information as separate parameters. That’s not the prescribed way. The suggested way is to make delegates similar to the EventHandler delegate and to put all the event parameters in your own class which extends EventArgs. This makes the design consistent, and other developers can pick up the concept easily as it is close to what they already use.
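For completeness, here is how a class might expose and raise such an event. The Downloader class and its Run method are hypothetical and exist only to show the pattern; ProgressEventArgs and ProgressEventHandler are repeated so that the example is self-contained.

```csharp
using System;

public class ProgressEventArgs : EventArgs
{
    public string Message;
    public int Value;

    public ProgressEventArgs(string message, int value)
    {
        this.Message = message;
        this.Value = value;
    }
}

public delegate void ProgressEventHandler(object sender, ProgressEventArgs e);

// Hypothetical publisher that follows the sender/EventArgs convention.
public class Downloader
{
    public event ProgressEventHandler Progress;

    // Copy the delegate first so a handler removed on another thread
    // between the null check and the call cannot cause a null reference.
    protected virtual void OnProgress(ProgressEventArgs e)
    {
        ProgressEventHandler handler = Progress;
        if (handler != null) handler(this, e);
    }

    public void Run()
    {
        OnProgress(new ProgressEventArgs("Started", 0));
        OnProgress(new ProgressEventArgs("Complete", 100));
    }
}
```

A subscriber attaches with the += operator and receives both the sender and a single argument object, exactly as it would with any framework event.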
One of the common mistakes we all make is trying to update the UI from background threads instead of the main UI thread, which makes our apps crash frequently. While designing multithreaded applications, you need to ensure that no UI resource is ever touched from any thread other than the main thread. The mistake usually creeps in when we break up the code of a background thread into several functions and forget, inside those functions, that they are running on another thread. As a result, the app crashes whenever one of them accesses a UI element.
In order to update the UI from a function called from or running in another thread, you need to use the Invoke or BeginInvoke method of the Control or Form. Here is my favorite way to do this:
private void UpdateWebLogProgress( object sender,
ProgressEventArgs e )
{
if( base.Disposing ) return;
if( base.InvokeRequired )
{
this.Invoke( new ProgressEventHandler(
this.UpdateWebLogProgress ),
new object[] { sender, e } );
return;
}
// Running on the UI thread from here on: safe to update controls using e.Message and e.Value
}
The function which accesses UI elements checks for itself whether it requires an invoke. If it does, it calls itself using a delegate via the Invoke method and returns. If it does not, it executes the code which handles the UI elements. This guarantees that UI elements are never accessed from another thread. It also saves code, because callers do not need to wrap every call in an explicit Invoke like this:
this.Invoke( new ProgressEventHandler(this.UpdateWebLogProgress),
new object[] {this, new ProgressEventArgs("Complete", 100)});
Instead, you simply do the following from any thread:
UpdateWebLogProgress( this, new ProgressEventArgs( "Complete", 100 ) );
Thus it frees the caller from having to remember which functions work with UI elements.
An interesting feature of RSS aggregators is detecting the presence of an RSS feed, or the location of one, from any URL. There are different scenarios you have to consider while implementing auto discovery:
- The URL can itself be an RSS feed, not an HTML page.
- The URL can be an HTML page having <link rel="alternate" type="application/rss+xml" …> which refers to an RSS source.
- The HTML page may contain hyperlinks to RSS feed sources.
RSS Feeder can intelligently identify what a URL contains and take actions accordingly. For example if you provide the URL of an HTML page, it will find all the RSS feed sources specified inside the HTML. So, if you provide the URL msdn.microsoft.com it will automatically detect all the RSS Channels specified in that page.
On the other hand, if you give it a URL containing some type of feed, it will automatically detect the type of feed.
The HTML processing is done by the famous SgmlReader. Here is the code which discovers the type of content that is found after downloading data from a URL:
private static RssTypeEnum Discover( MemoryStream memoryStream,
ref IList channelSources, ProgressEventHandler progressHandler )
{
Sgml.SgmlReader reader = new Sgml.SgmlReader();
bool isHtml = false;
RssTypeEnum feedType = RssTypeEnum.Unknown;
memoryStream.Position = 0;
reader.InputStream = new StreamReader( memoryStream );
try
{
while( reader.Read() )
{
if( null != reader.Name )
{
string name = reader.Name.Trim().ToLower();
if( name == "html" || name == "body" )
{
isHtml = true;
feedType = RssTypeEnum.Unknown;
}
else if( (name == "rss" || name == "rdf:rdf")
&& !isHtml )
{
feedType = RssTypeEnum.RSS;
break;
}
else if( name == "feed" && !isHtml )
{
feedType = RssTypeEnum.Atom;
break;
}
else if( isHtml && name == "link" )
{
string rel = reader.GetAttribute("rel");
string type = reader.GetAttribute("type");
string title = reader.GetAttribute("title");
string href = reader.GetAttribute("href");
// rel and type are null when the attribute is missing, so check before using them
if( null != rel && null != type
&& rel.ToLower() == "alternate" )
{
string mime = type.ToLower();
if( mime == "application/rss+xml"
|| mime == "application/atom+xml" )
{
channelSources.Add(new string [] { title, href });
}
}
}
}
}
}
catch( Exception x )
{
EntLibHelper.Exception( "RSS Discovery", x );
feedType = RssTypeEnum.RSS;
}
return feedType;
}
The code reads nodes one by one until it discovers an html, rss, rdf or feed node. If it finds an rss, rdf or feed node first, it reports the feed type and stops. If it reaches the html node, it assumes that this is an HTML document and looks for link nodes which refer to an RSS or Atom feed source. If it finds such sources, it adds them to the channelSources collection that was passed in.
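For content that is already well-formed XML, the root element check alone can be sketched with the framework's XmlReader. This is a simplified illustration rather than a replacement for the SgmlReader based code above: real-world HTML is often not well-formed, in which case this sketch simply reports Unknown. The FeedKind enum and RootSniffer class are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public enum FeedKind { Unknown, Rss, Atom, Html }

public static class RootSniffer
{
    // Classifies content by the name of its root element.
    public static FeedKind Detect(string content)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Ignore;   // tolerate a DOCTYPE line
        try
        {
            using (XmlReader reader =
                XmlReader.Create(new StringReader(content), settings))
            {
                reader.MoveToContent();   // skips the declaration, comments and whitespace
                switch (reader.Name.ToLowerInvariant())
                {
                    case "rss":
                    case "rdf:rdf":
                        return FeedKind.Rss;
                    case "feed":
                        return FeedKind.Atom;
                    case "html":
                        return FeedKind.Html;
                    default:
                        return FeedKind.Unknown;
                }
            }
        }
        catch (XmlException)
        {
            // Not well-formed XML, so we cannot tell what it is.
            return FeedKind.Unknown;
        }
    }
}
```

Unlike the function above, this sketch treats a parse failure as Unknown rather than assuming RSS, which keeps a broken HTML page from being mistaken for a feed.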
There are two types of XSL transformation of RSS XML in this program. One renders a single post and the other renders a newspaper which contains multiple channels and multiple feeds.
Believe it or not, you cannot properly transform XML to HTML using XSL if the XML nodes contain embedded HTML content. The .NET framework’s built-in XSL transformer does not decode the HTML content inside the nodes, even though the XSL specification says you can do this:
<xsl:value-of disable-output-escaping="yes" select="description" />
This attribute is meant to prevent the XSL transformer from escaping the content of the description node, so that it is rendered as it is, but it does not work. The XSL transformer encodes the HTML content no matter what. As a result, instead of getting nicely formatted HTML output, you see the raw HTML markup.
Due to this problem, I created the HtmlReader and HtmlWriter classes. Details about them can be found in my HTML Cleaner article.
HtmlWriter extends XmlWriter and prevents escaping of text content inside the tags. It overrides the WriteString method and invokes WriteRaw from it.
public override void WriteString(string text)
{
text = text.Replace( " ", " " );
text = text.Replace("<![CDATA[","");
text = text.Replace("]]>", "");
if( this.FilterOutput )
{
if(this.ReduceConsecutiveSpace) text =
text.Replace("  ", " ");
if(this.RemoveNewlines) text =
text.Replace(Environment.NewLine, " ");
base.WriteRaw( text );
}
else
{
base.WriteRaw( text );
}
}
This class eliminates the need for having the disable-output-escaping attribute set.
This is trickier than rendering a single post as you have to render multiple channels in one HTML page. The process is as follows:
- Combine all the channels into one XML document which contains one rss node but multiple channel nodes.
- Transform the combined XML using XSLT in order to render HTML.
The first problem I encountered was combining multiple XML fragments using XmlWriter. XmlWriter’s WriteRaw method is supposed to write everything I give it without doing any encoding. But in my experience it did not do that; instead it encoded any XML given as a node’s value. As a result, I could not write the XML of an RSS entry using XmlWriter alone. To work around this, I first create a StreamWriter and then wrap it in an XmlTextWriter. Regular XML construction is done by the XmlTextWriter, but whenever raw XML needs to be written, I call the StreamWriter’s WriteLine method.
So, the code for generating one channel is like this:
writer.WriteStartElement("channel");
writer.WriteElementString( "title",
channel.Title );
writer.WriteElementString( "link",
channel.FeedURL.ToString() );
IList items =
DatabaseHelper.GetTopRssItems(channel.Id,
(int)itemCountUpDown.Value, false);
foreach( RssFeed feed in items )
{
streamWriter.WriteLine( feed.XML );
}
writer.WriteEndElement();
This is a common requirement for many desktop applications: only one instance of your app should be allowed to run.
Doing this requires a bit of Win32 level knowledge. When the app starts, it registers a named Mutex. But before creating it, the app first checks whether a Mutex with the same name is already registered. If there is, another instance of the application is already running, and so we quit.
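// Kept in a static field so the handle stays open while the app runs
private static Mutex mutex;

public static bool IsAlreadyRunning()
{
    string strLoc = Assembly.GetExecutingAssembly().Location;
    FileSystemInfo fileInfo = new FileInfo(strLoc);
    string sExeName = fileInfo.Name;
    bool bCreatedNew;

    // bCreatedNew is false when a mutex with this name already exists
    mutex = new Mutex(true, "Global\\"+sExeName, out bCreatedNew);
    if (bCreatedNew)
        mutex.ReleaseMutex(); // give up ownership; the named mutex stays alive
    return !bCreatedNew;
}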
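The technique rests on the createdNew output argument of the Mutex constructor. A minimal sketch, using hypothetical names (SingleInstanceDemo, CreateTwice) and a caller-supplied mutex name, shows that a second attempt to create a mutex with the same name reports that it already exists:

```csharp
using System;
using System.Threading;

public static class SingleInstanceDemo
{
    // Creates the same named mutex twice and reports what each attempt saw.
    public static bool[] CreateTwice(string name)
    {
        bool firstCreatedNew;
        bool secondCreatedNew;

        // Both handles stay open until the end of the using blocks,
        // just as the first instance of an app keeps its mutex alive.
        using (Mutex first = new Mutex(false, name, out firstCreatedNew))
        using (Mutex second = new Mutex(false, name, out secondCreatedNew))
        {
            return new bool[] { firstCreatedNew, secondCreatedNew };
        }
    }
}
```

In a real application the second attempt would come from another process, which is why the mutex name needs to be the same in every instance.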
You will find the solution in this article:
Suppose you have a list of items. Every time a user selects an item, you need to perform some database operation and generate HTML, which takes time. You have written this code in the SelectedIndexChanged event of the list. So if the user selects an item by clicking it and then holds down the DOWN arrow for a while, the event is going to be fired repeatedly, which results in frequent database calls and HTML rendering. As those operations take time, the selection pointer does not go down smoothly; instead, it gets stuck on every item for a while, making the whole UI unresponsive until all the method executions are finished.
Another scenario is, if the user first clicks an item and then scrolls down using the mouse, holds SHIFT and clicks another item to make multiple selections, the SelectedIndexChanged event of the list is going to be fired for each and every item that falls within the selection range. As a result, the user is going to be stuck for a while until all the events are fired and the code execution for all the events is finished.
In order to solve these problems, we will use a timer to invoke the method which does the actual work instead of calling it directly. The process is as follows:
- On the SelectedIndexChanged event of the list, we first stop the timer in order to reset its state. Then we start the timer with an interval.
- We set a delegate to the method which needs to be called.
- On the Elapsed event, we call the delegate, stop the timer and clear the delegate.
As a result, no matter how many times the SelectedIndexChanged event of the list is fired, it just sets a timer which in turn queues a method invoke. As we use a delegate to point to a method instead of an event, the delegate is called only once, which in turn calls the actual method just once.
Here’s how we set the timer:
// Restart the countdown so that any pending call is postponed
this.callbackTimer.Stop();
this._CallbackMethod =
    new MethodInvoker( this.AutoSelectChannel );
this.callbackTimer.Interval =
    STARTUP_SHOW_CHANNEL_DURATION;
this.callbackTimer.Start();
On the Elapsed event, we have this code:
private void callbackTimer_Elapsed(object sender,
    System.Timers.ElapsedEventArgs e)
{
    callbackTimer.Stop();
    if( null != this._CallbackMethod )
        this._CallbackMethod();
    this._CallbackMethod = null;
}
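The same idea can be packaged as a small reusable class. This is a minimal sketch rather than the code used in RSS Feeder: DelayedCall and Request are hypothetical names, and it assumes a non-repeating System.Timers.Timer whose Elapsed event may fire on a thread-pool thread, so the pending delegate is guarded by a lock.

```csharp
using System;

public sealed class DelayedCall : IDisposable
{
    private readonly System.Timers.Timer _timer;
    private readonly object _sync = new object();
    private Action _pending;

    public DelayedCall(double intervalMilliseconds)
    {
        _timer = new System.Timers.Timer(intervalMilliseconds);
        _timer.AutoReset = false; // fire once per Start()
        _timer.Elapsed += OnElapsed;
    }

    // Every request restarts the countdown and replaces the pending call.
    public void Request(Action callback)
    {
        lock (_sync)
        {
            _timer.Stop();
            _pending = callback;
            _timer.Start();
        }
    }

    private void OnElapsed(object sender, System.Timers.ElapsedEventArgs e)
    {
        Action toRun;
        lock (_sync)
        {
            toRun = _pending;
            _pending = null; // clear it so it cannot run twice
        }
        if (toRun != null) toRun();
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

A SelectedIndexChanged handler would then contain a single Request call. In a WinForms application the callback still has to be marshalled to the UI thread, for example by setting the timer's SynchronizingObject.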
.NET WinForms applications are memory hogs. An innocent one-button, one-form application takes about 15 to 20 MB of memory when it loads, whereas a VB 6 counterpart takes just 15 to 20 KB.
However, if you open Task Manager while RSS Feeder is in the tray and no window is visible, you will notice that it takes only around 3 to 5 MB of memory.
This is done by trimming the working set of the app's Process. The Process class reports how much physical memory the process is using through its WorkingSet property, and exposes the limits of the working set as MinWorkingSet and MaxWorkingSet. The interesting thing is that these limits are not read-only; you can increase or decrease them. If you decrease them, you will notice that the memory usage goes down significantly, because Windows moves the pages that are not in use out of physical memory. Although it does not always work, we can give it a try whenever we want.
public static void ReduceMemory()
{
    try
    {
        System.Diagnostics.Process loProcess =
            System.Diagnostics.Process.GetCurrentProcess();
        // Alternate between lowering and raising the limits by one byte,
        // so that repeated calls do not drift in one direction
        if(m_TipAction == true)
        {
            loProcess.MaxWorkingSet =
                (IntPtr)((int)loProcess.MaxWorkingSet - 1);
            loProcess.MinWorkingSet =
                (IntPtr)((int)loProcess.MinWorkingSet - 1);
        }
        else
        {
            loProcess.MaxWorkingSet =
                (IntPtr)((int)loProcess.MaxWorkingSet + 1);
            loProcess.MinWorkingSet =
                (IntPtr)((int)loProcess.MinWorkingSet + 1);
        }
        m_TipAction = !m_TipAction;
    }
    catch( Exception x )
    {
        System.Diagnostics.Debug.WriteLine( x );
    }
}
In the source code, you will find a MemoryHelper class which does this work. You can call the ReduceMemory method whenever your app is deactivated or minimized, leaving some memory for others to use.
We all know how to run an application at Windows startup: add a key in the registry and that's all. But putting an application at startup, especially a .NET application, is a serious pain for Windows. Normally, if you try to load a WinForms application for the first time, you will see how slow it is and how much hard disk activity it causes. Putting such applications at startup, when Windows is doing a lot of other tasks, is going to make the boot-up much slower and make the user lose patience.
That's why I have made an application loader in VB 6 which takes only 20 KB and loads in the blink of an eye. When RSS Feeder is set to load at startup by the user, it actually puts the loader at startup with a 5-minute delay. The loader immediately goes to sleep for 5 minutes after loading, allowing Windows to breathe freely for a while and wipe off the sweat after the heavy boot-up pushups. When everything is nice and quiet, it starts the jumbo .NET app.
Here’s the VB 6 code of the loader:
Private Declare Sub Sleep Lib "kernel32" ( _
    ByVal dwMilliseconds As Long)

Sub Main()
    On Error GoTo ExitDoor

    ' Command line: <delay in seconds> <path of the app to start>
    Dim strArguments() As String
    strArguments = Split(Command$, " ")

    Dim lngDelay As Long
    lngDelay = Val(strArguments(0))

    Dim strFile As String
    strFile = Trim(strArguments(1))

    ' Sleep expects milliseconds
    Call Sleep(lngDelay * 1000)
    Shell strFile
ExitDoor:
End Sub
RSS Feeder normally stays in the system tray as an icon. So, we need to use the balloon notification feature of Windows 2000/XP/2003 in order to provide information about its activities.
This is done with the help of a wonderful article on CodeProject.
As RSS Feeder runs all over the world without letting me know anything about it, I need a way to know whether my users are happily using it or not. So, I have implemented a silent crash report module which traps any unhandled exception and reports it back to me via the SourceForge tracking system.
Error reporting works in the following way:
- Whenever an unhandled exception occurs, the error report is queued to an Error Queue.
- The queue is frequently monitored by an Error Reporting Engine. Whenever it picks up an error from the queue, it launches a background thread.
- The background thread prepares an error report collecting some context information like the current login name of the user which helps to identify duplicate reports from the same user.
- The message is sent as an HTTP POST to the SourceForge Tracker system.
The error reporting engine uses a timer to periodically check the queue.
private void errorReportTimer_Tick(object sender,
    System.EventArgs e)
{
    if( base.Disposing ) return;
    if( 0 == _ErrorReports.Count )
    {
        this.errorReportTimer.Stop();
        _Instance = null;
        base.Close();
    }
    else
    {
        if( !this._IsErrorBeingSent )
        {
            this._IsErrorBeingSent = true;
            base.Show();
            base.Refresh();
            ErrorReportItem item =
                _ErrorReports[0] as ErrorReportItem;
            _Instance.errorReportTextBox.Text = item.Details;
            ThreadPool.QueueUserWorkItem( new
                WaitCallback( SendQueuedReport ), item );
        }
    }
}
The error report is sent from a background thread so that the UI remains responsive.
private void SendQueuedReport(object state)
{
    if( base.Disposing ) return;
    ErrorReportItem item = state as ErrorReportItem;
    try
    {
        ErrorReportHelper.PostTrackerItem(item.Summary,
            item.Details);
    }
    catch
    {
        // A report that cannot be sent is dropped silently
    }
    lock( _ErrorReports )
    {
        _ErrorReports.Remove( state );
    }
    this._IsErrorBeingSent = false;
}
The actual marvel is in the ErrorReportHelper class, which performs the task of making an HTTP POST to the SourceForge Tracker system.
public static void HttpPost( string url,
    string referer, params string [] formVariables )
{
    HttpWebRequest req = null;
    try
    {
        req = (HttpWebRequest)
            HttpWebRequest.Create(new Uri(url));
    }
    catch
    {
        // The URL could not be used, so nothing can be posted
        return;
    }
    CookieContainer CookieCont= new CookieContainer();
    req.AllowAutoRedirect = true;
    req.UserAgent = "Mozilla/4.0 (compatible; " +
        "MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
    req.CookieContainer = CookieCont;
    req.Referer = referer;
    req.Accept = "image/gif, image/x-xbitmap, " +
        "image/jpeg, image/pjpeg, " +
        "application/msword, " +
        "application/vnd.ms-powerpoint, " +
        "application/vnd.ms-excel, */*";
    req.Method = "POST";
    req.ContentType =
        "application/x-www-form-urlencoded";
Now we have prepared an HttpWebRequest object to perform an HTTP POST. Next, we need to prepare the body which contains the actual content that needs to be posted.
    StringBuilder postData = new StringBuilder();
    // formVariables holds pairs: name1, value1, name2, value2, ...
    for( int i = 0; i + 1 < formVariables.Length; i += 2 )
    {
        postData.AppendFormat("{0}={1}&",
            HttpUtility.UrlEncode(formVariables[i]),
            HttpUtility.UrlEncode(formVariables[i+1]));
    }
    // Drop the trailing '&' (there is none when no variables were given)
    if( postData.Length > 0 )
        postData.Remove(postData.Length - 1, 1);
    byte[] postDataBytes =
        Encoding.UTF8.GetBytes(postData.ToString());
    req.ContentLength = postDataBytes.Length;
Our next task is to connect to the actual tracker system index.php and transmit the data.
    Stream postDataStream = req.GetRequestStream();
    postDataStream.Write(postDataBytes, 0,
        postDataBytes.Length);
    postDataStream.Close();
In order to complete the HTTP dialog, we do need to start receiving the response, but not the whole of it. SourceForge's pages are pretty heavy, so do not think of downloading a full page. Besides, the page that is returned contains all the posted items, and there may be thousands of error reports!
    HttpWebResponse resp = null;
    try
    {
        resp = (HttpWebResponse) req.GetResponse();
    }
    catch
    {
        return;
    }
    Stream rcvStream = resp.GetResponseStream();
    resp.Close();
    rcvStream.Close();
}
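The body-building step of HttpPost can be tried on its own without making a request. The sketch below is not part of RSS Feeder: FormEncoder and Encode are hypothetical names, and it uses Uri.EscapeDataString instead of HttpUtility.UrlEncode so that it does not depend on System.Web. One visible difference is that a space is written as %20 rather than +.

```csharp
using System;
using System.Text;

public static class FormEncoder
{
    // Builds an application/x-www-form-urlencoded body from
    // name1, value1, name2, value2, ...
    public static string Encode(params string[] nameValuePairs)
    {
        if (nameValuePairs == null || nameValuePairs.Length % 2 != 0)
            throw new ArgumentException(
                "Expected an even number of strings (name/value pairs).",
                "nameValuePairs");

        StringBuilder body = new StringBuilder();
        for (int i = 0; i < nameValuePairs.Length; i += 2)
        {
            // Separator goes before every pair except the first,
            // so there is no trailing '&' to remove afterwards.
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(nameValuePairs[i]));
            body.Append('=');
            body.Append(Uri.EscapeDataString(nameValuePairs[i + 1]));
        }
        return body.ToString();
    }
}
```

Rejecting an odd number of arguments up front avoids posting a half-built body when a value is accidentally left out.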
There is a very convenient way to catch any exception that someone mistakenly forgot to put inside a try catch block. The Application class provides a ThreadException event which traps any unhandled exception that was not caught by any catch block. When you forget to catch an exception, you see the shameful "Continue" or "Quit" dialog box with a detailed exception trace exposing all your mistakes. If you handle the ThreadException event, you can capture any leaked exception, and can remain quiet or silently write to a log file in order to pick it up later on.
RSS Feeder hides its shame by subscribing to two events:
AppDomain.CurrentDomain.UnhandledException +=
new UnhandledExceptionEventHandler(
CurrentDomain_UnhandledException);
Application.ThreadException += new
System.Threading.ThreadExceptionEventHandler(
Application_ThreadException);
These exceptions are quietly handled by my EntLibHelper, which exposes a nice method to handle unhandled exceptions.
private static void Application_ThreadException(object sender,
    System.Threading.ThreadExceptionEventArgs e)
{
    EntLibHelper.UnhandledException(e.Exception);
}
private static void CurrentDomain_UnhandledException(object sender,
    UnhandledExceptionEventArgs e)
{
    if( e.ExceptionObject is Exception )
        EntLibHelper.UnhandledException(e.ExceptionObject as Exception);
}
EntLibHelper consults the configuration file to decide what to do with the unhandled exception. You can specify in the configuration file what to do with exceptions in the unhandled exception category. If you specify to throw it again, it shuts down the application. But if you specify to trap it, it sends the exception to the Error Reporting engine, which forwards it to the tracking system. As a result, you will see the error reporting window pop up, which notifies me about the error. This means zero hassle for both you and me.
internal static void UnhandledException(Exception x)
{
    try
    {
        bool rethrow = ExceptionPolicy.HandleException(x,
            "Unhandled Exception");
        if (rethrow)
        {
            System.Windows.Forms.Application.Exit();
        }
        else
        {
            ErrorReportForm.QueueErrorReport(
                new ErrorReportItem( x.Message, x.ToString() ) );
        }
    }
    ...
    ...
RSS Feeder is not only an RSS aggregator but also an equally powerful blogging tool. It gives you a nice Outlook 2003 style UI to work with multiple weblogs.
You can create one or more weblog accounts, write posts and save them for publishing later on. When you send a post for publishing, it goes to a send queue which is collected by the send/receive module. If there is any problem in posting, you can come back and open the post to see the error report.
Most of the widely used blog engines use XML RPC, especially those which are developed using PHP. Some of them are WordPress, B2Evolution, Drupal etc. XML RPC is similar to SOAP in that a method call and its parameters are serialized to XML. You can learn more about XML RPC by Googling on XMLRPC and also from this site.
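To make the serialization concrete, here is a minimal sketch of how an XML RPC request body can be produced. It is an illustration only, not the library used by RSS Feeder: XmlRpcRequest and Build are hypothetical names, and it handles string parameters only, whereas real calls also carry other value types such as structs and arrays.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlRpcRequest
{
    // Serializes a method call whose parameters are all strings.
    public static string Build(string methodName, params string[] stringParams)
    {
        StringWriter text = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(text);

        writer.WriteStartElement("methodCall");
        writer.WriteElementString("methodName", methodName);
        writer.WriteStartElement("params");
        foreach (string value in stringParams)
        {
            writer.WriteStartElement("param");
            writer.WriteStartElement("value");
            writer.WriteElementString("string", value); // escapes <, > and &
            writer.WriteEndElement(); // value
            writer.WriteEndElement(); // param
        }
        writer.WriteEndElement(); // params
        writer.WriteEndElement(); // methodCall
        writer.Flush();
        return text.ToString();
    }
}
```

The resulting text is what gets sent as the body of an HTTP POST to the blog engine's XML RPC endpoint; the response comes back as a similar XML document.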
Most of the blog engines support a well-defined API called MetaWeblogAPI. This API uses XMLRPC to call a fixed set of web methods. Although the API is fixed, the method names differ from one blog engine to another. As a result, you cannot really write generic code for MetaWeblogAPI that works on all blog engines which support this API. I believe there is an urgent need to standardize a universally accepted list of methods and parameters, or in short, a fixed interface for all blog engines. For example, if you are blogging to B2Evolution, you need to use the following methods:
- b2.getCategories: Get the categories for a blog.
- b2.newPost: Make a new post.
On the other hand, for WordPress, which supports the same MetaWeblogAPI, the method names are:
- metaWeblog.getCategories
- metaWeblog.newPost
This makes it difficult to implement a blogging library that works universally for all engines. However, this has been done pretty well in RSS Feeder. Currently, it supports almost all PHP-based blog engines and well-known .NET-based blog engines like .Text and Community Server.
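One way to cope with the differing names is to keep them in a lookup table keyed by engine, so that the rest of the code only deals with logical operations. The sketch below shows the idea with the four method names mentioned above; it is an assumption about how such a mapping could look, not the actual RSS Feeder implementation, and BlogOperation, BlogMethodNames and Resolve are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum BlogOperation { GetCategories, NewPost }

public static class BlogMethodNames
{
    // Engine name -> (logical operation -> XML RPC method name)
    private static readonly Dictionary<string, Dictionary<BlogOperation, string>> Table =
        new Dictionary<string, Dictionary<BlogOperation, string>>(
            StringComparer.OrdinalIgnoreCase)
        {
            { "B2Evolution", new Dictionary<BlogOperation, string>
                {
                    { BlogOperation.GetCategories, "b2.getCategories" },
                    { BlogOperation.NewPost, "b2.newPost" }
                }
            },
            { "WordPress", new Dictionary<BlogOperation, string>
                {
                    { BlogOperation.GetCategories, "metaWeblog.getCategories" },
                    { BlogOperation.NewPost, "metaWeblog.newPost" }
                }
            }
        };

    public static string Resolve(string engine, BlogOperation operation)
    {
        Dictionary<BlogOperation, string> methods;
        if (!Table.TryGetValue(engine, out methods))
            throw new NotSupportedException("Unknown blog engine: " + engine);
        return methods[operation];
    }
}
```

Supporting another engine then means adding one more entry to the table, provided its parameters are compatible; engines whose parameter lists differ would need more than a name mapping.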
The editing environment has been made as rich as possible while respecting the restrictions the blog engines apply to formatting. Not all HTML formatting is supported by blog engines. Most of them strip off the style attribute and style sheets. Some don't allow the <font> tag. Some prevent <div> inside the <p> tag. So, the editor that I have made offers a set of formatting tools that more or less suits all of them.
When you have a big toolbar containing 30 to 40 buttons, it becomes difficult to write code for handling the click on each of these buttons. You end up either writing click handlers for each button or one toolbar-level handler with a big if else or switch case to determine which button was clicked.
Instead of this, I have tried a more convenient approach. Each button stores a method name in its Tag property. For example, the Save button has its tag value set to $SavePost. The user control has a public method named SavePost. Inside the click handler of the toolbar, I have this code:
private void DesignSandBar_ButtonClick(object sender,
    TD.SandBar.ToolBarItemEventArgs e)
{
    string name = e.Item.Tag as string;
    if( null == name || 0 == name.Length ) return;
    if( name.StartsWith("$") )
    {
        // "$Name" maps to a public method on this user control
        MethodInfo method =
            this.GetType().GetMethod(name.TrimStart('$'));
        if( null != method )
            method.Invoke( this, null );
    }
    else
    {
        // Any other tag maps to a parameterless method on the editor
        MethodInfo [] methods =
            this._Editor.GetType().GetMethods( );
        foreach( MethodInfo method in methods )
            if( method.Name == name &&
                0 == method.GetParameters().Length )
            {
                try
                {
                    method.Invoke( this._Editor, null );
                }
                catch( Exception x )
                {
                    Debug.WriteLine( x );
                }
            }
    }
}
It uses reflection to find the method on the user control which matches the name specified in the tag and invokes the method. So, you don't need to write click handlers or switch blocks to decide which button was clicked and which method needs to be called for it.
Moreover, if the Tag does not start with a $ sign, it means the method needs to be invoked on the DHTML editor control where the user writes the body of the post. The DHTML editor control exposes lots of formatting methods which are thus directly mapped to the toolbar buttons.
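The dispatch mechanism does not depend on the toolbar library and can be sketched without any UI. In the following minimal example, PostEditor, TagDispatcher and Invoke are hypothetical names chosen for illustration; the lookup is restricted to public parameterless instance methods, and a missing method is reported to the caller instead of raising an exception.

```csharp
using System;
using System.Reflection;

// Stand-in for the control whose methods the toolbar buttons call.
public class PostEditor
{
    public string LastAction = "";
    public void SavePost() { LastAction = "saved"; }
    public void SetBold() { LastAction = "bold"; }
}

public static class TagDispatcher
{
    // Invokes the public parameterless instance method named by the tag.
    // Returns false when the tag is empty or no such method exists.
    public static bool Invoke(object target, string tag)
    {
        if (target == null || string.IsNullOrEmpty(tag)) return false;

        string name = tag.TrimStart('$');
        MethodInfo method = target.GetType().GetMethod(
            name,
            BindingFlags.Public | BindingFlags.Instance,
            null,
            Type.EmptyTypes, // parameterless overload only
            null);
        if (method == null) return false;

        method.Invoke(target, null);
        return true;
    }
}
```

Because the method names live in button tags as plain strings, a typo is only discovered at run time, which is why the sketch returns a flag that the caller can log.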
The editor you see is the DHTML editor control that comes with Internet Explorer. This is the control that Microsoft uses in most of their products including the HTML Designer Window of Visual Studio. You will find the control in this path:
C:\Program Files\Common Files\Microsoft Shared\Triedit\dhtmled.ocx
Working with this control is pretty difficult because almost all the functionality that it provides is exposed via a method named ExecCommand. You can find a reference to this method at MSDN. This function lets you control the editor's behavior, change the formatting style, and show common dialog boxes like Find, Link, Image etc. For example, you can use the following command to set the editor to Bold mode:
Object nullArg = null;
editorControl.ExecCommand( DHTMLEDITCMDID.DECMD_BOLD,
OLECMDEXECOPT.OLECMDEXECOPT_DODEFAULT, ref nullArg );
As working with this control requires you to memorize all the function names and parameters, a convenient wrapper is necessary. That's what you have in the HtmlEditor control provided in the source code. It exposes almost all the functionality of the DHTML editor as convenient public methods and properties like SetBold, SelectedText, OrderList etc. As the wrapper is a .NET class which extends the DHTML editor control, you can easily create a new instance of it and put it inside a container. Here's a code sample which shows how to create a new editor and put it inside a Panel:
private void CreateEditor()
{
    _Editor = new HtmlEditor();
    _Editor.Dock = DockStyle.Fill;
    _Editor.HandleCreated +=
        new EventHandler(_Editor_HandleCreated);
    editorPanel.Controls.Add( _Editor );
}
One important issue you need to be aware of is that you can only work with this component once its handle is created, which means that the IsHandleCreated property is true. If you try to work with this component in the Load event of a Control or Form, it will fail. The editor is not loaded properly when the Load event is fired; for some reason, it takes a long time to get initialized. So, any task that you need to perform on load needs to be performed in the HandleCreated event of the control. Also remember to always check the IsHandleCreated property before using this control. For example:
if( this._Editor.IsHandleCreated )
{
    this._Editor.NewDocument();
}
You must be wondering how I made the UI. It's developed using the marvelous free controls from Divelements Ltd. I am a great fan of their tools and I have also used them in my other project, Smart UML.
I have tried to make RSS Feeder an ideal Smart Client application by implementing all the best practices that I have been able to collect from the web. This application is a good example of the use of Enterprise Library in a real application. It also uses the Application Updater Block 2.0, which was a real pain to implement. I hope you will find this article a good source of essentials of desktop application development. Although it's been only a month since this app was developed and released, I hope it's stable and feature-rich enough to be a part of your daily life. I look forward to getting feedback from you, which will guide me in improving it and making it the best tool for feed aggregation and blogging.