Most people are familiar with RSS (Really Simple Syndication) as a method of downloading blog posts, but it can also be used as a way to download files. Using the
<enclosure> tag within an item allows a file to be associated with the item, like an attachment on an email.
This mechanism is used predominantly by podcasters to distribute their shows.
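As an illustration, a feed item carrying an enclosure might look like the one below. This is only a minimal sketch written as a VB.NET XML literal; the title, URL, length and type values are made up for the example and are not taken from any real feed.

```vbnet
Imports System
Imports System.Xml.Linq

Module EnclosureExample
    Sub Main()
        ' Hypothetical feed item; the title, URL, length and type are illustrative only.
        Dim item As XElement = <item>
                                   <title>Episode 1</title>
                                   <enclosure url="http://example.com/episode1.mp3"
                                       length="12345678" type="audio/mpeg"/>
                               </item>

        ' The enclosure's url attribute points at the file to download.
        Console.WriteLine(item.<enclosure>.@url)
    End Sub
End Module
```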
There are many applications that allow podcasts to be downloaded, with all kinds of additional bells and whistles, but I wanted a simple application I could run as a scheduled task (even if I wasn't logged in) to download my favourite podcasts.
This project uses some of the new XML features of VB.NET 9.0. XML Literals are used to create and update a download history XML file, and LINQ to XML is used to find any enclosures that have yet to be downloaded.
Using the Code
The project has been implemented as a Console application for simplicity's sake, but could easily be converted into a Windows application, or even a Windows Service. All the work is done by a single sub-routine,
DownloadRSSEnclosures, which takes the URL of the RSS feed and the folder to download the enclosures to as parameters. These parameters are passed to the sub-routine from command line arguments.
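The hand-off from the command line to the sub-routine could look roughly like this. It is a sketch under assumptions: the argument order (URL first, folder second), the usage message and the exit codes are invented here, and the body of DownloadRSSEnclosures is a placeholder so that the example compiles on its own.

```vbnet
Imports System

Module Program
    ' Hypothetical entry point: argument order and the usage text are assumptions.
    Function Main(ByVal args As String()) As Integer
        If args.Length <> 2 Then
            Console.Error.WriteLine("Usage: <program> <rss-url> <download-folder>")
            Return 1
        End If

        DownloadRSSEnclosures(args(0), args(1))
        Return 0
    End Function

    ' Placeholder standing in for the article's routine.
    Sub DownloadRSSEnclosures(ByVal rssUrl As String, ByVal downloadFolder As String)
        Console.WriteLine("Would download enclosures from {0} to {1}", rssUrl, downloadFolder)
    End Sub
End Module
```

Returning a non-zero exit code on bad arguments is a design choice that lets a scheduled task report the failure.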
The first thing
DownloadRSSEnclosures does is check for the existence of a download history file in the download folder for the feed. If it exists, it loads it into
_DownloadHistoryXml; if not, it uses XML Literals to create a new XML document, ready to hold the download history.
_DownloadHistoryXml = <?xml version="1.0" encoding="UTF-8"?><history/>
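The load-or-create step described above might be sketched as follows. The file name history.xml and the helper name LoadHistory are assumptions made for this example; the article does not state the names actually used.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.Linq

Module HistoryExample
    ' "history.xml" is an assumed file name, not necessarily the one the project uses.
    Function LoadHistory(ByVal downloadFolder As String) As XDocument
        Dim historyPath As String = Path.Combine(downloadFolder, "history.xml")

        If File.Exists(historyPath) Then
            Return XDocument.Load(historyPath)
        End If

        ' No history yet: start from an empty <history/> document.
        Return <?xml version="1.0" encoding="UTF-8"?>
               <history/>
    End Function

    Sub Main()
        ' Uses the temp folder purely for demonstration.
        Dim history As XDocument = LoadHistory(Path.GetTempPath())
        Console.WriteLine(history.Root.Name)
    End Sub
End Module
```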
Using XML axis properties, another new feature in VB.NET, the history node is located in order to set its last-download-date attribute. Because an axis property can match any number of elements,
_DownloadHistoryXml.<history> actually returns an
IEnumerable(Of XElement) rather than a single XElement. We know there will only be a single node, so we can use the LINQ
Single() extension method to select that single node, before going on to set its last-download-date attribute to the current date/time.
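Setting the attribute could be done along these lines. This is a sketch rather than the project's exact code: the variable names are illustrative, and the sortable "s" date format is an assumption, since the article does not say how the date is formatted.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module LastDownloadDateExample
    Sub Main()
        Dim historyXml As XDocument = <?xml version="1.0" encoding="UTF-8"?>
                                      <history/>

        ' .<history> yields a sequence; Single() picks its one element
        ' and throws if there are none or more than one.
        Dim historyNode As XElement = historyXml.<history>.Single()

        ' Attribute names containing hyphens need the @<...> form.
        ' The "s" format string is an assumption for this example.
        historyNode.@<last-download-date> = DateTime.Now.ToString("s")

        Console.WriteLine(historyXml.ToString())
    End Sub
End Module
```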
Downloading the RSS XML is trivial; simply calling the
XDocument.Load() method with the feed URL will return a new
XDocument containing the XML. Slightly less trivial is the LINQ to XML that is used to select all the URLs to enclosures in the RSS feed that are not already in the download history.
Dim _NewEnclosures = _
    From enclosure In _RssXml.<rss>.<channel>.<item>.<enclosure> _
    Where Not (From download In _DownloadHistoryXml.<history>.<download> _
               Select download.@url).Contains(enclosure.@url) _
    Select enclosure.@url
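To see the effect of this kind of query in isolation, the following self-contained sketch runs it against two small in-memory documents. The sample URLs and variable names are invented, and the history lookup is pulled into its own query for readability, which is a stylistic choice rather than the article's form.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module NewEnclosuresExample
    Sub Main()
        ' Sample data standing in for a real feed and history file.
        Dim rssXml As XDocument = <?xml version="1.0" encoding="UTF-8"?>
                                  <rss version="2.0">
                                      <channel>
                                          <item><enclosure url="http://example.com/a.mp3"/></item>
                                          <item><enclosure url="http://example.com/b.mp3"/></item>
                                      </channel>
                                  </rss>
        Dim historyXml As XDocument = <?xml version="1.0" encoding="UTF-8"?>
                                      <history>
                                          <download url="http://example.com/a.mp3"/>
                                      </history>

        ' URLs already recorded in the history.
        Dim downloaded As IEnumerable(Of String) = _
            From download In historyXml.<history>.<download> _
            Select download.@url

        ' Enclosure URLs in the feed that the history does not contain.
        Dim newEnclosures As IEnumerable(Of String) = _
            From enclosure In rssXml.<rss>.<channel>.<item>.<enclosure> _
            Where Not downloaded.Contains(enclosure.@url) _
            Select enclosure.@url

        For Each url As String In newEnclosures
            Console.WriteLine(url)   ' prints only http://example.com/b.mp3
        Next
    End Sub
End Module
```

Note that the comparison is an exact, case-sensitive string match, so a feed that changes the casing or query string of a URL would cause the file to be treated as new.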
With the LINQ returning a list of the URLs that have not been downloaded, a simple
For Each loop can be used to download each file, save it to the download folder, and update the history (saving it after each file is downloaded in case the application is terminated before it completes). The only point of interest in the download loop is the way in which the download is added to the history XML file. Notice how XML Literals are used again to create the XML node (with an embedded expression to insert the URL), which is added to the history node.
_DownloadHistoryXml.<history>.Single().Add(<download url=<%= _EnclosureUrl %>/>)
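Put together, the download loop might look something like the sketch below. Everything beyond the history update is an assumption: the article does not show how files are fetched or named, so WebClient.DownloadFile, the file name taken from the last URL segment, and the DownloadAll helper with its parameters are illustrative choices only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Net
Imports System.Xml.Linq

Module DownloadLoopExample
    ' Sketch only: the names, file naming and use of WebClient are assumptions.
    Sub DownloadAll(ByVal urls As IEnumerable(Of String), _
                    ByVal historyXml As XDocument, _
                    ByVal downloadFolder As String, _
                    ByVal historyPath As String)
        Using client As New WebClient()
            ' ToList() takes a snapshot, so adding to the history inside the loop
            ' cannot affect a lazily evaluated query that reads that history.
            For Each enclosureUrl As String In urls.ToList()
                ' Name the local file after the last segment of the URL.
                Dim fileName As String = Path.GetFileName(New Uri(enclosureUrl).LocalPath)
                client.DownloadFile(enclosureUrl, Path.Combine(downloadFolder, fileName))

                ' Record the download, then save at once so progress survives an interruption.
                historyXml.<history>.Single().Add(<download url=<%= enclosureUrl %>/>)
                historyXml.Save(historyPath)
            Next
        End Using
    End Sub
End Module
```

Saving after every file costs a little extra I/O, but it means a terminated run only ever repeats the one download that was in progress.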
- 3 June 2008 - Original article published.