
Really The Most Simple Syndication

26 Apr 2006
This article presents an XSLT/JScript framework that handles all the major RSS feed formats, and an HTML-based newsreader application built on top of it.

What it is about

In recent years, RSS has proved to be an extremely useful data-distribution technology. This article addresses the problem of handling the different RSS standards in a single application. It may be useful to anyone building a desktop aggregator or a corporate intranet environment. The article is accompanied by a skeleton newsreader application.

This article assumes you're using MSXML 3.0+ as the XML/XSLT processor.

Standardization

The one unfortunate thing about RSS is the number of standards in use. You can never be sure what you'll get while surfing the net, so you must be ready for anything. "Anything" means:

  • RSS 0.90 - the initial release of RSS, created by Netscape; it is almost extinct now. The specification is still available in the PurplePages archive.
  • RSS 2.0/0.91-0.94 - the most popular branch of RSS: a revised and simplified version of the original by UserLand's Dave Winer. In RSS 2.0, RSS stands for Really Simple Syndication. The format became even more popular with the introduction of podcasting. By the way, don't be fooled by the version numbers: the version prior to 2.0 was 0.94, not 1.0 (which is an entirely different format from 0.9x)! The specification is available on the UserLand site.
  • RSS 1.0 - not really a standard in its own right, but an application of RDF (Resource Description Framework), the W3C's standard for Web metadata. Verbose and extensible (through modules), it is much more flexible than 2.0. In versions 0.90 and 1.0, RSS stands for RDF Site Summary. The specification is available on the RSS-DEV Working Group site.
  • Atom - the most recent and therefore the least widespread syndication format. Atom is the first attempt, undertaken by an Internet Engineering Task Force (IETF) working group, to develop a standardized, industry-wide syndication format. The complete specification (RFC 4287) can be found on the IETF site.

Transform...

Let's begin with the stylesheets. Three things are worth noting:

  • the local-name() XPath function: it returns a node's name without its namespace prefix, so you can match elements such as rdf:RDF or Atom's entry without declaring their namespaces in the stylesheet.
  • the disable-output-escaping attribute of the xsl:value-of instruction: a must-know. Webmasters tend to embed HTML markup (stored as escaped text) in the "description" and "summary" fields. Setting this attribute to "yes" writes that markup out as-is, so the result is a nice-looking page rather than a mess of literal tags. The downside is security: with disable-output-escaping, any script embedded in the feed ends up in your page and can expose your local computer to attack. Normally you should have some kind of filter that strips <SCRIPT> and <OBJECT> tags; unless you have one, read RSS feeds only from trusted sites.
  • the empty <xsl:text/> element: use it to strip unnecessary whitespace "by hand". It splits a run of literal text, so the line break and indentation between it and the adjacent element become a whitespace-only text node, which the XSLT processor drops from the stylesheet. This keeps leading and trailing whitespace in the generated HTML (for example, in the style attributes built below) under control.
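As a starting point for the <SCRIPT>/<OBJECT> stripper mentioned above, here is a deliberately naive JavaScript sketch. It assumes the filter runs on the HTML string produced by the transformation, before that string is assigned to innerHTML. The name stripDangerousTags is hypothetical. Because it is regex-based, it does not touch event-handler attributes such as onerror or javascript: URLs. It is therefore no substitute for a proper HTML sanitizer, and reading only trusted feeds is still the safer policy.

```javascript
// Deliberately naive illustration, NOT a complete sanitizer.
// Removes <script>, <object>, <embed> and <iframe> elements from an HTML
// string; event-handler attributes and javascript: URLs are left untouched.
// Uses only var and String.replace, so it should also run under older JScript.
function stripDangerousTags(html) {
  // Paired elements with their content, e.g. <script ...>...</script>.
  var paired = /<(script|object|embed|iframe)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;
  // Leftover opening or closing tags of the same kinds (unclosed elements).
  var lone = /<\/?(script|object|embed|iframe)\b[^>]*>/gi;
  return String(html).replace(paired, "").replace(lone, "");
}
```

In Listing 4.3 this would mean assigning stripDangerousTags(item_i) to content.innerHTML instead of item_i itself.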

The first three stylesheets render the whole feed as a single newspaper-style page:

Listing 1.1: XSLT stylesheet for building a newspaper-style HTML page from an RSS 2.0/0.91 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" version="1.0" 
      indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body>
      <div style=
         "padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
        <xsl:for-each select="rss/channel/item">
          <xsl:variable name="stl">
            <xsl:text/>background-color: #efeff5; 
                border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
            <xsl:text/>
            <xsl:choose>
              <xsl:when test="position()=last()"> 0em</xsl:when>
              <xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          <div>
            <xsl:attribute name="style"><xsl:value-of select="$stl"/>
            </xsl:attribute>
            <h3 style="color:#800000"><xsl:value-of select="title"/></h3>
            <p><xsl:value-of disable-output-escaping="yes" 
                                               select="description"/>
            </p>
            <xsl:variable name="pub" select="pubDate"/>
            <xsl:if test="count($pub) > 0">
              <p align="right" 
                  style="margin:0; padding:0"><xsl:value-of select="pubDate"/>
              </p>
            </xsl:if>
            <p style="margin:0; padding:0em 0em 1em 0em"><a target="_blank">
              <xsl:attribute name="href">
                <xsl:value-of select="link"/>
              </xsl:attribute>
              <xsl:value-of select="link"/>
            </a></p>
          </div>
        </xsl:for-each>
      </div>
    </body></html>
  </xsl:template>
</xsl:stylesheet>

Listing 1.2: XSLT stylesheet for building a newspaper-style HTML page from an RSS 1.0 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
      version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body>
      <div style=
         "padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
        <xsl:for-each select="*/*[local-name()='item']">
          <xsl:variable name="stl">
            <xsl:text/>background-color: #efeff5; 
               border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
            <xsl:text/>
            <xsl:choose>
              <xsl:when test="position()=last()"> 0em</xsl:when>
              <xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          <div>
            <xsl:attribute name="style"><xsl:value-of select="$stl"/>
            </xsl:attribute>
            <h3 style="color:#800000">
                <xsl:value-of select="./*[local-name()='title']"/>
            </h3>
            <p><xsl:value-of disable-output-escaping="yes" 
                select="./*[local-name()='description']"/>
            </p>
            <xsl:variable name="pub" select="*[local-name()='date']"/>
            <!-- XPath positions are 1-based; position 20 holds the offset sign. -->
            <xsl:variable name="pub_date" 
                select="concat(substring($pub, 1, 10), ', ', 
                               substring($pub, 12, 8), ' (GMT',  
                               substring($pub, 20, 6), ')')"/>
            <xsl:if test="count($pub) > 0">
              <p align="right" style="margin:0; padding:0">
              <xsl:value-of select="$pub_date"/></p>
            </xsl:if>
            <p style="margin:0; padding:0em 0em 1em 0em"><a target="_blank">
              <xsl:attribute name="href">
                <xsl:value-of select="./*[local-name()='link']"/>
              </xsl:attribute><xsl:value-of select="./*[local-name()='link']"/>
            </a></p>
          </div>
        </xsl:for-each>
      </div>
    </body></html>
  </xsl:template>
</xsl:stylesheet>

Listing 1.3: XSLT stylesheet for building a newspaper-style HTML page from an Atom feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
      version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body>
      <div style=
         "padding: 1em;background-color: #fafafa; border: 1px solid #cfcfcf;">
        <xsl:for-each select="*/*[local-name()='entry']">
          <xsl:variable name="stl">
            <xsl:text/>
              background-color: #efeff5; 
              border: 1px solid #cfcfcf;padding: 0em 1em 0em; margin:
            <xsl:text/>
            <xsl:choose>
              <xsl:when test="position()=last()"> 0em</xsl:when>
              <xsl:otherwise> 0em 0em 1em 0em</xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          <div>
            <xsl:attribute name="style"><xsl:value-of select="$stl"/>
            </xsl:attribute>
            <h3 style="color:maroon">
              <xsl:value-of select="*[local-name()='title']"/></h3>
            <p><xsl:value-of disable-output-escaping="yes" 
              select="*[local-name()='summary']"/></p>
            <xsl:variable name="pub" select="*[local-name()='updated']"/>
            <xsl:variable name="pub_date" select=
              "concat(substring($pub, 1, 10), ', ', substring($pub, 12, 8))"/>
            <xsl:if test="count($pub)>0">
              <p align="right" style="margin:0; padding:0;">
                <xsl:value-of select="$pub_date"/>
              </p>
            </xsl:if>
            <!-- Use the entry's alternate link; an Atom link without rel is alternate. -->
            <p style="margin:0; padding:0em 0em 1em 0em;"><a target="_blank">
              <xsl:attribute name="href">
                <xsl:value-of select=
                  "*[local-name()='link'][not(@rel) or @rel='alternate']/@href"/>
              </xsl:attribute>
              <xsl:value-of select=
                "*[local-name()='link'][not(@rel) or @rel='alternate']/@href"/>
            </a></p>
          </div>
        </xsl:for-each>
      </div>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
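The RSS 1.0 and Atom stylesheets format dates by slicing a W3C date-time string such as 2006-04-26T10:30:00+03:00 with substring(). XPath's substring() is 1-based and rounds its arguments. It keeps the characters at positions p with round(start) <= p < round(start) + round(length). That is why substring($pub, 0, 11) returns the same first ten characters as substring($pub, 1, 10).

The JavaScript sketch below emulates those rules to show which slice each call takes. It assumes a timestamp with whole seconds and an explicit numeric offset; a "Z" suffix or fractional seconds would shift the positions. Printing a literal "GMT+" followed by substring($pub, 21, 5) drops the sign of a negative offset such as -05:00, whereas substring($pub, 20, 6) keeps it. The names xpathSubstring and formatW3cdtf are hypothetical.

```javascript
// Emulates XPath 1.0 substring(str, start, length): positions are 1-based,
// start and length are rounded, and the result keeps every character at a
// position p with round(start) <= p < round(start) + round(length).
function xpathSubstring(str, start, length) {
  var first = Math.round(start);
  var end = (typeof length === "undefined") ? Infinity
                                            : first + Math.round(length);
  var out = "";
  for (var p = 1; p <= str.length; p++) {
    if (p >= first && p < end) {
      out += str.charAt(p - 1);
    }
  }
  return out;
}

// Hypothetical helper: the "date, time (GMT offset)" layout built by the
// stylesheets, taking the offset from position 20 so that its sign survives.
// Assumes whole seconds and a numeric offset, e.g. 2006-04-26T10:30:00+03:00.
function formatW3cdtf(pub) {
  return xpathSubstring(pub, 1, 10) + ", " +         // date, positions 1-10
         xpathSubstring(pub, 12, 8) +                // time, positions 12-19
         " (GMT" + xpathSubstring(pub, 20, 6) + ")"; // offset, positions 20-25
}
```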

The next piece is the outline: a list of all item titles found in the feed. Each entry in this list is a link to a JavaScript navTo function, whose argument is the item's position in the feed, counted from 1.

Listing 2.1: XSLT stylesheet for retrieving a list of items from an RSS 2.0/0.91 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
    version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body><ul style="margin-left:25px">
      <xsl:for-each select="rss/channel/item">
        <li><a href="javascript:navTo('{position()}')">
          <font size="-1" style="color:#800000">
            <xsl:value-of select="title"/>
          </font>
        </a><br/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>

Listing 2.2: XSLT stylesheet for retrieving a list of items from an RSS 1.0 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" version="1.0" 
              indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body><ul style="margin-left:25px">
      <xsl:for-each select="*/*[local-name()='item']">
        <li><a href="javascript:navTo('{position()}')">
          <font size="-1" style="color:#800000">
            <xsl:value-of select="./*[local-name()='title']/text()"/>
          </font>
        </a><br/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>

Listing 2.3: XSLT stylesheet for retrieving a list of items from an Atom feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
    version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <html><body><ul style="margin-left:25px">
      <xsl:for-each select="*/*[local-name()='entry']">
        <li><a href="javascript:navTo('{position()}')">
          <font size="-1" style="color:#800000">
            <xsl:value-of select="./*[local-name()='title']/text()"/>
          </font>
        </a><br/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>

The last set of stylesheets renders a single news item. Note that these transformations cannot be applied to the original feed document. Their template matches any element and reads title, link and so on as that element's children, so you must first extract the required item programmatically and then apply the matching stylesheet to that node.

Listing 3.1: XSLT stylesheet for representing a distinct news item from an RSS 2.0/0.91 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
    version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="*">
    <div style="padding: 0em 1em 0em;
      background-color: #fafafa; border: 1px solid #cfcfcf;">
      <h3 style="color:#800000"><xsl:value-of select="title"/></h3>
      <div style="padding: 0em 1em 0em; margin: 0em; 
        background-color: #efeff5; border: 1px solid #cfcfcf;">
        <p><xsl:value-of disable-output-escaping="yes" select="description"/>
        </p>
        <xsl:variable name="pub" select="pubDate"/>
        <xsl:if test="count($pub)>0">
          <p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
          <xsl:value-of select="pubDate"/></p>
        </xsl:if>
      </div>
      <p><a target="_blank">
        <xsl:attribute name="href">
          <xsl:value-of select="link"/>
        </xsl:attribute>
        <xsl:value-of select="link"/>
      </a></p>
    </div>
  </xsl:template>
</xsl:stylesheet>

Listing 3.2: XSLT stylesheet for representing a distinct news item from an RSS 1.0 feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
    version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="*">
    <div style="padding: 0em 1em 0em;
      background-color: #fafafa; border: 1px solid #cfcfcf;">
      <h3 style="color:#800000">
        <xsl:value-of select="./*[local-name()='title']"/>
      </h3>
      <div style="padding: 0em 1em 0em; margin: 0em; 
        background-color: #efeff5; border: 1px solid #cfcfcf;">
        <p><xsl:value-of disable-output-escaping="yes" 
            select="./*[local-name()='description']"/></p>
        <xsl:variable name="pub" select="*[local-name()='date']"/>
        <xsl:variable name="pub_date" 
            select="concat(substring($pub, 1, 10), ', ', 
                        substring($pub, 12, 8), 
                        ' (GMT',  substring($pub, 20, 6), ')')"/>
        <xsl:if test="count($pub) > 0">
          <p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
              <xsl:value-of select="$pub_date"/>
          </p>
        </xsl:if>
      </div>
      <p><a target="_blank">
        <xsl:attribute name="href">
          <xsl:value-of select="./*[local-name()='link']"/>
        </xsl:attribute><xsl:value-of select="./*[local-name()='link']"/>
      </a></p>
    </div>
  </xsl:template>
</xsl:stylesheet>

Listing 3.3: XSLT stylesheet for representing a distinct news item from an Atom feed

XML
<?xml version="1.0"?>
  <xsl:stylesheet 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" 
    version="1.0" indent="yes" encoding="iso-8859-1"/>
  <xsl:template match="*">
    <div style="padding: 0em 1em 0em;
       background-color: #fafafa; border: 1px solid #cfcfcf;">
      <h3 style="color:#800000">
          <xsl:value-of select="*[local-name()='title']"/>
      </h3>
      <xsl:variable name="cnt" select="*[local-name()='content']"/>
      <xsl:if test="count($cnt)>0">
        <div style="padding: 0em 1em 0em 1em; margin: 0em; 
          background-color: #efeff5; border: 1px solid #cfcfcf;">
          <p><xsl:value-of disable-output-escaping="yes" 
                   select="*[local-name()='content']"/></p>
          <xsl:variable name="pub" select="*[local-name()='updated']"/>
          <xsl:variable name="pub_date" 
             select="concat(substring($pub, 1, 10), ', ', 
                                 substring($pub, 12, 8))"/>
          <xsl:if test="count($pub)>0">
            <p align="right" style="margin:0em; padding:0em 0em 1em 0em;">
                <xsl:value-of select="$pub_date"/>
            </p>
          </xsl:if>
        </div>
      </xsl:if>
      <xsl:if test="count($cnt)=0">
        <p style="padding: 1em; margin: 0em; 
          background-color: #efeff5; border: 1px solid #cfcfcf;">
          <xsl:value-of disable-output-escaping="yes" 
                             select="*[local-name()='summary']"/>
        </p>
      </xsl:if>
      <p><a target="_blank">
        <xsl:attribute name="href">
          <xsl:value-of select=
            "*[local-name()='link'][not(@rel) or @rel='alternate']/@href"/>
        </xsl:attribute>
        <xsl:value-of select=
          "*[local-name()='link'][not(@rel) or @rel='alternate']/@href"/>
      </a></p>
    </div>
  </xsl:template>
</xsl:stylesheet>

...and read

Before doing anything to an RSS file, you need to know which standard it follows, right? We find out by looking at the name of the document's root element:

Listing 4.1: Extracting the RSS standard

JavaScript
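function whatStd(rssdocument)
{
    // The root element identifies the format: <rss> (RSS 0.91-0.94/2.0),
    // <rdf:RDF> (RSS 0.90/1.0) or <feed> (Atom).
    var rssroot = rssdocument.documentElement;

    // baseName is MSXML's name for the local name (without any prefix).
    var rsssdtd = rssroot.baseName;

    switch(rsssdtd)
    {
      case "rss":
        return "rss2";

      case "RDF":
        return "rss1";

      case "feed":
        return "atom";

      default:
        return "";
    }
}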

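The drawback of this (and all the following) code is that it relies heavily on Microsoft extensions to the W3C XML DOM, such as baseName, selectSingleNode, the text property and transformNode. For the standard check, a portable alternative is to take documentElement itself and read its DOM Level 2 localName property, or strip the prefix from nodeName.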
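For environments without MSXML, the root-name check from Listing 4.1 can be written with standard DOM properties. A possible source for the root is the documentElement of new DOMParser().parseFromString(text, "application/xml") in a browser. The sketch below uses the hypothetical name whatStdPortable. As an extra check not present in Listing 4.1, it accepts RDF only in the RDF namespace.

```javascript
// RDF namespace used by the rdf:RDF root of RSS 0.90 and RSS 1.0 feeds.
var RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Portable variant of whatStd(): takes the document's root element and uses
// standard DOM properties instead of MSXML's baseName. localName is read when
// the DOM provides it; otherwise any "prefix:" is stripped from nodeName.
function whatStdPortable(root) {
  if (!root) {
    return "";
  }
  var name = root.localName || String(root.nodeName).replace(/^[^:]*:/, "");

  switch (name) {
    case "rss":   // RSS 0.91-0.94 and 2.0
      return "rss2";
    case "RDF":   // RSS 0.90 and 1.0; must be in the RDF namespace
      return root.namespaceURI === RDF_NS ? "rss1" : "";
    case "feed":  // Atom
      return "atom";
    default:
      return "";
  }
}
```

The return values match those of whatStd(), so the rest of the code can stay unchanged.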

Listing 4.2: Extracting RSS channel info

JavaScript
// standard holds the result of whatStd(); xml is the loaded feed document.
// Note: MSXML 3.0 defaults to XSLPattern queries, so XPath functions such as
// local-name() need xml.setProperty("SelectionLanguage", "XPath") first.
var rss_title = null;
var rss_link = null;

switch(standard)
{
case "atom":
  rss_title = xml.documentElement.selectSingleNode(
                         "/*/*[local-name()='title']");
  // Prefer the "alternate" link; an Atom link without rel is an alternate.
  rss_link = xml.documentElement.selectSingleNode(
      "/*/*[local-name()='link'][not(@rel) or @rel='alternate']/@href");
  break;

case "rss1":
  rss_title = xml.documentElement.selectSingleNode(
      "/*/*[local-name()='channel']/*[local-name()='title']");
  rss_link = xml.documentElement.selectSingleNode(
      "/*/*[local-name()='channel']/*[local-name()='link']");
  break;

case "rss2":
  rss_title = xml.documentElement.selectSingleNode("/*/channel/title");
  rss_link = xml.documentElement.selectSingleNode("/*/channel/link");
  break;
}

// A malformed feed may lack either node; reading .text would then fail.
if(rss_title && rss_link)
{
  rsstitle.innerHTML = 
      "<a target=\"_blank\" title=\"Opens in new window\" href=\"" + 
      rss_link.text + 
      "\"><font color=\"maroon\" size=\"4\"><b>" + 
      rss_title.text + "</b></font></a>";
}
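Listing 4.2 pastes the feed's title and link into innerHTML verbatim. A title containing < or & therefore breaks the header markup, and a hostile feed could inject its own HTML or a javascript: link. Below is a minimal sketch of escaping them first. escapeHtml and buildChannelHeader are hypothetical helper names, and accepting only http(s) links is an assumption about what a channel link should look like.

```javascript
// Escapes the characters that are significant in HTML text and in
// double-quoted attribute values; & must be replaced first.
function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;")
                  .replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;")
                  .replace(/"/g, "&quot;")
                  .replace(/'/g, "&#39;");
}

// Hypothetical helper producing the same header markup as Listing 4.2, but
// with the feed-supplied text escaped. Assumption: only http(s) links are
// trusted; anything else (e.g. javascript:) is replaced by "#".
function buildChannelHeader(title, link) {
  var url = String(link).replace(/^\s+|\s+$/g, "");  // trim surrounding whitespace
  if (!/^https?:\/\//i.test(url)) {
    url = "#";
  }
  return "<a target=\"_blank\" title=\"Opens in new window\" href=\"" +
         escapeHtml(url) +
         "\"><font color=\"maroon\" size=\"4\"><b>" +
         escapeHtml(title) + "</b></font></a>";
}
```

With these helpers the assignment becomes rsstitle.innerHTML = buildChannelHeader(rss_title.text, rss_link.text);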

With the channel info in place, extracting a single item from the feed is easy. Here we go:

Listing 4.3: Extracting the news item

JavaScript
function navTo(where)
{
    // where: 1-based item position passed by the outline links (as a string).
    // rssFile, standard, xml and xsl_i are defined elsewhere in the application.
    if(rssFile != "")
    {
      var rss_item;
    
      // Select the where-th item with the query that matches the feed standard.
      switch(standard)
      {
        case "atom":
          rss_item = xml.documentElement.selectSingleNode(
                 "/*/*[local-name()='entry'][" + where + "]");
          break;
    
        case "rss1":
          rss_item = xml.documentElement.selectSingleNode(
                  "/*/*[local-name()='item'][" + where + "]");
          break;
    
        case "rss2":
          rss_item = xml.documentElement.selectSingleNode(
                            "/*/channel/item[" + where + "]");
          break;
    
        default:
          rss_item = null;
      }
    
      if(rss_item)
      {
        // Render only this item with the single-item stylesheet.
        var item_i = 
            rss_item.transformNode(xsl_i.documentElement);
    
        contentcell.vAlign = "Top";
        content.innerHTML = item_i;
    
        ...
    
      }
    }
}

Note xsl_i, used in the transformation: it is the single-item stylesheet described earlier (Listing 3.1, 3.2 or 3.3, matching the feed's standard). where is a string holding the item's position within the feed, counted from 1 as in XPath.
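navTo() concatenates where straight into the XPath query. The values normally come from the outline links, which pass position(). Both position() and XPath positional predicates such as [3] are 1-based, so the numbers line up. Still, validating the argument is cheap. The following sketch uses the hypothetical name itemXPath. It builds the same three queries as Listing 4.3 and returns null for an unknown standard or for anything that is not a positive integer.

```javascript
// Hypothetical helper mirroring the switch in navTo(): returns the XPath
// query for the where-th item (1-based) of a feed, or null if the standard
// is unknown or where is not a positive integer.
function itemXPath(standard, where) {
  var n = String(where);
  // Digits only, no leading zero: keeps anything else out of the query.
  if (!/^[1-9][0-9]*$/.test(n)) {
    return null;
  }
  switch (standard) {
    case "atom":
      return "/*/*[local-name()='entry'][" + n + "]";
    case "rss1":
      return "/*/*[local-name()='item'][" + n + "]";
    case "rss2":
      return "/*/channel/item[" + n + "]";
    default:
      return null;
  }
}
```

Inside navTo() the switch could then shrink to var q = itemXPath(standard, where); rss_item = q ? xml.documentElement.selectSingleNode(q) : null;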

That's all. Feel free to e-mail me all your suggestions/opinions/bug reports.


History

  • 23rd November, 2005
    • Article posted, first version of stylesheets and newsreader.
  • 13th February, 2006
    • Code cleanup, some functions completely rewritten;
    • XPath queries cleaned up;
    • 'save HTML' capability added;
    • XSLT stylesheets optimized/cleaned up.
  • 25th April, 2006
    • Automated/manual feed update capability added;
    • minor improvements and bugfixes.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Software Developer Freelance software engineer
Russian Federation
Dmitry Khudorozhkov began programming (and gaming) with his ZX Spectrum in 1989. Having seen and used all IBM PCs from early XT to the latest x64 machines, now Dmitry is a freelance programmer, living in Moscow, Russia. He is a graduate of the Moscow State Institute of Electronics and Mathematics (Applied Mathematics).

He is proficient in:

- C/C++ - more than 9 years of experience. Pure Win32 API/MFC desktop programming, networking (BSD/Win sockets), databases (primarily SQLite), OpenGL;

- JavaScript - more than 6 years of experience. Client-side components, AJAX, jQuery installation and customization;

- Firefox extensions (immediately ready for addons.mozilla.org review) and Greasemonkey scripts. As examples of Dmitry's extensions, you can search for FoxyPrices or WhatBird Winged Toolbar;

- XML and its applications (last 2 years): XSLT (+ XPath), XSD, SVG, VML;

- ASP.NET/C# (webservices mostly);

Also familiar with (entry level):

- PHP;

- HTML/CSS slicing.

Trying to learn:

- Ruby/Ruby-on-Rails;

- Czech language.

If you wish to express your opinion, ask a question or report a bug, feel free to e-mail: dmitrykhudorozhkov@yahoo.com. Job offers are warmly welcome.

If you wish to donate - and, by doing so, support further development - you can send Dmitry a bonus through the Rentacoder.com service (registration is free, Paypal is supported). Russian users can donate to the Yandex.Money account 41001132298694.
