I need to retrieve the RSS feed from Google News and display the title and description of each item in a WPF app. This was easy to do for BBC News, because the description contained only the news text and I could just read the child node's text. However, Google News uses a different format for its description, so it contains a lot of information I don't need. This is the RSS link: http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss

The description and the image link are both in the same block of text. Can anyone tell me the best way to extract them from the rest of it? I suspect regular expressions are involved, but I have never used them before. Below is the code I use to read the feed, in case it helps.

// Requires: using System; using System.Xml;
// GoogleItems, GoogleFeed and MainNews are defined elsewhere in the app.
try
{
    // Load the feed into an XmlDocument.
    XmlDocument xmlDoc = new XmlDocument();
    XmlNode nodeRss = null;
    XmlNode nodeChannel = null;
    XmlNode nodeItem = null;
    XmlTextReader xmlReader = new XmlTextReader("http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss");
    xmlDoc.Load(xmlReader);

    // Find the <rss> root element.
    for (int i = 0; i < xmlDoc.ChildNodes.Count; i++)
    {
        if (xmlDoc.ChildNodes[i].Name == "rss")
        {
            nodeRss = xmlDoc.ChildNodes[i];
        }
    }

    // Find the <channel> element inside <rss>.
    for (int i = 0; i < nodeRss.ChildNodes.Count; i++)
    {
        if (nodeRss.ChildNodes[i].Name == "channel")
        {
            nodeChannel = nodeRss.ChildNodes[i];
        }
    }

    // Read the <title> and <description> of every <item>.
    for (int j = 0; j < nodeChannel.ChildNodes.Count; j++)
    {
        if (nodeChannel.ChildNodes[j].Name == "item")
        {
            nodeItem = nodeChannel.ChildNodes[j];
            string title = null;        // reset for each item
            string description = null;
            for (int i = 0; i < nodeItem.ChildNodes.Count; i++)
            {
                if (nodeItem.ChildNodes[i].Name == "title")
                {
                    title = nodeItem.ChildNodes[i].InnerText;
                }
                if (nodeItem.ChildNodes[i].Name == "description")
                {
                    description = nodeItem.ChildNodes[i].InnerText;
                }
            }
            GoogleItems.Add(new GoogleFeed(title, description));
        }
    }

    // Show the first item's description in the window.
    MainNews.Text = GoogleItems[0].Description;
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}


Another news feed I found is http://rss.msnbc.msn.com/id/3032506/device/rss/rss.xml

It looks simpler than the previous one, so any advice on getting the description and image URL out of that feed would be welcome.
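For a plain RSS 2.0 feed like the MSNBC one, LINQ to XML (`System.Xml.Linq`, .NET 3.5 and later) is shorter than walking `ChildNodes` by hand. The sketch below reads each `<item>`'s title and description and then looks for an image. I have not checked which image element the MSNBC feed actually uses, so trying the standard RSS `<enclosure>` element and then the Media RSS `<media:thumbnail>`/`<media:content>` elements is an assumption; `FeedItem` and `RssReader` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// One parsed <item>; ImageUrl stays null when no image element is found.
public sealed class FeedItem
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string ImageUrl { get; set; }
}

public static class RssReader
{
    // Media RSS namespace (ASSUMPTION: the feed may use it for pictures).
    static readonly XNamespace Media = "http://search.yahoo.com/mrss/";

    // Reads title, description and image URL from every <item> in an RSS 2.0 document.
    public static List<FeedItem> ReadItems(XDocument doc)
    {
        return doc.Descendants("item")
            .Select(item => new FeedItem
            {
                Title = (string)item.Element("title"),             // null if missing
                Description = (string)item.Element("description"), // null if missing
                ImageUrl = FindImageUrl(item)
            })
            .ToList();
    }

    static string FindImageUrl(XElement item)
    {
        // Standard RSS 2.0: <enclosure url="..." type="image/jpeg" length="..."/>.
        XElement enclosure = item.Elements("enclosure").FirstOrDefault(e =>
            ((string)e.Attribute("type") ?? "").StartsWith("image/", StringComparison.OrdinalIgnoreCase));
        if (enclosure != null)
            return (string)enclosure.Attribute("url");

        // Media RSS fallback: <media:thumbnail url="..."/> or <media:content url="..."/>.
        XElement media = item.Element(Media + "thumbnail") ?? item.Element(Media + "content");
        return media == null ? null : (string)media.Attribute("url");
    }
}
```

Something like `RssReader.ReadItems(XDocument.Load("http://rss.msnbc.msn.com/id/3032506/device/rss/rss.xml"))` should load the live feed, since `XDocument.Load` accepts a URI string. The explicit `(string)` casts on `XElement`/`XAttribute` return null for missing nodes instead of throwing.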

Thanks for your help in advance.
Posted
Updated 15-Feb-11 3:44am

1 solution


Solution 1

This seems to work for me:

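using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.XPath;

class Program
{
    // Removes anything that looks like an HTML tag (<...>), including tags that span lines.
    static string Strip(string text)
    {
        return Regex.Replace(text, @"<(.|\n)*?>", String.Empty);
    }

    // Stripped value of the first node matching xpath, or "" if the node is missing
    // (SelectSingleNode returns null, e.g. for an item without <category>).
    static string ValueOf(XPathNavigator navigator, string xpath)
    {
        XPathNavigator node = navigator.SelectSingleNode(xpath);
        return node == null ? String.Empty : Strip(node.Value);
    }

    static void Main()
    {
        XmlTextReader xmlReader = new XmlTextReader("http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss");
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(xmlReader);

        XPathNavigator navigator = xmlDoc.CreateNavigator();

        // Channel-level image information.
        string mainTitle = ValueOf(navigator, "rss/channel/image/title");
        string mainUrl = ValueOf(navigator, "rss/channel/image/url");
        string mainLink = ValueOf(navigator, "rss/channel/image/link");

        // One iteration per news item.
        XPathNodeIterator items = navigator.Select("rss/channel/item");
        while (items.MoveNext())
        {
            XPathNavigator item = items.Current;
            string title = ValueOf(item, "title");
            string category = ValueOf(item, "category");
            string description = ValueOf(item, "description"); // HTML tags removed
            Console.WriteLine(title + ": " + description);
        }
    }
}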




Hope this helps,
Fredrik Bornander
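The question also asks for the image link, which the answer above does not extract. A minimal sketch, assuming each description contains an `<img ... src="...">` tag (not verified against the current feed): one regex captures the first `src` attribute, and `WebUtility.HtmlDecode` (System.Net, .NET 4 and later) turns entities such as `&amp;`, `&nbsp;` and `&#39;` back into characters, which tag stripping alone leaves behind. `DescriptionParser`, `FirstImageUrl` and `PlainText` are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class DescriptionParser
{
    // Captures the src attribute of the first <img> tag (double or single quotes).
    static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Matches any tag; same effect as the regex in Strip() above.
    static readonly Regex Tag = new Regex(@"<[^>]*>");

    // Returns the first image URL in the description HTML, or null if there is none.
    public static string FirstImageUrl(string html)
    {
        Match m = ImgSrc.Match(html ?? "");
        if (!m.Success)
            return null;
        string url = WebUtility.HtmlDecode(m.Groups[1].Value); // "&amp;" -> "&"
        // A protocol-relative URL ("//host/path") needs a scheme before it can be downloaded.
        return url.StartsWith("//", StringComparison.Ordinal) ? "http:" + url : url;
    }

    // Replaces tags with spaces, decodes entities and collapses runs of whitespace.
    public static string PlainText(string html)
    {
        string text = WebUtility.HtmlDecode(Tag.Replace(html ?? "", " "));
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Unlike `Strip`, `PlainText` replaces each tag with a space rather than an empty string, so text from adjacent table cells does not run together; the final `\s+` pass also folds the non-breaking spaces that `&nbsp;` decodes to.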
   
v3
Comments
Neil Cross 15-Feb-11 9:59am
   
The description still returns far too much information. I need to extract the single coherent piece of news from it and ignore everything else. I appreciate your help though.
Fredrik Bornander 15-Feb-11 10:51am
   
I assume you mean all the HTML markup?
I've updated my answer to strip that out as well.
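Even with the tags stripped, the description still mixes the news summary with source names and links to related coverage. One possible heuristic, not something confirmed by any documentation for this feed: split the description HTML at every tag and keep the longest run of text, on the assumption that the summary is longer than the surrounding labels and links. If the summary itself contains inline tags such as `<b>`, the split will cut it into pieces, so treat this as a starting point rather than a fix. `SummaryPicker` and `LongestTextRun` are hypothetical names.

```csharp
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class SummaryPicker
{
    // Any HTML tag; the text between two consecutive tags forms one "run".
    static readonly Regex Tag = new Regex(@"<[^>]*>");

    // HEURISTIC (assumption): the news summary is the longest text run in the
    // description, while source names and "related" links are shorter.
    public static string LongestTextRun(string html)
    {
        return Tag.Split(html ?? "")
            // Decode entities and normalise whitespace inside each run.
            .Select(run => Regex.Replace(WebUtility.HtmlDecode(run), @"\s+", " ").Trim())
            .OrderByDescending(run => run.Length) // stable: ties keep document order
            .First();                             // Split always returns at least one element
    }
}
```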

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



