Getting the description from a Google News RSS feed

Question


I need to retrieve the RSS feed from Google News and display the title and description of each item in a WPF app. This was easy to do for BBC News because its description contained nothing but the summary text, so I could just read the child node's text. However, Google News uses a different format for its description, and it contains a lot of information I don't need. This is the RSS link: http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss

The description and the image link are both in the same block of HTML. Can anyone tell me the best way to extract them from the rest of that text? I think it has something to do with regular expressions, but I have never used them before. Below is the code I use for reading the feed, if it helps.

C#

try
{
    // Load the XML file
    XmlDocument xmlDoc = new XmlDocument();
    XmlNode nodeRss = null;
    XmlNode nodeChannel = null;
    XmlNode nodeItem = null;
    XmlTextReader xmlReader = new XmlTextReader("http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss");
    xmlDoc.Load(xmlReader);

    // Find the <rss> element
    for (int i = 0; i < xmlDoc.ChildNodes.Count; i++)
    {
        if (xmlDoc.ChildNodes[i].Name == "rss")
        {
            nodeRss = xmlDoc.ChildNodes[i];
        }
    }

    // Find the <channel> element inside <rss>
    for (int i = 0; i < nodeRss.ChildNodes.Count; i++)
    {
        if (nodeRss.ChildNodes[i].Name == "channel")
        {
            nodeChannel = nodeRss.ChildNodes[i];
        }
    }

    // Read the title and description of every <item> in the channel
    for (int j = 0; j < nodeChannel.ChildNodes.Count; j++)
    {
        if (nodeChannel.ChildNodes[j].Name == "item")
        {
            nodeItem = nodeChannel.ChildNodes[j];

            // Reset for each item so a missing element does not reuse the previous item's text
            string title = null;
            string description = null;
            for (int i = 0; i < nodeItem.ChildNodes.Count; i++)
            {
                if (nodeItem.ChildNodes[i].Name == "title")
                {
                    title = nodeItem.ChildNodes[i].InnerText;
                }
                if (nodeItem.ChildNodes[i].Name == "description")
                {
                    description = nodeItem.ChildNodes[i].InnerText;
                }
            }
            GoogleItems.Add(new GoogleFeed(title, description));
        }
    }
    MainNews.Text = GoogleItems[0].Description;
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
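The nested ChildNodes loops above can also be written with LINQ to XML (System.Xml.Linq, part of .NET since 3.5). This is a minimal sketch, not the asker's code: FeedItem is a hypothetical stand-in for the GoogleFeed class, and the feed is parsed from a small inline sample so it runs offline. In the WPF app, XDocument.Load with the Google News URL would take the place of XDocument.Parse(sample). Casting a missing element to string yields null, so an item without a description gets an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the question's GoogleFeed class
class FeedItem
{
    public string Title { get; private set; }
    public string Description { get; private set; }

    public FeedItem(string title, string description)
    {
        Title = title;
        Description = description;
    }
}

static class RssReader
{
    // Reads every <item> under rss/channel; a missing child element becomes ""
    public static List<FeedItem> ReadItems(XDocument doc)
    {
        return doc.Root
            .Elements("channel")
            .Elements("item")
            .Select(item => new FeedItem(
                (string)item.Element("title") ?? String.Empty,
                (string)item.Element("description") ?? String.Empty))
            .ToList();
    }

    static void Main()
    {
        // Small inline sample so the sketch runs without network access
        const string sample =
            "<rss version=\"2.0\"><channel><title>Sample</title>" +
            "<item><title>First</title><description>&lt;b&gt;One&lt;/b&gt;</description></item>" +
            "<item><title>Second</title></item>" +
            "</channel></rss>";

        foreach (FeedItem item in ReadItems(XDocument.Parse(sample)))
        {
            Console.WriteLine("{0}: {1}", item.Title, item.Description);
        }
    }
}
```

Note that the description still holds HTML markup (here <b>One</b>) after parsing, because the feed escapes it as text; stripping that markup is a separate step.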

Another news feed I found is http://rss.msnbc.msn.com/id/3032506/device/rss/rss.xml

It looks simpler than the previous link, so any advice on getting the description and image URL out of that feed would be welcome.

Thanks in advance for your help.

Posted 14-Feb-11 23:35pm by Neil Cross
Updated 15-Feb-11 2:44am

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Fredrik Bornander · Answer 1 · 2011-02-15T03:12:00

This seems to work for me:

C#

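using System;
using System.Net;                     // WebUtility.HtmlDecode (.NET 4.0 and later)
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.XPath;

class Program
{
    // Removes everything that looks like an HTML tag, then decodes entities such as &amp; and &nbsp;
    static string Strip(string text)
    {
        string withoutTags = Regex.Replace(text, @"<[^>]*>", String.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }

    // Stripped text of the node selected by xpath, or an empty string if the node is missing
    static string SelectText(XPathNavigator parent, string xpath)
    {
        XPathNavigator node = parent.SelectSingleNode(xpath);
        return node == null ? String.Empty : Strip(node.Value);
    }

    static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();
        using (XmlTextReader xmlReader = new XmlTextReader("http://news.google.co.uk/news?pz=1&cf=all&ned=uk&hl=en&topic=n&output=rss"))
        {
            xmlDoc.Load(xmlReader);
        }

        XPathNavigator navigator = xmlDoc.CreateNavigator();

        // Channel-level image information
        string mainTitle = SelectText(navigator, "rss/channel/image/title");
        string mainUrl = SelectText(navigator, "rss/channel/image/url");
        string mainLink = SelectText(navigator, "rss/channel/image/link");

        // The XML escaping is already decoded, so each description holds HTML markup for Strip to remove
        XPathNodeIterator items = navigator.Select("rss/channel/item");
        while (items.MoveNext())
        {
            XPathNavigator item = items.Current;
            string title = SelectText(item, "title");
            string category = SelectText(item, "category");
            string description = SelectText(item, "description");
            Console.WriteLine("{0}\n{1}\n", title, description);
        }
    }
}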

Hope this helps,
Fredrik Bornander
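The Strip method in the answer removes the markup but discards the image link, which the question also asks for. The sketch below, an illustration rather than part of the original answer, pulls both out of a description string with regular expressions. The sample markup and the class name DescriptionParser are hypothetical, and the image pattern assumes the picture appears as an ordinary <img src=...> tag. Regular expressions cope with small, predictable fragments like this but are not a general HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class DescriptionParser
{
    // Captures the src attribute of the first <img> tag (single or double quotes)
    static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Anything from '<' to the next '>' is treated as a tag
    static readonly Regex Tag = new Regex(@"<[^>]*>");

    // Returns the image URL, or null when the description has no <img> tag
    public static string ExtractImageUrl(string html)
    {
        Match m = ImgSrc.Match(html);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Replaces tags with spaces, decodes entities such as &amp;, then collapses whitespace
    public static string ExtractText(string html)
    {
        string text = Tag.Replace(html, " ");
        text = WebUtility.HtmlDecode(text);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    static void Main()
    {
        // Hypothetical description snippet: an image cell followed by a text cell
        string description =
            "<table><tr><td><img src=\"http://example.com/thumb.jpg\" alt=\"\"></td>" +
            "<td><b>Headline</b><br>Story text &amp; more.</td></tr></table>";

        Console.WriteLine(ExtractImageUrl(description)); // http://example.com/thumb.jpg
        Console.WriteLine(ExtractText(description));     // Headline Story text & more.
    }
}
```

Tags are replaced with a space rather than removed outright, so text on either side of a <br> or a table-cell boundary does not run together; the extra whitespace is collapsed afterwards.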