Hi,

I am downloading an RSS feed and adding its items to a ListBox, and I am trying to remove the HTML tags from the feed text.

I know that the following method can be used to remove HTML tags, but I am not sure how to work it into my code.
C#
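// Requires: using System.Text.RegularExpressions;
public string Strip(string text)
{
     // Lazily match everything from '<' up to the next '>' and remove it.
     return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}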

Here is my code. I am trying to remove the HTML tags from the description element.
C#
XElement xmlScan = XElement.Parse(e.Result);

listBox1.ItemsSource = from channel in xmlScan.Descendants("item")
                       select new ScanItem
                       {
                         title = channel.Element("title").Value,
                         description = "Position: " + channel.Element("description").Value
                       };
Updated 13-Feb-11 6:04am

Well, if you are confident in the Strip method, just replace this:

C#
description = "Position: " + channel.Element("description").Value


with this:

C#
description = "Position: " + Strip(channel.Element("description").Value)
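For completeness, here is a minimal, self-contained sketch of that change. The shape of ScanItem is assumed from the question (only the two members it uses), and FeedItems and FromRss are hypothetical names introduced for illustration; the returned list is what would be assigned to listBox1.ItemsSource. The casts to string are used instead of .Value so that a missing element gives an empty string rather than a NullReferenceException.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Assumed shape of the OP's ScanItem; only the members used in the question.
public class ScanItem
{
    public string title { get; set; }
    public string description { get; set; }
}

// Hypothetical container class, not part of the OP's code.
public static class FeedItems
{
    // Same pattern as in the question: lazily match from '<' to the next '>'.
    public static string Strip(string text)
    {
        return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
    }

    // Builds the items from the downloaded RSS text (e.Result in the question).
    public static List<ScanItem> FromRss(string rssXml)
    {
        XElement xmlScan = XElement.Parse(rssXml);

        return (from channel in xmlScan.Descendants("item")
                select new ScanItem
                {
                    // Casting an XElement to string yields null when the element is missing.
                    title = (string)channel.Element("title") ?? string.Empty,
                    description = "Position: " +
                        Strip((string)channel.Element("description") ?? string.Empty)
                }).ToList();
    }
}
```

With this in place, the assignment would look like listBox1.ItemsSource = FeedItems.FromRss(e.Result);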
 
Comments
Sergey Alexandrovich Kryukov 13-Feb-11 12:20pm    
Nishant, frankly, this is too ad hoc. XML is XML and data binding is data binding, so why go into peculiarities, especially ones based on hard-coded immediate string constants?
--SA
Nish Nishant 13-Feb-11 12:23pm    
I believe the OP is trying to populate the data object that he binds to his ListBox, and he does not want to use the raw data directly (because of the HTML tags). I believe he only wants to filter the description, though, which suggests that he may be using an HTML-capable control for the actual RSS body.
Sergey Alexandrovich Kryukov 13-Feb-11 12:26pm    
I answered how to do this in a regular way.
--SA
Nish Nishant 13-Feb-11 12:28pm    
SA, based on the OP's code, the ListBox is not for any hierarchical data. The ListBox is merely there to show the RSS entry descriptions. For example, just the post titles in a blog.
Sergey Alexandrovich Kryukov 13-Feb-11 14:38pm    
Agreed. In my answer I did not deny this possibility.
--SA
First, RSS is based on XML, not HTML. It's best to parse the whole feed instead of removing the tags.

Your attempt to use RSS as the data source for a list box clashes with the fact that a list box has no hierarchical structure. So you first need to decide how you want to map the tree-like hierarchical structure onto the linear structure of a list box. (I would suggest doing something else, such as making the RSS the source for a more suitable control like TreeView.)

In all cases, you should parse the whole RSS feed, starting from the top element.

If you still want to use ListBox and want to map just the elements of one level or of one kind, one option is System.Xml.XmlReader, which is the fastest method of parsing and is especially good if you want to ignore a lot of data.
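As a rough sketch of the XmlReader route, assuming a plain RSS 2.0 feed without XML namespaces: FeedReader and ReadItemDescriptions are hypothetical names used only for illustration. Each item is read through ReadSubtree, so the channel's own description element is skipped and only the descriptions inside item elements are collected.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the OP's code.
public static class FeedReader
{
    // Collects the text of the <description> found inside each <item>.
    public static List<string> ReadItemDescriptions(string rssXml)
    {
        var descriptions = new List<string>();

        using (XmlReader reader = XmlReader.Create(new StringReader(rssXml)))
        {
            // Jump from <item> to <item>, ignoring everything in between.
            while (reader.ReadToFollowing("item"))
            {
                // The subtree reader sees only the current <item> and its children.
                using (XmlReader item = reader.ReadSubtree())
                {
                    if (item.ReadToFollowing("description"))
                    {
                        // Assumes text or CDATA content, with no child elements.
                        descriptions.Add(item.ReadElementContentAsString());
                    }
                }
            }
        }

        return descriptions;
    }
}
```

The returned strings still contain whatever markup the feed put into the description text, so the tag removal discussed above would be applied to each entry before display.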

Alternatively, you can stay with the System.Xml.Linq way of handling XML; again, you should parse the whole RSS feed, starting with System.Xml.Linq.XDocument or XElement. For the record, you can always use DOM-based parsing with System.Xml.XmlDocument, although I would say that is probably the least recommended option for your purposes.
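A minimal sketch of the System.Xml.Linq variant, walking down from the top element instead of searching for descendants. It assumes the usual RSS 2.0 layout (rss, then channel, then item) without XML namespaces; FeedDocument and ReadItems are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of the OP's code.
public static class FeedDocument
{
    // Returns (title, description) pairs for each <item> under <rss>/<channel>.
    public static List<KeyValuePair<string, string>> ReadItems(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);

        // Start at the root (<rss>) and step down to <channel>.
        XElement channel = doc.Root.Element("channel");
        if (channel == null)
        {
            return new List<KeyValuePair<string, string>>();
        }

        // Elements("item") looks only at direct children of <channel>.
        return channel.Elements("item")
            .Select(item => new KeyValuePair<string, string>(
                (string)item.Element("title") ?? string.Empty,
                (string)item.Element("description") ?? string.Empty))
            .ToList();
    }
}
```

Each pair's Key is the item title and its Value is the raw description text, which can then be mapped onto whatever objects the list box is bound to.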

--SA
 
Comments
Manfred Rudolf Bihy 13-Feb-11 12:25pm    
Better yet! 5+

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


