Removing HTML Tags in Silverlight WP7

Question

Hi,

I am downloading an RSS feed and adding its items to a ListBox, and I am trying to remove the HTML tags from the feed content.

I know that this can be used to remove HTML tags, but I am not sure how to work it into my code.

C#

public string Strip(string text) 
{
     return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
}

Here is my code. I am trying to remove the HTML tags from the description element.

C#

XElement xmlScan = XElement.Parse(e.Result);

listBox1.ItemsSource = from channel in xmlScan.Descendants("item")
                       select new ScanItem
                       {
                         title = channel.Element("title").Value,
                         description = "Position: " + channel.Element("description").Value
                       };

Posted 13-Feb-11 5:59am

netzure

Updated 13-Feb-11 6:04am

Sandeep Mewara

v2

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Nish Nishant · Accepted Answer · 2011-02-13T06:16:00

Solution 1

Well, if you are confident of the Strip method, just replace this:

C#

description = "Position: " + channel.Element("description").Value

with this:

C#

description = "Position: " + Strip(channel.Element("description").Value)
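
A minimal self-contained sketch of that change, assuming a console program rather than the Silverlight page from the question. The `RssDemo` class, the `ParseItems` helper, and the sample feed string are hypothetical names and data made up for illustration; `ScanItem` is assumed to have the two string properties the query uses.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Assumed shape of the item class used in the question.
public class ScanItem
{
    public string title { get; set; }
    public string description { get; set; }
}

public static class RssDemo
{
    // Same regex as in the question: removes anything that looks like a tag.
    public static string Strip(string text)
    {
        return Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
    }

    // Hypothetical helper: the query from the question with Strip applied.
    public static ScanItem[] ParseItems(string rssXml)
    {
        XElement xmlScan = XElement.Parse(rssXml);
        return (from channel in xmlScan.Descendants("item")
                select new ScanItem
                {
                    title = channel.Element("title").Value,
                    description = "Position: " + Strip(channel.Element("description").Value)
                }).ToArray();
    }

    public static void Main()
    {
        // Made-up sample feed; the description carries escaped HTML,
        // which the XML parser unescapes before Strip sees it.
        string sample =
            "<rss><channel><item>" +
            "<title>First</title>" +
            "<description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description>" +
            "</item></channel></rss>";

        foreach (ScanItem item in ParseItems(sample))
            Console.WriteLine(item.title + " | " + item.description);
        // prints "First | Position: Hello world"
    }
}
```

In the Silverlight page, the array returned by a helper like this could be assigned to `listBox1.ItemsSource` in place of the original query. Note that the regex only removes tags; character entities such as `&nbsp;` that remain in the text are left untouched.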

Posted 13-Feb-11 6:16am

Nish Nishant

v2

Comments

Sergey Alexandrovich Kryukov 13-Feb-11 12:20pm

Nishant, frankly, this is too ad hoc. XML is XML and data binding is data binding, so why go into peculiarities, especially ones based on hard-coded immediate string constants?
--SA

Nish Nishant 13-Feb-11 12:23pm

I believe the OP's trying to populate his data object that he binds to his ListBox. And he does not want to directly use the raw data (because of the html tags). I believe he only wants to filter the Description though, which indicates that for the actual rss body, he may be using an HTML capable control.

Sergey Alexandrovich Kryukov 13-Feb-11 12:26pm

I answered how to do this in a regular way.
--SA

Nish Nishant 13-Feb-11 12:28pm

SA, based on the OP's code, the ListBox is not for any hierarchical data. The ListBox is merely to show the RSS entry descriptions. For example, just the post titles in a blog.

Sergey Alexandrovich Kryukov 13-Feb-11 14:38pm

Agree, in my answer I did not deny this possibility.
--SA

Manfred Rudolf Bihy 13-Feb-11 12:23pm

Nishant enforces a prerequisite, so the answer is OK. It's for the OP to decide whether he trusts his Strip method.

Nish Nishant 13-Feb-11 12:26pm

Yes, and honestly I was surprised that the OP did not know how to call the Strip method. It indicates that perhaps, just perhaps, the code he's using is not his own. Otherwise I find it extremely odd that he could not have figured this out on his own, when he seems to have had no trouble in coming up with a fairly good Linq query.

Manfred Rudolf Bihy 13-Feb-11 12:50pm

Most likely :)

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-02-13T06:16:00

First, RSS is based on XML, not HTML. It's best to parse the whole feed instead of removing the tags.

Your attempt to make a data source for a list box out of RSS clashes with the fact that a list box lacks hierarchical structure. So you first need to decide how you want to map the tree-like hierarchical structure onto the linear list box structure. (I would suggest doing something else, such as making RSS the source for a more suitable TreeView.)

In all cases, you should parse the whole RSS feed starting from the top element.

If you still want to use a ListBox and map just the elements of one level or of one kind, one option is to use System.Xml.XmlReader, which is the fastest method of parsing and especially good if you want to ignore a lot of data.
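
A rough sketch of the XmlReader approach, assuming a plain RSS 2.0 feed without namespaces where each `description` holds only text (escaped HTML or CDATA). The `RssReaderDemo` class and `ReadItemDescriptions` method are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class RssReaderDemo
{
    // Collects the text of the first <description> inside every <item>,
    // skipping everything else in the feed (including the channel-level
    // <description>, which comes before the first <item>).
    public static List<string> ReadItemDescriptions(string rssXml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(rssXml)))
        {
            while (reader.ReadToFollowing("item"))
            {
                // The subtree reader cannot read past the end of this <item>.
                using (XmlReader item = reader.ReadSubtree())
                {
                    if (item.ReadToFollowing("description"))
                        result.Add(item.ReadElementContentAsString());
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample feed.
        string sample =
            "<rss><channel><title>Feed</title><description>Feed level</description>" +
            "<item><title>A</title><description>One</description></item>" +
            "<item><title>B</title><description>Two</description></item>" +
            "</channel></rss>";

        foreach (string description in ReadItemDescriptions(sample))
            Console.WriteLine(description);
    }
}
```

Reading each item through `ReadSubtree` keeps the search for `description` from running into the next item when an item has none.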

Alternatively, you can stay with the System.Xml.Linq way of handling XML; again, you should parse the whole RSS feed, starting with System.Xml.Linq.XDocument or XElement. For the record, you can always use DOM-based parsing with System.Xml.XmlDocument, though it is probably the least recommended option for your purposes.

--SA
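
A possible sketch of the System.Xml.Linq route that walks the feed from the top element down, assuming an RSS 2.0 layout (`rss` > `channel` > `item`) with no XML namespaces. The `RssDocumentDemo` class and `ReadItems` method are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class RssDocumentDemo
{
    // Returns one "title: description" line per <item>, starting at the root.
    public static List<string> ReadItems(string rssXml)
    {
        var lines = new List<string>();
        XDocument doc = XDocument.Parse(rssXml);
        XElement channel = doc.Root.Element("channel");
        if (channel == null)
            return lines; // not the layout this sketch assumes

        foreach (XElement item in channel.Elements("item"))
        {
            // Casting an XElement to string yields null when the element is
            // missing, so no NullReferenceException as with .Value.
            string title = (string)item.Element("title") ?? "(no title)";
            string description = (string)item.Element("description") ?? string.Empty;
            lines.Add(title + ": " + description);
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample feed; the second item has no description.
        string sample =
            "<rss version=\"2.0\"><channel>" +
            "<item><title>A</title><description>One</description></item>" +
            "<item><title>B</title></item>" +
            "</channel></rss>";

        foreach (string line in ReadItems(sample))
            Console.WriteLine(line);
    }
}
```

The explicit string cast is a design choice here: feeds in the wild sometimes omit optional elements, and the cast lets the code substitute a default instead of failing.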