Remove all the HTML tags and display a plain text only inside (in case XML is not well formed)

PIEBALDconsult, 19 Dec 2010, CPOL
How about loading it into an XmlDocument and getting the InnerText? (Provided the HTML is well-formed XML, of course.)
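A minimal sketch of that suggestion, assuming the markup really is well-formed XML. The class and method names (PlainTextSketch, FromWellFormedMarkup) are hypothetical, and returning null on a parse failure is only one possible way to signal that this approach does not apply to the input.

```csharp
using System;
using System.Xml;

public static class PlainTextSketch
{
    // Hypothetical helper: returns the concatenated text of all nodes,
    // or null when the markup is not well-formed XML.
    public static string FromWellFormedMarkup(string markup)
    {
        var doc = new XmlDocument();
        try
        {
            // LoadXml throws XmlException on unclosed tags, undeclared
            // entities such as &nbsp;, multiple root elements, and so on.
            doc.LoadXml(markup);
        }
        catch (XmlException)
        {
            return null;
        }

        // InnerText concatenates the text of every descendant node;
        // predefined XML entities (&amp;, &lt;, ...) are already expanded.
        return doc.InnerText;
    }
}
```

For example, FromWellFormedMarkup("<p>Hello <b>world</b> &amp; all</p>") yields "Hello world & all", while "<p>Hello<br>world</p>" yields null because the <br> element is never closed.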

Alternatives


20 Dec 2010
JHoye
Consider using the open source HTML Agility Pack library (htmlagilitypack.codeplex.com). It lets you use XPath queries to access very specific parts of an HTML document, and the HTML does not have to be valid, well-formed XML. In addition to accessing the raw inner text of an element you can...
5 Jan 2011
MarcoBot
NOTE: If you're really wanting plain text, then you should also be sure to decode the HTML entities (System.Web.HttpUtility.HtmlDecode()) on the resulting text, or you'll wind up with HTML/XML character entity text in your output, such as &amp; and &#91; If you're going to immediately output the...
18 Jan 2011
KevinAG
Sorry, but I have to vote this way down. Your regular expression (or @Chris's) is not robust enough for what I would consider "real world" data. Especially if this is used on any kind of public web site, I would be afraid of JavaScript injection attacks and other things (depending on its usage)....
15 Feb 2012
Andreas Gieriet
I think the following Regex and HtmlDecode would do:

string html = ...;
string textonly = HttpUtility.HtmlDecode(Regex.Replace(html, @"|", ""));

Any HTML construct that would not be stripped off properly by this?
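
The entity-decoding step that MarcoBot's note calls for can be sketched on its own. This sketch uses System.Net.WebUtility.HtmlDecode instead of the System.Web.HttpUtility.HtmlDecode named above, only so that it needs no System.Web reference. That substitution is an assumption, as are the names EntityDecodeSketch and Decode.

```csharp
using System;
using System.Net;

public static class EntityDecodeSketch
{
    // Hypothetical helper: turns character entity text left over after
    // tag stripping (named and numeric) back into plain characters.
    public static string Decode(string strippedText)
    {
        return WebUtility.HtmlDecode(strippedText);
    }
}
```

For example, Decode("Fish &amp; chips &#91;1&#93;") returns "Fish & chips [1]". The decoded text is plain text, so it would need to be encoded again before being written back into an HTML page.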

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

PIEBALDconsult
Software Developer (Senior)
United States
BSCS 1992 Wentworth Institute of Technology

Originally from the Boston (MA) area. Lived in SoCal for a while. Now in the Phoenix (AZ) area.

OpenVMS enthusiast, ISO 8601 evangelist, photographer, opinionated SOB, acknowledged pedant and contrarian

---------------

"Using fewer technologies is better than using more." -- Rico Mariani

"Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’" -- Steve McConnell

"Every time you write a comment, you should grimace and feel the failure of your ability of expression." -- Unknown

"If you need help knowing what to think, let me know and I'll tell you." -- Jeffrey Snover [MSFT]

"Typing is no substitute for thinking." -- R.W. Hamming

"I find it appalling that you can become a programmer with less training than it takes to become a plumber." -- Bjarne Stroustrup

ZagNut’s Law: Arrogance is inversely proportional to ability.

"Well blow me sideways with a plastic marionette. I've just learned something new - and if I could award you a 100 for that post I would. Way to go you keyboard lovegod you." -- Pete O'Hanlon

"linq'ish" sounds like "inept" in German -- Andreas Gieriet

"Things would be different if I ran the zoo." -- Dr. Seuss

"Wrong is evil, and it must be defeated." -- Jeff Ello

"A good designer must rely on experience, on precise, logical thinking, and on pedantic exactness." -- Nigel Shaw

“It’s always easier to do it the hard way.” -- Blackhart

“If Unix wasn’t so bad that you can’t give it away, Bill Gates would never have succeeded in selling Windows.” -- Blackhart

"Omit needless local variables." -- Strunk... had he taught programming

Last Updated 19 Dec 2010
Article Copyright 2010 by PIEBALDconsult
Everything else Copyright © CodeProject, 1999-2016