If I understand correctly, you want to strip the HTML tags and keep only the plain text, as if you had copied the text from a web page and pasted it into Notepad. Use regular expressions. You can use
@"<(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>"
expression or something like this. If you google for "c# strip html" you will find more,
this for example.
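
To make the suggestion concrete, here is a minimal sketch of how such an expression could be applied. It assumes the pattern starts with a literal `<`, so that each match begins at a tag and quoted attribute values containing `>` are skipped as a unit. The `HtmlStripper` class and `StripTags` method are hypothetical names chosen for illustration. The call to `WebUtility.HtmlDecode` is an extra step not mentioned above, added on the assumption that entities such as `&amp;` should also become plain characters. A regex like this is only an approximation: it leaves the contents of `<script>` and `<style>` elements in place and can be confused by comments containing `>`.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlStripper
{
    // Matches one tag: '<', then any run of non-quote/non-'>' characters
    // or complete quoted strings (so '>' inside an attribute value is
    // not treated as the end of the tag), then the closing '>'.
    private static readonly Regex TagPattern = new Regex(
        @"<(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>",
        RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Remove every tag, keeping the text between tags untouched.
        string withoutTags = TagPattern.Replace(html, string.Empty);

        // Assumption: entities like &amp; should be turned back into text.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

For example, `HtmlStripper.StripTags("<p class=\"a>b\">Hello <b>world</b></p>")` should return `Hello world`. For messy real-world pages, an HTML parser is likely to be more reliable than any single expression.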