
Extract inner text from HTML using Regex

17 Oct 2012, CPOL
How to extract the inner text from HTML using a Regular Expression.


Use this code snippet to extract the inner text from HTML. It is very lightweight, simple, and efficient, works well even with malformed HTML, and needs no extra DLL such as HtmlAgilityPack.


This method is intended for simple HTML that is free of scripts, styles, and comments.
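To illustrate why that restriction matters, here is a small sketch that applies the same pattern this tip uses to a fragment containing a script element. The class name ScriptLeakDemo, the method name Strip, and the sample markup are made up for this illustration. The pattern removes only the tags themselves, so the JavaScript between them survives as if it were page text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original tip.
public static class ScriptLeakDemo
{
    // Same pattern the tip uses: a run of one or more tags,
    // each optionally followed by whitespace.
    private static readonly Regex TagRun =
        new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // Replace every run of tags with one space, then trim the ends.
    public static string Strip(string htmlText)
    {
        return TagRun.Replace(htmlText, " ").Trim();
    }

    // Illustrative input: a paragraph followed by an inline script.
    public const string Sample = "<p>Hello</p><script>var x = 1;</script>";
}
```

Calling ScriptLeakDemo.Strip(ScriptLeakDemo.Sample) returns "Hello var x = 1;". Only the <script> and </script> tags match the pattern, so the script body is left in the output. Style blocks and comments leak their contents in a similar way.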


Some tasks require you to extract text from HTML, especially in web scraping. One popular solution is to use HtmlAgilityPack's DocumentNode.InnerText; however, this requires adding an extra library to your project, and it has drawbacks in some edge cases.

One drawback I noticed is that it may concatenate two words into a single word. For example, consider the HTML string "<p>this<b>is<b/> a test</p>": using HtmlAgilityPack to extract the text results in "thisis a test".

Using the code   

To use this code, import the System.Text.RegularExpressions namespace. Then add the following function to your utilities class, or adapt it as an extension method:

public static string ExtractHtmlInnerText(string htmlText)
{
    // Match any HTML tag (opening or closing)
    // followed by any successive whitespace.
    // Singleline lets '.' match newlines, so a tag that spans lines still matches.
    Regex regex = new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // Replace each run of HTML tags (and the whitespace that follows) with a single space,
    // then trim leading and trailing whitespace.
    string resultText = regex.Replace(htmlText, " ").Trim();

    return resultText;
}
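The extension-method form mentioned above could look roughly like the following. This is only a sketch: the container class name HtmlStringExtensions is an assumption, and holding the Regex in a static field (so it is constructed once instead of on every call) is a small design choice that is not part of the original function.

```csharp
using System.Text.RegularExpressions;

// Hypothetical container class; the name is an assumption for this sketch.
public static class HtmlStringExtensions
{
    // Constructed once and reused across calls.
    private static readonly Regex TagRun =
        new Regex("(<.*?>\\s*)+", RegexOptions.Singleline);

    // The "this" modifier on the first parameter makes the method
    // callable as someHtml.ExtractHtmlInnerText().
    public static string ExtractHtmlInnerText(this string htmlText)
    {
        // Replace every run of tags with one space, then trim the ends.
        return TagRun.Replace(htmlText, " ").Trim();
    }
}
```

With this in place, "<p>this<b>is<b/> a test</p>".ExtractHtmlInnerText() returns "this is a test": each tag is replaced by a space, so the two words stay separated.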

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Web Developer
Jordan Jordan

Article Copyright 2012 by jahmani