How to Automatically Close Un-closed HTML Tags using C# for ASP.NET Web Applications


Oct 7, 2014

CPOL


This script is useful for database-driven ASP.NET websites that display stored HTML content in post pages.

Introduction

Sometimes we want to show a summary of a full HTML article or post, for example its first few lines on the main page. If we cut the HTML string in the middle like a regular string, the result can contain many unclosed HTML tags, and the browser cannot find the correct closing tags for them. For example, an unclosed <div> must be closed. If it is not, it will be closed by the next </div> outside the post area, and the posts will run into one another on the page.
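To make the problem concrete, here is a minimal sketch that cuts an HTML string at a fixed length, the way one would cut plain text. The class name SummaryDemo, the method NaiveSummary, and the sample markup are illustrative assumptions and are not part of the script in this article.

```csharp
using System;

public static class SummaryDemo
{
    // Cuts the text to at most maxLength characters, ignoring any markup in it.
    public static string NaiveSummary(string html, int maxLength)
    {
        return html.Length <= maxLength ? html : html.Substring(0, maxLength);
    }
}
```

Calling SummaryDemo.NaiveSummary("<div><p>Hello <b>world</b>, this is a long post.</p></div>", 20) returns <div><p>Hello <b>wor, which leaves <div>, <p> and <b> open.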

Background

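In this simple script, I use two regular expressions to extract and compare tags, one for the start tags and one for the end tags. Then I reverse the order of the start tag list, so that the innermost start tag comes first, and compare it with the end tag list position by position. In the table below, "End Tag (false)" shows the end tags found in the cut HTML, and "End Tag (true)" shows the end tags that should be there: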

Order  Start Tag List             Start Tag List             End Tag (false)  End Tag (true)
       (Normal)                   (Reverse)                  (Normal)         (Normal)
1      <html>                     <p>                        </p>             </p>
2      <div>                      <input>                    </input>         </input>
3      <span style="color:red;">  <form>                     </form>          </form>
4      <form>                     <span style="color:red;">  NO END TAG       </span>
5      <input>                    <div>                      NO END TAG       </div>
6      <p>                        <html>                     NO END TAG       </html>
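The comparison in the table can be sketched on plain lists of tag names, without any regular expressions. This is only an illustration of the idea under my own naming: TagOrderDemo and MissingEndTags are hypothetical names, and the inputs are assumed to be tag names already extracted in document order.

```csharp
using System;
using System.Collections.Generic;

public static class TagOrderDemo
{
    // Returns the end tags that are missing, given the start-tag names in document
    // order and the end-tag names that were found, also in document order.
    public static List<string> MissingEndTags(IList<string> startNames, IList<string> endNames)
    {
        var reversed = new List<string>(startNames);
        reversed.Reverse(); // innermost start tag first

        var missing = new List<string>();
        for (int i = 0; i < reversed.Count; i++)
        {
            // Closed only if an end tag with the same name sits at the same position.
            bool closed = i < endNames.Count &&
                string.Equals(reversed[i], endNames[i], StringComparison.OrdinalIgnoreCase);
            if (!closed)
            {
                missing.Add("</" + reversed[i] + ">");
            }
        }
        return missing;
    }
}
```

With the start names html, div, span, form, input, p and the end names p, input, form from the table, the method returns </span>, </div>, </html>, which matches the rows marked NO END TAG.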

The code is as follows:

using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static string AutoCloseHtmlTags(string inputHtml)
{
    // Element names recognised by both patterns (h1 to h6 are listed individually).
    const string tagNames =
        "a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|" +
        "big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|" +
        "command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|" +
        "figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|" +
        "header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|" +
        "map|mark|menu|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|" +
        "output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|" +
        "source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|" +
        "tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr";

    // Group 1 captures the tag name; attributes may not contain '<' or '>'.
    var regexStartTag = new Regex(@"<(" + tagNames + @")(\s[^<>]*)?>", RegexOptions.IgnoreCase);
    var regexCloseTag = new Regex(@"</(" + tagNames + @")\s*>", RegexOptions.IgnoreCase);

    // Collect only the tag names, so attributes never take part in the comparison.
    var startTagList = new List<string>();
    var closeTagList = new List<string>();
    foreach (Match startTag in regexStartTag.Matches(inputHtml))
    {
        startTagList.Add(startTag.Groups[1].Value);
    }
    foreach (Match closeTag in regexCloseTag.Matches(inputHtml))
    {
        closeTagList.Add(closeTag.Groups[1].Value);
    }

    // Walk the start tags from innermost to outermost.
    startTagList.Reverse();
    var resultClose = new StringBuilder();
    for (int i = 0; i < startTagList.Count; i++)
    {
        // A start tag counts as closed only when the end tag at the same
        // position has the same name; otherwise append the missing end tag.
        bool isClosed = i < closeTagList.Count &&
            string.Equals(startTagList[i], closeTagList[i],
                          System.StringComparison.OrdinalIgnoreCase);
        if (!isClosed)
        {
            resultClose.Append("</").Append(startTagList[i]).Append('>');
        }
    }
    return inputHtml + resultClose.ToString();
}
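The positional comparison above assumes that the end tags present in the input line up with the innermost start tags. As an alternative, here is a hedged sketch of a different technique, a stack of open elements, which pairs each end tag with the most recently opened element instead. It is not the method described in this article. The names StackTagCloser, CloseOpenTags, VoidElements and TagRegex are my own, the pattern accepts any tag name rather than a fixed list, and the void-element list is my reading of the HTML specification.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class StackTagCloser
{
    // Elements that never take an end tag in HTML (assumed list).
    private static readonly HashSet<string> VoidElements = new HashSet<string>(
        StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"
    };

    // Group 1: "/" for an end tag; group 2: tag name; group 3: "/" for a self-closing tag.
    private static readonly Regex TagRegex =
        new Regex(@"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*?(/?)>");

    public static string CloseOpenTags(string inputHtml)
    {
        var open = new Stack<string>();
        foreach (Match m in TagRegex.Matches(inputHtml))
        {
            string name = m.Groups[2].Value;
            bool isEndTag = m.Groups[1].Value == "/";
            bool selfClosing = m.Groups[3].Value == "/";
            if (VoidElements.Contains(name) || selfClosing) continue;

            if (!isEndTag)
            {
                open.Push(name);
            }
            else if (open.Count > 0 &&
                     string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
            {
                open.Pop(); // end tag matches the innermost open element
            }
            // Stray or mismatched end tags are ignored in this sketch.
        }

        // Close whatever is still open, innermost element first.
        var result = new StringBuilder(inputHtml);
        while (open.Count > 0)
        {
            result.Append("</").Append(open.Pop()).Append('>');
        }
        return result.ToString();
    }
}
```

For example, StackTagCloser.CloseOpenTags("<div><p>a</p><span>b") returns <div><p>a</p><span>b</span></div>. The sketch does not handle input that was cut in the middle of a tag (such as a trailing "<div cla"), comments, or attribute values containing ">". For those cases a real HTML parser is the safer choice.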

Please let me know about your ideas...