Posted 8 Jan 2003

HTML fragments parsing and creation

Matthias Gerloff, 15 Mar 2003
Classes to parse HTML parts into an object tree and back


When you look at the Controls collection of the Page object, you quickly realize that all the interesting markup arrives as the text of a LiteralControl, so there is no comfortable way to insert or change it. Therefore I wrote a few classes that take a string apart into objects. These objects can be changed safely and then rendered back into a new string for a LiteralControl.

I ran into this problem when writing a page template class. You can move the literal markup out of the .aspx file, but then the designer does not work very well. I prefer to keep using the designer and change the header literal from the page template instead.


Parsing HTML is not much fun, so I imposed a few restrictions.

  • The parser does not really understand HTML, only text, tags and attributes. It does not care what their names and values are.
  • Badly formed input will result in poor output. For example, a <meta> tag is often left unclosed, so the parser can only place the following tag as its child. The source text must be changed to something like <meta ... />.
  • Since you can insert plain text into the resulting object tree, you can easily ruin the output. For example, inserted text like "<junk" will be rendered as "<junk" and not as "&lt;junk". Remember, the brain is in front of the screen :-)
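Because inserted text is emitted verbatim, one way to stay safe is to encode it yourself before handing it to the object tree. The following is a minimal sketch, not part of the article's sources: the class and method names are hypothetical, and it assumes a framework version that provides System.Net.WebUtility (on the .NET 1.x framework of the article's time, System.Web.HttpUtility.HtmlEncode plays the same role).

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the article's download.
public static class EscapeSketch
{
    // Encodes characters such as '<', '>' and '&' so that the text
    // can no longer be mistaken for markup when it is rendered.
    public static string ToSafeText(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }
}
```

The encoded string can then be passed to a text fragment, for instance new FragmentText( EscapeSketch.ToSafeText( userInput ) ), where userInput is a placeholder for your own string.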

Using the code

The main class is Fragments. The constructor of Fragments takes a string, which is parsed into objects. Fragments is a collection of (guess what?) Fragment objects. Fragment is actually the base class of FragmentText (representing simple plain text), FragmentTag (representing a tag <tagname attr="value" ... >), FragmentComment (for a comment <!-- ... -->) and FragmentDoctype (e.g. <!DOCTYPE HTML ... >).

The objects can be changed, added or removed as in any collection. Objects of type FragmentTag have a property Nodes, which holds the sub tags. Since we parse a fragment, there can be unmatched tags (i.e. only opening or only closing tags). Therefore FragmentTag has a property Type, which states whether there are opening and/or closing tags. The value OpenCloseShort stands for tags of the kind <br/>. Obviously these tags cannot have Nodes.
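To illustrate how such a type value can drive the output, here is a small self-contained sketch. It is not the article's FragmentTag: all names are hypothetical except OpenCloseShort, the other enum members are assumptions, and children are simplified to already-rendered strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; only OpenCloseShort is named in the article.
public enum TagKindSketch { Open, Close, OpenClose, OpenCloseShort }

// Hypothetical stand-in for a tag node, for illustration only.
public sealed class TagSketch
{
    public string Name = "";
    public TagKindSketch Kind = TagKindSketch.OpenClose;

    // Child content, simplified here to already-rendered HTML strings.
    public List<string> Children = new List<string>();

    public override string ToString()
    {
        switch (Kind)
        {
            case TagKindSketch.Open:
                // Unmatched opening tag: no closing tag is emitted.
                return "<" + Name + ">" + string.Concat(Children);
            case TagKindSketch.Close:
                // Unmatched closing tag.
                return "</" + Name + ">";
            case TagKindSketch.OpenCloseShort:
                // Short form such as <br/>; it has no children.
                return "<" + Name + "/>";
            default:
                // Matched pair with the children in between.
                return "<" + Name + ">" + string.Concat(Children) + "</" + Name + ">";
        }
    }
}
```

Keeping the kind on the node lets an unbalanced fragment render back to the same unbalanced markup it was parsed from.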

Finally, calling the ToString() method transforms the fragments back into a plain HTML string.

Fragments fragments = new Fragments( someString );
foreach ( Fragment fragment in fragments )
{
  if ( fragment is FragmentTag )
  {
    // Append a plain text node below every tag.
    FragmentTag tag = (FragmentTag)fragment;
    tag.Nodes.Add( new FragmentText( "plain text" ) );
  }
  if ( fragment is FragmentText )
  {
    // (body missing in the original listing)
  }
}
string s = fragments.ToString();

You can also start with an empty Fragments object, insert everything into it and generate the output.

There is a small sample program with the sources, which I use for testing. It demonstrates most of the usage.

Points of interest

I use the Regex class to split the input into pieces. The pattern is rather unreadable, but its basic structure is pattern1|pattern2|pattern3|.... It took me some time to understand that each match corresponds to exactly one of these alternatives. Therefore I gave each alternative its own group name and added some sub groups for parameters or names. Also note that the next match does not necessarily begin exactly where the last match ended; the search merely resumes from there, so input between two matches is skipped silently. We therefore have to keep track ourselves of whether all input has been parsed.
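The idea can be sketched with a much smaller pattern. The following is an illustration only, not the pattern from the article's sources: the group names (comment, close, open, closename, openname), the class name and the token strings are all made up for this example. It shows one named group per alternative and the position bookkeeping needed to catch the input between matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical tokenizer, far simpler than the article's parser.
public static class TokenizerSketch
{
    // One named group per alternative; sub groups capture the tag names.
    private static readonly Regex Pattern = new Regex(
        @"(?<comment><!--.*?-->)" +
        @"|(?<close></(?<closename>\w+)\s*>)" +
        @"|(?<open><(?<openname>\w+)[^<>]*>)",
        RegexOptions.Singleline);

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int position = 0; // end of the last piece of input we accounted for

        for (Match m = Pattern.Match(input); m.Success; m = m.NextMatch())
        {
            // The match may start later than 'position': the gap is plain text.
            if (m.Index > position)
                tokens.Add("text:" + input.Substring(position, m.Index - position));

            // Exactly one of the top-level named groups has succeeded.
            if (m.Groups["comment"].Success)
                tokens.Add("comment");
            else if (m.Groups["close"].Success)
                tokens.Add("close:" + m.Groups["closename"].Value);
            else
                tokens.Add("open:" + m.Groups["openname"].Value);

            position = m.Index + m.Length;
        }

        // Whatever follows the last match is trailing text.
        if (position < input.Length)
            tokens.Add("text:" + input.Substring(position));

        return tokens;
    }
}
```

Without the position variable, the text between and after the tags would simply vanish from the result, which is the pitfall described above.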


History

  • Version 1.0 - first release
  • Version 1.1 - bug fixes (exception inside exception, parsing of nested quotes)


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Matthias Gerloff
Web Developer
Germany

Last Updated 16 Mar 2003
Article Copyright 2003 by Matthias Gerloff