When looking at the Controls collection of the Page object, you quickly realize that all the interesting markup comes packed into a LiteralControl. There is no easy way to insert or change this text conveniently. Therefore I wrote a few classes that take a string apart into objects. These objects can be changed safely and then generate a new string for a LiteralControl.
I ran across this problem when writing a page template class. You can move the literal markup out of the .aspx file, but then the designer no longer seems to work very well. I prefer to keep using the designer, so I change the header literal from within the page template instead.
Parsing HTML is not much fun, so I made a few restrictions:
- The parser does not really understand HTML, only text, tags and attributes. It does not care what their names and values are.
- Badly formed input results in poor output. (For example, a <meta> tag is often not closed, so the only place the parser can put the following tag is as its child. The source text must therefore be changed to something like <meta ... />.)
- Since you can insert plain text into the resulting object tree, you can easily ruin the output. Plain text is not HTML-encoded: inserting text like "<junk" will be rendered verbatim as "<junk" and not as "&lt;junk". Remember, the brain is in front of the screen :-)
Using the code
The main class is Fragments. The constructor of Fragments takes a string, which is parsed into objects. Fragments is a collection of (guess what?) Fragment objects.
Fragment is the superclass of FragmentText (representing simple plain text), FragmentTag (representing a tag <tagname attr="value" ... >) and FragmentComment (for a comment <!-- ... --> or a declaration such as <!DOCTYPE HTML ... >).
The objects can be changed, added or removed like in any collection. Objects of type FragmentTag have a property Nodes representing the child tags. Since we parse a fragment, there can be unmatched tags (i.e. only an opening or only a closing tag). Therefore FragmentTag has a property Type, which states whether there is an opening tag, a closing tag, or both. The value OpenCloseShort stands for tags of the kind <br/>. Obviously these tags cannot have child nodes.
Finally, the ToString() method transforms a Fragment (or a whole Fragments collection) back into a plain HTML string.
    Fragments fragments = new Fragments( someString );
    foreach ( Fragment fragment in fragments )
    {
        if ( fragment is FragmentTag )
        {
            // Append a text node to every top-level tag.
            FragmentTag tag = (FragmentTag)fragment;
            tag.Nodes.Add( new FragmentText( "plain text" ) );
        }
        if ( fragment is FragmentText )
        {
            // Inspect or change the plain text here.
        }
    }
    string s = fragments.ToString();
You can also start with an empty Fragments object, insert everything into it and generate the output.
A small sample program is included with the sources; I use it for testing, and it demonstrates most of the usage.
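To make the class structure described above more concrete, here is a minimal sketch of such a hierarchy and its rendering. It is not the author's implementation and it leaves out parsing entirely; the namespace FragmentSketch, the fields Name, Attributes, Text and Raw, and every TagType value except OpenCloseShort are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

namespace FragmentSketch
{
    // Only OpenCloseShort is named in the article; the other values are assumed.
    public enum TagType { Open, Close, OpenClose, OpenCloseShort }

    // Base class: every fragment renders itself back into HTML.
    public abstract class Fragment
    {
        public abstract override string ToString();
    }

    // Plain text is emitted verbatim (no HTML encoding).
    public class FragmentText : Fragment
    {
        public string Text;
        public FragmentText(string text) { Text = text; }
        public override string ToString() { return Text; }
    }

    // Comments and <!DOCTYPE ...> are kept as raw text, delimiters included.
    public class FragmentComment : Fragment
    {
        public string Raw;
        public FragmentComment(string raw) { Raw = raw; }
        public override string ToString() { return Raw; }
    }

    public class FragmentTag : Fragment
    {
        public string Name;
        public TagType Type;
        public List<KeyValuePair<string, string>> Attributes = new List<KeyValuePair<string, string>>();
        public Fragments Nodes = new Fragments();   // child fragments

        public FragmentTag(string name, TagType type) { Name = name; Type = type; }

        public override string ToString()
        {
            if (Type == TagType.Close)
                return "</" + Name + ">";            // unmatched closing tag only
            var sb = new StringBuilder();
            sb.Append('<').Append(Name);
            foreach (var a in Attributes)
                sb.Append(' ').Append(a.Key).Append("=\"").Append(a.Value).Append('"');
            if (Type == TagType.OpenCloseShort)
                return sb.Append("/>").ToString();    // e.g. <br/>: children are ignored
            sb.Append('>').Append(Nodes.ToString());  // opening tag, then children
            if (Type == TagType.OpenClose)
                sb.Append("</").Append(Name).Append('>');
            return sb.ToString();
        }
    }

    // The collection renders by concatenating its members in order.
    public class Fragments : List<Fragment>
    {
        public override string ToString()
        {
            var sb = new StringBuilder();
            foreach (Fragment f in this) sb.Append(f.ToString());
            return sb.ToString();
        }
    }
}
```

Starting from an empty collection, adding the comment <!-- c --> followed by a p tag of type OpenClose (attribute class="x") that contains the text hi and a br tag of type OpenCloseShort renders as <!-- c --><p class="x">hi<br/></p>. In this sketch an unmatched opening tag (Open) renders its children without a closing tag, which mirrors how an unclosed <meta> ends up owning the tags that follow it.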
Points of interest
I use the Regex class to split the input into pieces. The pattern is rather unreadable, but its basic structure is pattern1|pattern2|pattern3|.... It took me some time to understand that each match contains exactly one of the alternatives. Therefore I gave each alternative its own group name and added some subgroups for names and parameters. Also note that the next match does not necessarily start exactly where the previous match ended; the search merely continues from there. So we have to check ourselves whether all of the input has been parsed.
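The alternation technique and the position check described above can be sketched as follows. The pattern is a simplified, hypothetical one (the article's own pattern is not reproduced here): one named group per kind of token, with quoted attribute values allowed to contain >. It uses only the standard System.Text.RegularExpressions API (Regex.Match, Match.NextMatch and Match.Groups); the TokenizerSketch class and its (kind, value) output are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical tokenizer sketch: one alternation, one named group per token kind.
public static class TokenizerSketch
{
    private static readonly Regex Pattern = new Regex(
        @"(?<comment><!--.*?-->|<![^>]*>)" +                       // <!-- ... --> or <!DOCTYPE ...>
        @"|(?<tag></?[A-Za-z](?:[^>""']|""[^""]*""|'[^']*')*>)" +   // tag; '>' allowed inside quotes
        @"|(?<text>[^<]+)",                                         // plain text up to the next '<'
        RegexOptions.Singleline);

    // Returns (kind, value) pairs; throws if any character was skipped.
    public static List<KeyValuePair<string, string>> Tokenize(string input)
    {
        var tokens = new List<KeyValuePair<string, string>>();
        int pos = 0;
        for (Match m = Pattern.Match(input); m.Success; m = m.NextMatch())
        {
            // NextMatch only resumes *searching* at the end of the previous match,
            // so a gap means some characters matched no alternative.
            if (m.Index != pos)
                throw new FormatException("Unparsed input at position " + pos);

            // Exactly one alternative matched; its named group tells us which.
            string kind = m.Groups["comment"].Success ? "comment"
                        : m.Groups["tag"].Success ? "tag"
                        : "text";
            tokens.Add(new KeyValuePair<string, string>(kind, m.Value));
            pos = m.Index + m.Length;
        }
        if (pos != input.Length)
            throw new FormatException("Unparsed input at position " + pos);
        return tokens;
    }
}
```

For input such as "a < b" the stray "<" matches no alternative, so the search skips ahead and the next match starts at index 3 instead of 2; the position check turns that silent skip into an error. In .NET another option is to wrap the alternation in a group and prefix it with \G, which requires each match to begin exactly where the previous one ended, so matching simply stops at the first gap.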
History
- Version 1.0 - first release
- Version 1.1 - bug fixes (exception inside exception, parsing of nested quotes)