Posted 21 Mar 2004


, 30 Mar 2004
A non-well-formed HTML parser for .NET


This library produces a DOM-style tree of a given HTML document, allowing the developer to navigate and change the document in a methodical way. In addition to basic HTML production, the library can also produce XHTML documents, as it includes an HTML 4 entity encoder. Included in this release is a demonstration application in VB.NET showing how to use the library. I hope that it is all fairly self-explanatory.
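
To make the idea of a tree built from non-well-formed HTML concrete, here is a minimal sketch. It is not the library's code or API (the library itself is C#); it is written in Python on top of the standard `html.parser` module purely for illustration, and the names `Node`, `TreeBuilder`, `parse` and `find_all` are hypothetical. It assumes a simple recovery policy: void elements never take children, and a stray closing tag is ignored.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML 4.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class Node:
    """Hypothetical tree node: an element with attributes and children."""

    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects and plain text strings

    def find_all(self, name):
        """Depth-first search for descendant elements with the given tag name."""
        found = []
        for child in self.children:
            if isinstance(child, Node):
                if child.name == name:
                    found.append(child)
                found.extend(child.find_all(name))
        return found


class TreeBuilder(HTMLParser):
    """Builds a Node tree without requiring well-formed input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend only into elements that can hold content

    def handle_startendtag(self, tag, attrs):
        # <tag/> adds a node but never becomes the current element.
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name;
        # a closing tag that matches nothing is ignored.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered text
    return builder.root
```

A usage example: `parse("<p>One<p>Two <b>bold<img src=a.png></p>")` still yields a tree, even though the first paragraph and the bold element are never closed, and `find_all("img")` on the result returns the single image node with its unquoted `src` attribute intact.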


This library was written to avoid having to convert a document into XML prior to reading, whilst preserving the distinct HTML qualities. This gets round some deployment issues I had with different platforms.

Using the code

The simplest way to use the code is to add it into your solution as a C# class library. There are no third-party dependencies so it is just a matter of adding the source files in. Alternatively, you can build the DLL and add it as a reference.

Points of Interest

The XHTML production is fairly basic: there is no built-in DTD checking. So far I have had no problems with the generated output, but I am keen to get DTD checking sorted.
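
As an illustration of what basic XHTML production without DTD checking can look like, here is a small sketch. It is an assumption about the general technique, not the library's implementation: the functions `encode_entities` and `to_xhtml` and the tuple-based node shape are hypothetical, and Python is used only for brevity. Element and attribute names are lower-cased, attribute values are quoted, void elements are self-closed, and text is encoded with the HTML 4 named entities from Python's standard `html.entities.codepoint2name` table.

```python
from html.entities import codepoint2name

# Elements that XHTML writes in the self-closing form.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


def encode_entities(text):
    """Replace characters that have an HTML 4 named entity.

    Other non-ASCII characters fall back to numeric references.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        elif cp > 126:
            out.append("&#%d;" % cp)
        else:
            out.append(ch)
    return "".join(out)


def to_xhtml(node):
    """Serialize a node given as (name, attrs, children); strings are text nodes.

    No DTD checking is done: nesting and attribute names are emitted as found.
    """
    if isinstance(node, str):
        return encode_entities(node)
    name, attrs, children = node
    name = name.lower()
    attr_text = "".join(
        ' %s="%s"' % (key.lower(), encode_entities(value))
        for key, value in attrs.items()
    )
    if name in VOID:
        return "<%s%s />" % (name, attr_text)
    inner = "".join(to_xhtml(child) for child in children)
    return "<%s%s>%s</%s>" % (name, attr_text, inner, name)
```

For example, `to_xhtml(("P", {"CLASS": "note"}, ["Caf\u00e9 & bar", ("BR", {}, [])]))` produces `<p class="note">Caf&eacute; &amp; bar<br /></p>`. Because nothing is validated against a DTD, an element placed where XHTML does not allow it would be written out unchanged.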


History

  • 1.4
  • 1.3
    • Bugfix: <!DOCTYPE...> and <!...> now treated as comments
    • Bugfix: Malformed or incomplete attribute values causing infinite loop fixed
  • 1.2
    • Bugfix: <tag/> now handled properly
    • Bugfix: Parse errors of scripts
    • Bugfix: Parse errors of styles
    • HTML 4 entity encoding
    • DOM tree navigation
    • Basic node searching
    • HTML production
    • XHTML production (as per
    • Added some component model stuff & comments
    • Hid the parser
  • 1.1
    • Initial release
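
Two of the fixes listed above (treating `<!DOCTYPE...>` and `<!...>` as comments, and not looping forever on malformed attribute values) can be sketched as follows. The changelog does not say how the library implements them, so this is only one plausible approach, written in Python for illustration with hypothetical function names. The key property is that each function always returns a position at or beyond where it started and stops at the end of the input, so a caller that scans forward cannot get stuck.

```python
def read_markup_declaration(html, pos):
    """html[pos:] starts with '<!'. Return (comment_text, next_pos).

    Both <!-- ... --> and <!DOCTYPE ...> style declarations are returned
    as comment text. Unterminated input consumes the rest of the string.
    """
    if html.startswith("<!--", pos):
        end = html.find("-->", pos + 4)
        if end == -1:
            return html[pos + 4:], len(html)
        return html[pos + 4:end], end + 3
    end = html.find(">", pos + 2)
    if end == -1:
        return html[pos + 2:], len(html)
    return html[pos + 2:end], end + 1


def read_attribute_value(html, pos):
    """pos is just after the '=' of an attribute. Return (value, next_pos)."""
    n = len(html)
    while pos < n and html[pos].isspace():
        pos += 1
    if pos < n and html[pos] in "\"'":
        quote = html[pos]
        end = html.find(quote, pos + 1)
        if end == -1:
            # Missing closing quote: stop at the end of the tag,
            # or at the end of the input if the tag is never closed.
            end = html.find(">", pos + 1)
            if end == -1:
                end = n
            return html[pos + 1:end], end
        return html[pos + 1:end], end + 1
    # Unquoted value: read up to whitespace or the end of the tag.
    start = pos
    while pos < n and not html[pos].isspace() and html[pos] != ">":
        pos += 1
    return html[start:pos], pos
```

With these helpers, `read_markup_declaration("<!DOCTYPE html><p>", 0)` returns the declaration text and the position of the following `<p>`, and `read_attribute_value('"abc>rest', 0)` recovers the value `abc` even though the closing quote is missing.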


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Member 987427
United Kingdom
No Biography provided



Last Updated 31 Mar 2004
Article Copyright 2004 by Member 987427