MIL HTML Parser

By Member 987427, 30 Mar 2004
 

Introduction

This library produces a document tree (DOM) from a given HTML document, allowing the developer to navigate and change the document in a methodical way. Besides producing plain HTML, the library can also produce XHTML documents, as it includes an HTML 4 entity encoder. This release includes a demonstration application in VB.NET showing how to use the library. I hope that it is all fairly self-explanatory.
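To make the idea concrete, here is a minimal sketch in Python of the same pipeline: parse HTML into a tree, then write it back out as XHTML with HTML 4 named entities. It is not the library's API (the library is C#, and its classes are not shown in this article). The names `Node`, `TreeBuilder`, `encode_entities`, `to_xhtml` and `parse` are hypothetical and exist only for illustration. Only the Python standard library is used.

```python
from html.entities import codepoint2name  # HTML 4 named entities, keyed by code point
from html.parser import HTMLParser

# Elements that never have content; written as <tag /> in XHTML (illustrative subset).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Hypothetical tree node: an element, or a text node when tag is None."""

    def __init__(self, tag=None, attrs=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


def encode_entities(s):
    """Replace every character that has an HTML 4 named entity with &name;."""
    out = []
    for ch in s:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # <tag/> adds a childless node and does not open a new level.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Node(text=data))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def to_xhtml(node):
    """Serialise a tree: quoted attributes, closed tags, encoded entities."""
    if node.tag is None:
        return encode_entities(node.text)
    inner = "".join(to_xhtml(child) for child in node.children)
    if node.tag == "#document":
        return inner
    attrs = "".join(
        f' {k}="{encode_entities(k if v is None else v)}"'
        for k, v in node.attrs.items()
    )
    if node.tag in VOID_TAGS:
        return f"<{node.tag}{attrs} />"
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


doc = parse("<p class=x>caf\u00e9 &amp; <br>tea</p>")
print(to_xhtml(doc))  # <p class="x">caf&eacute; &amp; <br />tea</p>
```

The tree is what makes the XHTML step simple: once the unquoted attribute and the unclosed `<br>` have been absorbed into nodes, the writer only has to emit each node in a consistent form.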

Background

This library was written to avoid having to convert a document into XML prior to reading, whilst preserving the distinct HTML qualities. This gets round some deployment issues I had with different platforms.

Using the code

The simplest way to use the code is to add it to your solution as a C# class library. There are no third-party dependencies, so it is just a matter of adding the source files. Alternatively, you can build the DLL and add it as a reference.

Points of Interest

The XHTML production is fairly basic: there is no built-in DTD checking. So far, I have had no problems with the generated output, but I'm keen on getting DTD checking added.
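Until DTD checking exists, one stopgap is to run the generated XHTML through any XML parser and confirm that it is at least well-formed. The sketch below does this with Python's standard library. The helper name `is_well_formed` is hypothetical and is not part of this library. Well-formedness is a much weaker property than validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml: str) -> bool:
    """Return True if the string parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(xhtml)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>a<br />b</p>"))  # closed <br />
print(is_well_formed("<p>a<br>b</p>"))    # unclosed <br>, as in plain HTML
```

A parser that does not load the DTD will generally treat named entities such as `&nbsp;` as undefined, so a check like this is easiest to apply to output that uses numeric character references.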

History

  • 1.4
  • 1.3
    • Bugfix: <!DOCTYPE...> and <!...> now treated as comments
    • Bugfix: Malformed or incomplete attribute values causing infinite loop fixed
  • 1.2
    • Bugfix: <tag/> now handled properly
    • Bugfix: Parse errors of scripts
    • Bugfix: Parse errors of styles
    • HTML 4 entity encoding
    • DOM tree navigation
    • Basic node searching
    • HTML production
    • XHTML production (as per http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd)
    • Added some component model stuff & comments
    • Hid the parser
  • 1.1
    • Initial release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Member 987427
United Kingdom


Last Updated 31 Mar 2004
Article Copyright 2004 by Member 987427