
MIL HTML Parser

By Member 987427

A non-well-formed HTML parser for .NET
C#, VB, Windows, .NET 1.0, .NET, Visual Studio, VS.NET2002, Dev

Posted: 21 Mar 2004
Updated: 30 Mar 2004

Introduction

This library produces a DOM tree of a given HTML document, allowing the developer to navigate and change the document in a methodical way. In addition to basic HTML production, the library can also be used to produce XHTML documents, as it includes an HTML 4 entity encoder. Included in this release is a demonstration application in VB.NET showing how to use the library. I hope that it is all fairly self-explanatory.
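To give a feel for what entity encoding involves when emitting XHTML, here is a minimal sketch. It is not the library's own encoder: the class name `EntityEncoderSketch`, the method `Encode`, and the small entity table are all hypothetical and chosen purely for illustration. It assumes that characters with an entry in the table are written as named entities and that any other character above ASCII 126 falls back to a numeric character reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of the MIL HTML Parser.
public static class EntityEncoderSketch
{
    // A small subset of the HTML 4 named entities (the full list is much longer).
    private static readonly Dictionary<char, string> Named = new Dictionary<char, string>
    {
        { '&', "amp" }, { '<', "lt" }, { '>', "gt" }, { '"', "quot" },
        { '\u00A0', "nbsp" }, { '\u00A9', "copy" }, { '\u00E9', "eacute" }
    };

    public static string Encode(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            string name;
            if (Named.TryGetValue(c, out name))
            {
                // Known character: emit the named entity, e.g. &amp;
                sb.Append('&').Append(name).Append(';');
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                // Surrogate pair: emit one numeric reference for the full code point.
                sb.Append("&#").Append(char.ConvertToUtf32(c, text[i + 1])).Append(';');
                i++;
            }
            else if (c > 126)
            {
                // Any other non-ASCII character: numeric character reference.
                sb.Append("&#").Append((int)c).Append(';');
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `EntityEncoderSketch.Encode("Café & <b>")` returns `Caf&eacute; &amp; &lt;b&gt;`. A real encoder would cover the complete HTML 4 entity set and would treat attribute values and text nodes separately.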

Background

This library was written to avoid having to convert a document into XML prior to reading, whilst preserving the distinct HTML qualities. This gets round some deployment issues I had with different platforms.

Using the code

The simplest way to use the code is to add it into your solution as a C# class library. There are no third-party dependencies so it is just a matter of adding the source files in. Alternatively, you can build the DLL and add it as a reference.

Points of Interest

The XHTML production is fairly basic: there is no built-in DTD checking. So far, I have had no problems with the generated output, but I'm keen to get DTD checking sorted.

History

  • 1.4
  • 1.3
    • Bugfix: <!DOCTYPE...> and <!...> now treated as comments
    • Bugfix: Malformed or incomplete attribute values causing infinite loop fixed
  • 1.2
    • Bugfix: <tag/> now handled properly
    • Bugfix: Parse errors of scripts
    • Bugfix: Parse errors of styles
    • HTML 4 entity encoding
    • DOM tree navigation
    • Basic node searching
    • HTML production
    • XHTML production (as per http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd)
    • Added some component model stuff & comments
    • Hid the parser
  • 1.1
    • Initial release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Member 987427



Location: United Kingdom


Editor: Nishant Sivakumar
Copyright 2004 by Member 987427