Hello,
 
I am new to web development. I am working on a project where I have to use an HTML parser.
 
I have been using TagSoup. I have been advised to use parsers that behave very much like a browser.
 
I know that browsers treat HTML pages differently, but what does "parser similar to browser" mean?
 
How can I check if a parser is similar to the browser?
 
Thanks
Posted 6-Apr-11 2:31am by hervebags
Edited 6-Apr-11 2:43am by Henry Minute (v2)

1 solution


Solution 1

Basically, different browsers treat the same markup in slightly (or, in some cases, radically) different ways. What you are being advised to do is make sure that your parser is capable of creating a Document Object Model (DOM) that is similar to the DOM a browser builds from the same markup. If I were you, I'd check that it produces a DOM compatible with a standards-compliant browser; in other words, don't test it against IE6 (I nearly said i.e. don't test... but that would have been too ironic).
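To make "the same markup, different results" concrete for parsers, here is a rough illustration in Java. It is not TagSoup and not a browser-like parser: it feeds a few snippets to the JDK's strict XML parser, which rejects unclosed tags that every browser silently repairs. That gap is the kind of divergence the advice is meant to rule out. The class name `StrictParserDemo` and the sample snippets are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows how a strict (non-browser-like) parser reacts to tag soup.
public class StrictParserDemo {

    // Returns true if the JDK's XML parser accepts the markup, false if it rejects it as malformed.
    static boolean acceptedByXmlParser(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors, so nothing is printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Snippets that a browser renders without complaint.
        String[] samples = { "<p>one</p>", "<p>one<p>two", "<div>line<br>break</div>" };
        for (String s : samples) {
            System.out.println(s + " -> " + (acceptedByXmlParser(s) ? "accepted" : "rejected"));
        }
    }
}
```

The first sample is accepted and the other two are rejected, whereas a browser builds a DOM for all three. A browser-like parser has to recover from such markup, and recover the same way a browser does.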
 
What you could do is serialize the DOM your parser produces and then compare it with the DOM that the browser of your choice builds for the same page. To get the browser's side, you can dump the document out using this script from the browser.
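A minimal sketch of that comparison in Java, assuming your parser can hand you an `org.w3c.dom.Document`. The class `DomDump`, its methods and the one-line-per-node text format (loosely modeled on the html5lib tree-construction tests) are hypothetical. For a plain line-by-line diff to mean anything, the browser-side script would have to print the same format. To keep the example self-contained, the JDK's XML parser stands in for the parser under test, and the "browser" dump is typed in by hand. The sketch also trims text and drops whitespace-only text nodes, which a browser keeps, so treat it as a first-pass check only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomDump {

    // Serializes a DOM into one line per node, indented two spaces per tree level.
    static String dump(Document doc) {
        StringBuilder out = new StringBuilder();
        for (Node child = doc.getFirstChild(); child != null; child = child.getNextSibling()) {
            dumpNode(child, 0, out);
        }
        return out.toString();
    }

    private static void dumpNode(Node node, int depth, StringBuilder out) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            out.append(indent).append('<').append(el.getTagName().toLowerCase()).append(">\n");
            // Attributes go one level deeper, sorted so that attribute order cannot cause a mismatch.
            NamedNodeMap attrs = el.getAttributes();
            List<String> attrLines = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                attrLines.add(a.getNodeName() + "=\"" + a.getNodeValue() + "\"");
            }
            Collections.sort(attrLines);
            for (String line : attrLines) {
                out.append(indent).append("  ").append(line).append('\n');
            }
            for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
                dumpNode(child, depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Simplification: trims text and skips whitespace-only nodes (a browser keeps them).
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(indent).append('"').append(text).append("\"\n");
            }
        }
        // Comments, processing instructions and other node types are ignored in this sketch.
    }

    // Returns the 1-based line number where two dumps first differ, or -1 if they are identical.
    static int firstDifference(String a, String b) {
        String[] x = a.split("\n", -1);
        String[] y = b.split("\n", -1);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (!x[i].equals(y[i])) {
                return i + 1;
            }
        }
        return x.length == y.length ? -1 : n + 1;
    }

    // Stand-in for the parser under test: the JDK XML parser, which only accepts well-formed input.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head></head><body><table><tr><td>1</td></tr></table></body></html>";
        String parserDump = dump(parse(page));
        // Hand-written dump of what a standards-compliant browser builds for the same page:
        // the HTML parsing algorithm inserts an implied <tbody> around the <tr>.
        String browserDump = String.join("\n",
                "<html>", "  <head>", "  <body>", "    <table>",
                "      <tbody>", "        <tr>", "          <td>", "            \"1\"") + "\n";
        System.out.print(parserDump);
        System.out.println("First difference at line " + firstDifference(parserDump, browserDump));
    }
}
```

Running `main` reports the first difference at line 5: the browser's tree has an implied `<tbody>` between `<table>` and `<tr>`, which a browser inserts and an XML parser does not. Swapping in your own parser's DOM and the browser's real dump turns this into a quick regression check.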
Comments
hervebags at 7-Apr-11 7:37am
   
Thanks. This makes more sense now.
 
By the way, do you by any chance know where I can find a collection of very badly structured HTML pages? I would like to test my wrappers on them.
 
Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 6 Apr 2011
Copyright © CodeProject, 1999-2014. All Rights Reserved.