Hello,

I am new to web development. I am working on a project where I have to use an HTML parser.

I have been using TagSoup, and I have been advised to use parsers that behave very similarly to a browser.

I know that browsers treat HTML pages differently, but what does "a parser similar to the browser" mean?

How can I check whether a parser is similar to a browser?

Thanks
Updated 6-Apr-11 1:43am

1 solution

Basically, different browsers treat the same markup in slightly (or, in some cases, radically) different ways. What you are being advised here is to make sure that your parser is capable of creating a Document Object Model (DOM) that is similar to the DOM a browser would build. If I were you, I'd check that it produces a DOM compatible with the one a standards-compliant browser produces; in other words, don't test it against IE6 (I nearly said i.e. don't test... but that would have been too ironic).

What you could do is dump out the DOM your parser produces, and then compare it with the DOM that the browser of your choice builds when it renders the same page. To get the browser's version, you can dump out the document from within the browser using this[^] script.
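One way to make that comparison concrete is to reduce both DOMs to the same plain-text outline and diff the two outlines. The Python sketch below assumes your parser's output is available as an `xml.etree.ElementTree` tree (html5lib's `parse()` returns one by default; a Java parser such as TagSoup would need its own equivalent dumper) and that the browser dump has already been saved in the same indented format. The names `dump_outline` and `compare_dumps`, the outline format, and the sample trees are hypothetical, chosen only for illustration.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, e.g. '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1].lower()


def dump_outline(elem: ET.Element, depth: int = 0) -> List[str]:
    """Flatten an element tree into one line per element and per non-blank text run."""
    if not isinstance(elem.tag, str):
        # Comments and processing instructions use non-string tags; skip them.
        return []
    indent = "  " * depth
    lines = [f"{indent}<{local_name(elem.tag)}>"]
    if elem.text and elem.text.strip():
        lines.append(f'{indent}  "{elem.text.strip()}"')
    for child in elem:
        lines.extend(dump_outline(child, depth + 1))
        # ElementTree keeps the text that follows a child in child.tail.
        if child.tail and child.tail.strip():
            lines.append(f'{indent}  "{child.tail.strip()}"')
    return lines


def compare_dumps(browser_lines: List[str], parser_lines: List[str]) -> List[str]:
    """Return a unified diff; an empty list means the two outlines match."""
    return list(difflib.unified_diff(
        browser_lines, parser_lines,
        fromfile="browser", tofile="parser", lineterm=""))


if __name__ == "__main__":
    # Hypothetical parser output: this tree has no <head> element.
    parser_tree = ET.fromstring("<html><body><p>Hello <b>world</b></p></body></html>")
    # Hypothetical browser dump for the same page, saved in the same outline format.
    browser_dump = [
        "<html>",
        "  <head>",
        "  <body>",
        "    <p>",
        '      "Hello"',
        "      <b>",
        '        "world"',
    ]
    for line in compare_dumps(browser_dump, dump_outline(parser_tree)):
        print(line)
```

Run as a script, this prints a diff whose only removed line is `-  <head>`, meaning the browser built a `<head>` element that the parser did not. Whitespace-only text is ignored here on the assumption that you care about structure rather than whitespace handling; drop the `strip()` checks if whitespace matters for your wrappers.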
 
 
Comments
The_Real_Chubaka 7-Apr-11 7:37am    
Thanks. This makes more sense now.

By the way, do you happen to know where I can get a list of very bad (badly structured) HTML pages? I would like to test my wrappers on them.

Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


