
I am new to web development. I am working on a project where I have to use an HTML parser.

I have been using TagSoup. I have been advised to use parsers that behave very similarly to a browser.

I know that browsers treat HTML pages differently, but what does "parser similar to browser" mean?

How can I check if a parser is similar to the browser?

Posted 6-Apr-11 1:31am
Updated 6-Apr-11 1:43am
Henry Minute

1 solution


Solution 1

Basically, different browsers treat the same markup in slightly (or, in some cases, radically) different ways. What you are being advised to do is ensure that your parser is capable of creating a Document Object Model (DOM) that is similar to the DOM a browser builds from the same page. If I were you, I'd check that it produces the DOM in a way that is compatible with a standards-compliant browser; in other words, don't test it against IE6 (I nearly said "i.e. don't test..." but that would have been too ironic).

What you could do is write out the DOM your parser produces, and then compare it to the DOM dumped out when the same page is loaded in the browser of your choice. To do this, you can dump out the document from the browser using this[^] script.
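
To make the comparison concrete, here is a minimal sketch in Python of the "dump both sides and diff" idea. It is only an illustration under several assumptions: Python's standard `html.parser` stands in for the parser under test (it is not TagSoup, and it does no browser-style error recovery), the names `OutlineBuilder`, `outline` and `dom_diff` are made up for this example, and the browser-side outline is typed in by hand where a real run would use the output of the dump script.

```python
import difflib
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not increase the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class OutlineBuilder(HTMLParser):
    """Records one indented line per start tag and per non-blank text run."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace so diffs stay readable
        if text:
            self.lines.append("  " * self.depth + '"' + text + '"')


def outline(html):
    """Return the indented outline produced by the stand-in parser."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.lines


def dom_diff(parser_outline, browser_outline):
    """Return unified-diff lines; an empty list means the outlines match."""
    return list(difflib.unified_diff(parser_outline, browser_outline,
                                     fromfile="parser", tofile="browser",
                                     lineterm=""))


if __name__ == "__main__":
    markup = "<p>one<p>two"  # second <p> has no closing tag before it
    # Hand-written stand-in for a browser dump of the body's children:
    # browsers close the first <p> implicitly, so the paragraphs are siblings.
    browser_outline = ["p", '  "one"', "p", '  "two"']
    for line in dom_diff(outline(markup), browser_outline):
        print(line)
```

Any lines printed by the diff mark places where the parser's tree differs from the browser's. In this example the stand-in parser nests the second paragraph inside the first, which is exactly the sort of mismatch the comparison is meant to expose.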
hervebags 7-Apr-11 7:37am
Thanks. This makes more sense now.

By the way, do you by any chance know where I can get a list of very bad (badly structured) HTML pages? I would like to test my wrappers on them.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 6 Apr 2011
Copyright © CodeProject, 1999-2017
All Rights Reserved.