I am new to web development. I am working on a project where I have to use an HTML parser.

I have been using TagSoup. I have been advised to use parsers that behave very much like a browser.

I know that browsers treat HTML pages differently, but what does "parser similar to browser" mean?

How can I check whether a parser is similar to a browser?

Posted 6-Apr-11 1:31am
Edited 6-Apr-11 1:43am
Henry Minute

1 solution

Solution 1

Basically, different browsers treat the same markup in slightly (or, in some cases, radically) different ways. What you are being advised to do here is to ensure that your parser is capable of creating a Document Object Model (DOM) that is similar to the DOM a browser builds from the same markup. If I were you, I'd check that the parser produces a DOM compatible with that of a standards-compliant browser; in other words, don't test it against IE6 (I nearly said i.e. don't test... but that would have been too ironic).

What you could do is serialize the DOM your parser produces, and then compare it to the DOM dumped out when the same page is rendered in the browser of your choice. To do this, you can dump out the document from the browser using this script.
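
The comparison could be sketched roughly like this. It is only a minimal illustration in Python, not the script linked above: it assumes you already have two serialized dumps as strings (one written out by your parser, one dumped from the browser), and that both are properly nested because each tree has already been repaired. The names `OutlineBuilder`, `outline` and `dom_diff` are made up for this example. It reduces each dump to an indented outline of element names and diffs the two outlines.

```python
import difflib
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not increase the depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class OutlineBuilder(HTMLParser):
    """Collects one indented line per element of a serialized DOM dump."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closed element such as <br/>: record it, keep the depth.
        self.lines.append("  " * self.depth + tag)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.depth > 0:
            self.depth -= 1


def outline(serialized_dom):
    """Return the element outline of an already well-nested dump."""
    builder = OutlineBuilder()
    builder.feed(serialized_dom)
    builder.close()
    return builder.lines


def dom_diff(parser_dump, browser_dump):
    """Unified diff of the two outlines; an empty list means same structure."""
    return list(difflib.unified_diff(
        outline(parser_dump), outline(browser_dump),
        "parser", "browser", lineterm=""))


if __name__ == "__main__":
    # Hypothetical dumps of the same bad input "<p>a<p>b".
    from_parser = "<html><body><p>a<p>b</p></p></body></html>"
    from_browser = "<html><head></head><body><p>a</p><p>b</p></body></html>"
    print("\n".join(dom_diff(from_parser, from_browser)))
```

Lines starting with `-` appear only in the parser's tree and lines starting with `+` only in the browser's, so an empty diff means the two trees have the same element structure. Attributes and text are ignored here; you would need to extend the outline if those matter for your wrappers.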
hervebags at 7-Apr-11 7:37am
Thanks. This makes more sense now.

By the way, do you by any chance know where I can get a list of very bad (badly structured) HTML pages? I would like to test my wrappers on them.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 6 Apr 2011