The Lounge is rated PG. If you're about to post something you wouldn't want your
kid sister to read then don't post it. No flame wars, no abusive conduct, no programming
questions and please don't post ads.
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
I hope you're taking the phone interview with #3 to explain to them that you aren't interested in the position, and that you tried to explain that to the agent half a dozen times before he/she set up the interview, but it just didn't seem to be sinking in.
<div/> is an invalid self-closing tag and is viewed as a new opening tag
<img></img>: the </img> is seen as a stray ending tag,
And all the while, it sort of looks like XML.
1. I can see now why there was a push for XHTML
2. Learning the details of this makes me loathe HTML even more
3. The W3C people are....wait for it...IDIOTS
Well - at least for the example you've submitted I don't see your squawk:
<div/> is most likely an error, and at best, pointless (<br> works better and is clearer and has no closing '/'). How would XHTML handle this better? What "better" means is a matter of opinion.
<img ...> has no end tag - but this makes sense: an image element is not supposed to have any content between its tags (were they allowed). One sure way of preventing this is to flag the closing tag as invalid.
Now - eliminating the internal closing slash (i.e., no <img ... />) - I can see a good argument over that, as the closing slash is a good flag that this is, indeed, the end of a self-closing element.
Remember . . . above all . . . it's for the internet. Is it really worth being any more rigorous when you consider what will be done with it?
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"As far as we know, our computer has never had an undetected error." - Weisert
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
The tag <script src="myscript.js" /> also looks pretty valid to me. But it isn't.
You have to write the tag with a separate closing tag, <script src="myscript.js"></script>, even if the content is empty...
Consistency is obviously an unknown word in HTML...
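The rule behind this can be sketched in a few lines of Python. This is only an illustration, not anyone's actual generator: `render_element` is a hypothetical helper, `logo.png` is a made-up file name, and the set of void elements is the one listed in the HTML Living Standard. Void elements get no end tag; everything else, including an empty <script>, gets an explicit one.

```python
from html import escape

# Void elements per the HTML Living Standard: no content, no end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def render_element(tag, attrs=None, content=""):
    """Hypothetical helper: serialize one element as HTML text.

    `content` is inserted verbatim, so it is assumed to be valid HTML already.
    """
    name = tag.lower()
    # Attribute values are escaped so quotes cannot break out of the attribute.
    attr_text = "".join(
        f' {key}="{escape(str(value), quote=True)}"'
        for key, value in (attrs or {}).items()
    )
    if name in VOID_ELEMENTS:
        if content:
            raise ValueError(f"<{name}> is a void element and cannot have content")
        # No trailing slash and no end tag for void elements.
        return f"<{name}{attr_text}>"
    # Every other element needs an explicit end tag, even when empty.
    return f"<{name}{attr_text}>{content}</{name}>"


print(render_element("script", {"src": "myscript.js"}))
print(render_element("img", {"src": "logo.png"}))
print(render_element("div"))
```

Emitting <img ...> without a slash and <script ...></script> with a full end tag sidesteps both complaints above, since the generator never relies on self-closing syntax.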
The ones you mention are just the tip of the iceberg. Unfortunately HTML5 is great for this - since it unifies the browsers' error handling. Before HTML5, every browser handled errors differently.
It will be hard to fix them all, as most HTML rules depend on the context. If you think parsing HTML is easy - it is not. There are crazy rules - especially for tables. And don't even get me started on foreign elements...
Additionally you just mentioned <img>, but the specification also explicitly provides information on <image> - which is crazy. There is a huge number of other edge cases, but I think the foster parenting + formatting reconstruction are among the hardest.
Generating HTML is still a lot easier than parsing it.
I know that this is a really old thread, and it's possible that you may have already finished your DSL, but you should know that you've been reinventing the wheel here.