It turned out that by using a System.Windows.Forms.WebBrowser and retrieving its Document, I was actually getting a sanitized version of the HTML (and only the body). This is a problem because when common sense breaks out, there will be no certificate error and no reason to use the System.Windows.Forms.WebBrowser, and I expect that I will then receive the entire nasty pile of HTML in its raw form (fingers crossed).
So, this week I looked into accessing the raw HTML from the System.Windows.Forms.WebBrowser ... I accessed its privates, and grabbed it by the primary Interop assembly. And, by gum, it worked.
"What did you find?" I hear you ask. It's more what I didn't find. The page contains most of a TABLE (as expected), but a few start tags are missing -- unimportant ones, like THEAD, TR, and TH.
Can you then fault me for summoning Cthulhu? What self-respecting HTML parser will deal with such a mess? (Other than IE, of course.)
(Deep breath.) I spent today wrestling with HtmlAgilityPack, which handled the errors (TagNotOpened) pretty well, and I managed to use those errors to insert the missing start tags where they logically belong. Nifty. A perfect effort for the last day before a week off.
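For anyone curious about the bookkeeping behind "tag not opened" and "tag not closed" errors, here is a minimal sketch. It is not HtmlAgilityPack (a .NET library). It uses Python's standard `html.parser` as a stand-in, and `TagBalanceChecker` and `check` are hypothetical names made up for illustration. It keeps a stack of open elements, flags any end tag with no matching open element, and flags any element that is still open when an outer end tag or the end of input arrives. It only reports where tags are missing and does not insert them. Be aware that HTML legitimately lets you omit some end tags (`</td>`, `</li>`, `</p>` and friends), so a naive checker like this one will flag those as unclosed.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper (not part of HtmlAgilityPack): records end tags
    with no open start tag and start tags that are never explicitly closed."""

    def __init__(self):
        super().__init__()
        self.stack = []       # (tag, (line, col)) of currently open elements
        self.not_opened = []  # (tag, (line, col)) of orphan end tags
        self.not_closed = []  # (tag, (line, col)) of start tags never closed

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so comparisons are case-insensitive.
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> open and close at once: nothing to track

    def handle_endtag(self, tag):
        # Search from the innermost open element outwards for a match.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                # Everything opened inside the match was never closed explicitly.
                self.not_closed.extend(self.stack[i + 1:])
                del self.stack[i:]
                return
        # No matching open element anywhere: the start tag is missing.
        self.not_opened.append((tag, self.getpos()))

    def close(self):
        super().close()
        # Whatever is still open at end of input was never closed.
        self.not_closed.extend(self.stack)
        self.stack.clear()


def check(html):
    """Return (not_opened_tags, not_closed_tags) for an HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return ([t for t, _ in checker.not_opened],
            [t for t, _ in checker.not_closed])


if __name__ == "__main__":
    # A table missing its THEAD, TR and TH start tags, as in the post.
    broken = ("<table>Name</th>Age</th></tr></thead>"
              "<tbody><tr><td>Bob</td></tr></tbody></table>")
    print(check(broken))  # (['th', 'th', 'tr', 'thead'], [])
```

The positions recorded alongside each tag (from `getpos()`) are what you would use to decide where a missing start tag logically belongs, which is the part that still needs domain knowledge of the table layout.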
So, provided I can deploy HtmlAgilityPack to the server, I may be able to cancel my summons. In the meantime, I have the RegEx version on the server.
Now, at the risk of asking a Programming Question... does anyone know how to get HtmlAgilityPack to report TagNotClosed errors as well? It has an error type for it, but I haven't gotten it to report any.