 Parsing HTML PIEBALDconsult 18 Apr '13 - 5:15
 OK, I'd be first to post a link to Parsing Html The Cthulhu Way if anyone suggests Regular Expressions, but I have a problem using an XmlDocument (and therefore XPath) with an HTML file I'm downloading.

 The page is a list of files to download -- I need to extract the hrefs from the <a> elements, and obviously I'd prefer to use XPath to do that.

 0) The file is missing an opening tag (the matching closing tag is present) -- I can tack one on, that's not a big deal.
 1) It contains at least one &nbsp; entity (and possibly other entities) and the XmlDocument doesn't like that.

 So I need options, people!

 I can summon Cthulhu.
 I can use Regular Expressions to replace any offending entities and then feed the result to an XmlDocument.

 What other options might there be?
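 A minimal sketch of the entity problem described above, assuming the offending entity is something like &nbsp; and that the markup is loaded with XmlDocument.LoadXml. The class name and the sample strings are made up for illustration; only XmlDocument, LoadXml and XmlException come from System.Xml.

```csharp
using System;
using System.Xml;

class EntityDemo
{
    static void Main()
    {
        var doc = new XmlDocument();

        try
        {
            // &nbsp; is defined by HTML, not by XML. With no DTD declaring it,
            // the XML parser treats it as an undeclared entity and throws.
            doc.LoadXml("<p>one&nbsp;two</p>");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlDocument rejected the input: " + ex.Message);
        }

        // The five entities predefined by XML (&amp; &lt; &gt; &quot; &apos;)
        // load without complaint.
        doc.LoadXml("<p>one &amp; two</p>");
        Console.WriteLine(doc.DocumentElement.InnerText);
    }
}
```

 This only illustrates why the strict XML parser refuses typical HTML; it is not a suggested fix.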
 Re: Parsing HTML Richard Deeming 18 Apr '13 - 5:56
 HTML != XML

 Use the HTML Agility Pack instead.

 "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
 Re: Parsing HTML PIEBALDconsult 18 Apr '13 - 16:18
 Ah, sooooo... let the summoning begin!

 Oh, mighty Cthulhu! Wise and terrible! I ask your assistance as my days have been blighted with some gnarly HTML! Please, oh lord, come smite the bare buttocks of the wretch who hast wrought this travesty. I will repay you with a pint of bitter. Not a measly USian pint mind you, but a proper British pint.
 Re: Parsing HTML Richard Deeming 19 Apr '13 - 1:46
 No need to make that call to R'lyeh yet; the HAP makes parsing an HTML document simple:

    HtmlDocument doc = new HtmlDocument();
    doc.Load(@"path\to\your\file.htm");

    // Select every <a> element that has an href attribute.
    // Note: SelectNodes returns null when nothing matches.
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        string url = link.Attributes["href"].Value;
        Fhtagn(url);
    }
 XML data representation CsTreval 5 Apr '13 - 0:39
 If I take data (1,2,3,a,b,c,..) and I put it in XML form (<a>, <b>, <c>), does that mean the data becomes information because of XML? Does XML turn data into information by representing it in tags?

 I am also reading this in a slideshow: "In JDOM, every XML tree is approached as a document even though the content has nothing to do with documents". I looked up the definition of 'document' on dictionary.com, and it states that a document is meant to be informative. 'Informative' means 'conveying information'. So, if the purpose of XML is to turn data into information, why does the content of an XML tree supposedly have nothing to do with a document, and therefore nothing to do with information? This is confusing. Perhaps the author of that slideshow was using different semantics than the ones I have in mind right now.

 Any thoughts on this?
 Re: XML data representation Kenneth Haugland 5 Apr '13 - 2:13
 I actually view XML documents as a data table, and in fact you can save a DataTable in XML format with its WriteXml method.
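 As a rough sketch of that idea, assuming the .NET System.Data.DataTable class is what is meant: the table name, column names and rows below are invented for illustration. Note that WriteXml needs the table to have a name, otherwise serialization fails.

```csharp
using System;
using System.Data;
using System.IO;

class DataTableXmlDemo
{
    static void Main()
    {
        // Hypothetical table; WriteXml requires TableName to be set.
        var table = new DataTable("Item");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "a");
        table.Rows.Add(2, "b");

        // Serialize the rows to XML in memory and print the result.
        using (var writer = new StringWriter())
        {
            table.WriteXml(writer);
            Console.WriteLine(writer.ToString());
        }
    }
}
```

 Each row comes out as an element named after the table, with one child element per column, which is one way to see tags giving raw values a labelled structure.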
