Well, strictly speaking, an HTML file is not XML at all. It may look very similar to XML because both use tags, but it isn't. See:
XML - Wikipedia
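To make the difference concrete, here is a minimal sketch that feeds the same content to a strict XML parser twice, once written as ordinary HTML and once as well-formed XML. It uses Python's standard library only for illustration (it is not .NET code); the helper name `is_well_formed_xml` and both sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if a strict XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Ordinary HTML: <br> has no closing tag and the attribute value is unquoted.
html_snippet = "<p class=intro>first line<br>second line</p>"

# The same content rewritten so that it is well-formed XML.
xml_snippet = '<p class="intro">first line<br/>second line</p>'

print(is_well_formed_xml(html_snippet))
print(is_well_formed_xml(xml_snippet))
```

A browser renders both snippets the same way, but an XML parser rejects the first one, which is why HTML usually has to be converted before XML tools can read it.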
If you want to convert an HTML file into an XML file, I'd suggest using SgmlReader. SgmlReader is a .NET library that is handy for converting SGML content (like HTML and OFX) into well-formed XML via XmlReader, XmlDocument, XDocument or XPathDocument. It runs on Windows, and on Linux using Mono.
If you only want to get the data between the tags, you have to create an "HTML parser". For suggestions, please see:
Google
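As a rough idea of what such a parser does, the sketch below collects only the text between tags and skips the contents of script and style elements. It is written in Python with the standard `html.parser` module purely as an illustration, not as the .NET solution; the class name `TextExtractor` and the function `extract_text` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found between tags, ignoring script and style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []        # non-empty pieces of text, in document order
        self._skip_depth = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list:
    """Return the list of text pieces found between the tags of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = "<html><body><h1>Title</h1><p>Hello <b>world</b><br>bye</p></body></html>"
print(extract_text(sample))
```

The same event-driven idea (react to start tags, end tags, and text) carries over to HTML parsers in other languages, so it should translate to a .NET parser of your choice.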