First, you need to parse the HTML to recognize the relationships between the <table>, <tr> and <td> elements and to extract the HTML content of each <td>. If you can assume that your HTML is well-formed XML, this is easy: you can use one of the available XML parsers.
Here is a short review of XML parsing capabilities you can use:
- Use the System.Xml.XmlDocument class. It implements the DOM interface; this is the easiest way, and it is good enough if the document is not too big. See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx.
- Use the class System.Xml.XmlTextReader; this is the fastest way of reading, especially if you need to skip some data. See http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx.
- Use the class System.Xml.Linq.XDocument; this is the most adequate way, similar to that of XmlDocument, and it supports LINQ to XML programming. See http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx, http://msdn.microsoft.com/en-us/library/bb387063.aspx.
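As an illustration of the XDocument option, here is a minimal sketch. It assumes the input is well-formed XML with lower-case element names, no XML namespace and no nested tables; the names TableReader and ReadCells are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class TableReader
{
    // Hypothetical helper: returns one list per <tr>, holding the inner
    // markup of each <td> in that row.
    public static List<List<string>> ReadCells(string wellFormedHtml)
    {
        // Throws System.Xml.XmlException if the input is not well-formed XML.
        XDocument document = XDocument.Parse(wellFormedHtml);

        return document
            .Descendants("tr")                  // every row, at any depth
            .Select(row => row
                .Elements("td")                 // only the direct <td> children
                .Select(cell => string.Concat(
                    cell.Nodes().Select(node =>
                        node.ToString(SaveOptions.DisableFormatting))))
                .ToList())
            .ToList();
    }
}
```

For the input <table><tr><td>A</td><td><b>B</b></td></tr></table> this should give a single row whose cells are A and <b>B</b>. If you only need the text without nested tags, cell.Value can be used instead of concatenating the child nodes.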
If you cannot assume well-formed XML on input, this is, of course, more difficult. You can use an HTML parser that can handle such input. Try, for example, this one: http://www.majestic12.co.uk/projects/html_parser.php.
To create an Excel document, you need to use Microsoft Office interop. Start here: http://msdn.microsoft.com/en-us/library/wss56bz7%28v=VS.100%29.aspx, http://msdn.microsoft.com/en-us/library/dd264733.aspx.
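A rough sketch of the interop part could look like the code below. It assumes C# 4 or later, a reference to the Microsoft.Office.Interop.Excel assembly and Excel installed on the machine; the names ExcelWriter and Write are made up for this example, and error handling and COM object cleanup are kept to a minimum.

```csharp
using System.Collections.Generic;
using Excel = Microsoft.Office.Interop.Excel;

static class ExcelWriter
{
    // Hypothetical helper: writes each inner sequence as one worksheet row.
    public static void Write(IEnumerable<IEnumerable<string>> rows, string path)
    {
        var application = new Excel.Application();
        try
        {
            Excel.Workbook workbook = application.Workbooks.Add();
            var sheet = (Excel.Worksheet)workbook.Worksheets[1];

            int rowIndex = 1;                   // Excel indices are 1-based
            foreach (IEnumerable<string> row in rows)
            {
                int columnIndex = 1;
                foreach (string cellText in row)
                {
                    sheet.Cells[rowIndex, columnIndex] = cellText;
                    columnIndex++;
                }
                rowIndex++;
            }

            workbook.SaveAs(path);
            workbook.Close();
        }
        finally
        {
            // Shut down the Excel process started above.
            application.Quit();
        }
    }
}
```

The cell values here are written as plain strings, so any tags left in the <td> content end up in the cells as text; strip them first if you want only the visible text.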
—SA