I have been trying to scrape some data from a website. The page distinguishes table header rows from content rows by class name (tableHeader versus content). Because I want all the table information, I collected all the headers into one array and all the contents into another. The problem comes when I write the arrays to a file: I can write a header, but the second array holds the content rows from every table, and I cannot tell where the first table's contents end. Because HtmlAgilityPack returns every node that matches the selector, I get all the contents at once. Here is the HTML, to make it clear:
<tr class=tableHeader>
<th width=16%>Caught</th>
<th width=16%><p><a href="/url">Normal Range</a></p></th>
</tr>
<TR class=content><TD><a href="/url">Bluegill</a></TD>
<TD>trap net</TD>
<TD align=CENTER>4.05</TD>
<TD align=CENTER> 7.9 - 37.7</TD>
<TD align=CENTER>0.26</TD>
<TD align=CENTER> 0.1 - 0.2</TD>
</TR>
<TR class=content><TD></TD>
<TD>Gill net</TD>
<TD align=CENTER>1.50</TD>
<TD align=CENTER>N/A</TD>
<TD align=CENTER>0.07</TD>
<TD align=CENTER>N/A</TD>
</TR>
<tr class=tableHeader>
<th>0-5</th>
<th>6-8</th>
<th>9-11</th>
<th>12-14</th>
<th>15-19</th>
<th>20-24</th>
<th>25-29</th>
<th>30+</th>
<th>Total</th>
</tr>
<TR class=content><TD>bluegill</TD>
<TD align=CENTER>19</TD>
<TD align=CENTER>65</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>0</TD>
<TD align=CENTER>84</TD>
</TR>
Below is my code, which saves the headers and contents into arrays and tries to write them out exactly as they appear on the website.
int count = 0;
foreach (var trTag4Pale in trTags4Pale)
{
    // Store the text of the current header row.
    string trText4Pale = trTag4Pale.InnerText;
    paleLake[count] = trText4Pale;
    if (trTags4Small != null)
    {
        // This collects the content rows of every table, not just the current one.
        int counter = 0;
        foreach (var trTag4Small in trTags4Small)
        {
            string trText4Small = trTag4Small.InnerText;
            smallText[counter] = trText4Small;
            counter++;
        }
    }
    // Write the header followed by a content row.
    File.AppendAllText(path, paleLake[count] + Environment.NewLine + smallText[count] + Environment.NewLine);
    count++;
}
As you can see, when I append the contents of the arrays to the file, I get the first header followed by the contents of all the tables. I only want the contents of the first table, and then to repeat the process for the second table, and so on. If I could get just the rows between one tr with class tableHeader and the next, I could store each table's contents in its own array. I don't know how to do this.
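To make the goal concrete, here is a rough sketch of the grouping I have in mind, not a tested solution. It assumes all tr rows are visited in document order and that each row has been reduced to a (class, text) pair. TableGrouper and GroupByHeader are hypothetical names, and the HtmlAgilityPack calls mentioned in the comment are only an assumption about how the pairs might be obtained from the real page.

```csharp
using System;
using System.Collections.Generic;

public static class TableGrouper
{
    // Each row is a (CssClass, Text) pair. With HtmlAgilityPack these could come from
    // node.GetAttributeValue("class", "") and node.InnerText for each node returned by
    // doc.DocumentNode.SelectNodes("//tr") (assumed selector; adjust to the real page).
    public static List<List<string>> GroupByHeader(IEnumerable<(string CssClass, string Text)> rows)
    {
        var tables = new List<List<string>>();

        foreach (var row in rows)
        {
            if (string.Equals(row.CssClass, "tableHeader", StringComparison.OrdinalIgnoreCase))
            {
                // A header row starts a new table; its text becomes the first entry.
                tables.Add(new List<string> { row.Text });
            }
            else if (tables.Count > 0 &&
                     string.Equals(row.CssClass, "content", StringComparison.OrdinalIgnoreCase))
            {
                // A content row belongs to the most recently started table.
                tables[tables.Count - 1].Add(row.Text);
            }
            // Rows seen before the first header, or with another class, are skipped.
        }

        return tables;
    }
}
```

Each inner list would then hold one header followed by only that table's content rows, so it could be written out on its own, for example with File.AppendAllLines(path, table).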