I want to get the first-level elements of an HTML file by parsing it with HTML Agility Pack. For example, the result should look like this:

<html>
<body>

<div class="header">....</div>
<div class="main">.....</div>
<div class="right">...</div>
<div class="left">....</div>
<div class="footer">...</div>

</body>
</html>

Each <div> contains other tags...

Can anyone help me?

Thanks...
Comments
CodeBlack 19-Aug-13 2:32am    
Can you please explain more about the exact output you expect?
hodash 19-Aug-13 3:48am    
See... I want to extract all the text that exists on a website, but separately: for example, the right side separately, the left side separately, the footer, and so on.

Excuse me, my English is not good...
CodeBlack 19-Aug-13 4:06am    
And do you want it in JavaScript or C# code?
hodash 19-Aug-13 4:12am    
C# code

1 solution

I don't have any experience with HTML Agility Pack, but the same thing can be done using regular expressions. The example below gets the div tags that have class="header". The same can be done for the other classes:

C#
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string htmlText = "<div class=\"header\">This is Header one</div>"
                + "<div class=\"header\">This is Header two</div>"
                + "<div class=\"header\">This is Header three</div>"
                + "<div class=\"main\">.....</div>"
                + "<div class=\"right\">...</div>"
                + "<div class=\"left\">....</div>"
                + "<div class=\"footer\">...</div>";

// Lazy match: from an opening header div up to the nearest </div>.
var regex = new Regex(@"<div class=""header"">(.*?)</div>");
MatchCollection mc = regex.Matches(htmlText);
List<string> headers = new List<string>();

foreach (Match match in mc)
{
    // Groups[0] is the whole match, including the <div> tags;
    // Groups[1] would give only the text between them.
    headers.Add(match.Groups[0].Value);
}
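
One caveat with this pattern, given that each <div> in the question contains other tags: the lazy `(.*?)</div>` stops at the first closing tag it meets, so a nested div cuts the match short. The snippet below is only a small illustration of that behaviour. The class name `NestedDivDemo`, the method `FirstHeaderMatch`, and the sample markup are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivDemo
{
    // Returns the first full match of the header pattern, or null if there is none.
    public static string FirstHeaderMatch(string html)
    {
        Match m = Regex.Match(html, @"<div class=""header"">(.*?)</div>");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Hypothetical input: a header div that contains another div.
        string nested = "<div class=\"header\"><div class=\"logo\">Logo</div>Title</div>";

        // The match ends at the inner </div>, so "Title</div>" is lost.
        Console.WriteLine(FirstHeaderMatch(nested));
        // prints: <div class="header"><div class="logo">Logo</div>
    }
}
```

The same pattern also misses divs whose attributes are written differently (single quotes, extra attributes, another attribute order), so it is only dependable for markup you control.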
 
Comments
hodash 19-Aug-13 4:53am    
Oh no... I do not want to use regex, because I want an approach that works in all situations, on any website...
Again, thanks for taking the time...
Thomas Daniels 19-Aug-13 4:57am    
You can't reliably parse HTML with regex, because HTML isn't a regular language. If you want to parse HTML, it's a better idea to use an HTML parser such as HtmlAgilityPack.
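
For reference, here is a minimal sketch of how the first-level elements under <body> might be read with HtmlAgilityPack. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `FirstLevelElements`, the method `Extract`, and the sample markup are illustrative and not from the thread. It walks the direct children of <body>, skips text and comment nodes, and collects each element's class attribute together with its text.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is installed

public static class FirstLevelElements
{
    // Returns one (class attribute, inner text) pair per element directly under <body>.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return result; // no <body> element found
        }

        foreach (HtmlNode child in body.ChildNodes)
        {
            // Skip whitespace text nodes and comments between the elements.
            if (child.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            string cssClass = child.GetAttributeValue("class", "");
            // InnerText concatenates the text of all descendants, nested tags included.
            string text = HtmlEntity.DeEntitize(child.InnerText).Trim();
            result.Add(new KeyValuePair<string, string>(cssClass, text));
        }

        return result;
    }

    public static void Main()
    {
        // Hypothetical sample page.
        string html = "<html><body>"
                    + "<div class=\"header\">Top <b>bar</b></div>"
                    + "<div class=\"main\"><p>Hello</p></div>"
                    + "</body></html>";

        foreach (KeyValuePair<string, string> item in Extract(html))
        {
            Console.WriteLine(item.Key + ": " + item.Value);
        }
    }
}
```

Note that `InnerText` also returns the contents of any <script> or <style> elements nested inside a section, so those may need to be removed first if only visible text is wanted.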

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


