How to get first-level elements from an HTML file with HTML Agility Pack and C#

Question

I want to get the first-level elements by parsing an HTML file with HTML Agility Pack. For example, the result should look like this:

<html>
<body>

<div class="header">....</div>
<div class="main">.....</div>
<div class="right">...</div>
<div class="left">....</div>
<div class="footer">...</div>

</body>
</html>

Each <div> contains other tags...

Can anyone help me?

Thanks...

Posted 18-Aug-13 6:35am

hodash

Comments

CodeBlack 19-Aug-13 2:32am

Can you please explain more about the exact output?

hodash 19-Aug-13 3:48am

See... I want to extract all the text that exists in a website, but separately: for example, the right side separately, the left side separately, the footer, and so on...

Excuse me, my English is not good...

CodeBlack 19-Aug-13 4:06am

And do you want it in JavaScript or C# code?

hodash 19-Aug-13 4:12am

C# code

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CodeBlack · Answer 1 · 2013-08-18T22:37:00

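I don't have any experience with HTML Agility Pack.
But I can do the same thing using regular expressions. See my example below, which gets the div tags that have class="header". The same can be done for the other classes as well. Note that this pattern stops at the first closing </div>, so it only works when the div does not contain nested div tags: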

C#

using System.Collections.Generic;
using System.Text.RegularExpressions;

string htmlText = "<div class=\"header\">This is Header one</div>"
                + "<div class=\"header\">This is Header two</div>"
                + "<div class=\"header\">This is Header three</div>"
                + "<div class=\"main\">.....</div>"
                + "<div class=\"right\">...</div>"
                + "<div class=\"left\">....</div>"
                + "<div class=\"footer\">...</div>";

// Non-greedy match of each <div class="header">...</div> on a single line.
var regex = new Regex(@"<div class=""header"">(.*?)</div>");
MatchCollection matches = regex.Matches(htmlText);
List<string> headers = new List<string>();

foreach (Match match in matches)
{
    // Groups[0] is the whole matched tag; Groups[1] would be only the inner text.
    headers.Add(match.Groups[0].Value);
}
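
Since the question asks about HTML Agility Pack and says each <div> contains other tags, here is a minimal sketch of how the same extraction might look with that library instead of a regular expression. It assumes the HtmlAgilityPack NuGet package is referenced; the sample markup and the "(no class)" fallback are made up for illustration and are not from the question. It selects the direct children of <body> with XPath and prints the text of each one separately.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Sample markup modelled on the question; the inner content is invented.
        string html =
            "<html><body>" +
            "<div class=\"header\"><h1>Site title</h1></div>" +
            "<div class=\"main\"><p>Main <b>text</b></p></div>" +
            "<div class=\"footer\"><span>Footer text</span></div>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//body/*" selects only element nodes that are direct children of <body>.
        // SelectNodes returns null when nothing matches, so check before looping.
        HtmlNodeCollection topLevel = doc.DocumentNode.SelectNodes("//body/*");
        if (topLevel == null)
        {
            Console.WriteLine("No first-level elements found.");
            return;
        }

        foreach (HtmlNode node in topLevel)
        {
            // Second argument is the default returned when the attribute is missing.
            string cssClass = node.GetAttributeValue("class", "(no class)");

            // InnerText is the text of the node and all its descendants, without tags.
            Console.WriteLine(cssClass + ": " + node.InnerText.Trim());
        }
    }
}
```

To parse a file on disk rather than a string, doc.Load(path) can be used in place of doc.LoadHtml(html). Because the parser builds a tree, nested tags inside each first-level div do not confuse the selection the way they would with a regular expression.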