Hello, I want a simple HTML analyzer (in C#) that can get the contents of an HTML element. To explain: I want to download a page, get the contents of the element matching ".class1 .class2 #id1 div", and then display it to the user. Do you have any leads (besides System.Net.WebClient)?

P.S. So far I have found the HTML Agility Pack, which uses an XPath expression to select an element.
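
For reference, here is a rough sketch of how the selector ".class1 .class2 #id1 div" might be written as XPath. It uses only the built-in System.Xml.XPath extensions, so it assumes well-formed markup; the class name SelectorSketch and the methods HasClass, BuildXPath and FirstMatch are made up for illustration.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SelectorSketch
{
    // XPath test for "element carries CSS class `name`".
    // A class attribute may hold several space-separated names,
    // so a plain @class='name' comparison is not enough.
    private static string HasClass(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    // Hypothetical helper: XPath equivalent of ".class1 .class2 #id1 div".
    // Each "//" plays the role of the CSS descendant combinator (the space).
    public static string BuildXPath()
    {
        return "//*[" + HasClass("class1") + "]"
             + "//*[" + HasClass("class2") + "]"
             + "//*[@id='id1']"
             + "//div";
    }

    // Returns the text of the first matching element, or null if none matches.
    // Assumption: the input is well-formed XML/XHTML.
    public static string FirstMatch(string wellFormedMarkup)
    {
        XDocument doc = XDocument.Parse(wellFormedMarkup);
        XElement hit = doc.XPathSelectElement(BuildXPath());
        return hit == null ? null : hit.Value;
    }
}
```

The same XPath string should also be usable with the HTML Agility Pack (for example via SelectSingleNode on the document node), which tolerates real-world HTML that XDocument.Parse would reject.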
Updated 30-Aug-12 22:35pm

Hi,

I think jQuery can get all the information from a given HTML element. You can get the inner HTML of a particular div or table through its ID or its associated class.

Suppose you have this HTML content:
XML
<div class="demo-container">
  <div class="demo-box">Demonstration Box</div>
</div>


Then you can extract the inner div using:
JavaScript
$('div.demo-container').html();

And the result would be:
HTML
<div class="demo-box">Demonstration Box</div>

(The code above is taken from jQuery.)

But this is only possible if you already know the hierarchy to navigate in the HTML.

Hope I answered your query,
Thanks
-Amit Gajjar.
 
Comments
mostwanted4 31-Aug-12 4:28am    
Thanks, this is exactly what I want to do, but in C# (I forgot to mention it).
AmitGajjar 31-Aug-12 4:29am    
If you have a web application, you can get the value using jQuery, write it to a hidden field, and read that field from C# :)
mostwanted4 31-Aug-12 4:34am    
I don't have a web application. I have a simple Windows Forms app in which I download the page as a string (using WebClient) and then process it. It would be awesome to run jQuery under these conditions.
AmitGajjar 31-Aug-12 4:36am    
Check http://jint.codeplex.com/
mostwanted4 31-Aug-12 4:38am    
Cool! Thanks ;)
You can use a regex to parse the file if you know the tags you're looking for. XmlDocument also works, if it's XHTML.
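
As a rough illustration of the regex route, the sketch below pulls the inner HTML out of a div with a known id. It assumes the target div contains no nested div (the lazy match stops at the first closing tag) and it would also match an attribute whose name merely ends in "id", such as data-id, so treat it as a quick hack rather than a parser. The names RegexSketch and InnerHtmlOfDivById are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Returns the inner HTML of the first <div> whose id attribute equals `id`,
    // or null when no such div is found.
    // Assumption: the div holds no nested <div> elements.
    public static string InnerHtmlOfDivById(string html, string id)
    {
        string pattern =
            "<div\\b[^>]*\\bid\\s*=\\s*[\"']" + Regex.Escape(id) + "[\"'][^>]*>" +
            "(.*?)</div>";

        // Singleline lets '.' span line breaks; IgnoreCase accepts <DIV> as well.
        Match m = Regex.Match(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, calling RegexSketch.InnerHtmlOfDivById(page, "id1") on a page downloaded as a string would return the markup between that div's opening and closing tags.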
 
Use LINQ to XML to achieve this.
For example:
C#
string htmlString = @"<html>
                        <body>
                          <p>hi</p>
                          <table>...</table>
                        </body>
                     </html>";

XDocument htmlDoc = XDocument.Parse(htmlString);

Now htmlDoc contains the DOM elements as XElement objects (XElement derives from XNode):
XElement (html tag) --> XElement (body tag) --> XElement (p tag), XElement (table tag)
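
Once the document is parsed, the tree can be walked with the usual LINQ to XML calls. The following is a minimal sketch under the assumption that the markup is well-formed; the class LinqToXmlSketch and its two methods are illustrative names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlSketch
{
    // Text of the first <p> element anywhere in the document, or null.
    public static string FirstParagraphText(string wellFormedMarkup)
    {
        XDocument htmlDoc = XDocument.Parse(wellFormedMarkup);
        // Descendants searches the whole tree, so the <p> may sit at any depth.
        XElement p = htmlDoc.Descendants("p").FirstOrDefault();
        return p == null ? null : p.Value;
    }

    // Tag names of the direct children of <body>, in document order.
    public static string[] BodyChildNames(string wellFormedMarkup)
    {
        XDocument htmlDoc = XDocument.Parse(wellFormedMarkup);
        XElement body = htmlDoc.Root.Element("body");
        if (body == null) return new string[0];
        return body.Elements().Select(e => e.Name.LocalName).ToArray();
    }
}
```

With the htmlString from the answer, FirstParagraphText would give the text of the p element and BodyChildNames would list the p and table tags.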
 
Comments
mostwanted4 31-Aug-12 4:43am    
Nice, but I'm afraid I'm dealing with HTML, not XHTML.
pramod.hegde 31-Aug-12 4:58am    
This still works with HTML.
mostwanted4 31-Aug-12 9:11am    
I tried it, and it throws errors for tags like <link type="" rel="" href="">, where the tag is never closed.
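
The failure described in the last comment can be reproduced with a few lines. This is only a sketch: StrictParsingSketch and TryParse are hypothetical names, and the exact wording of the error message depends on the .NET version.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class StrictParsingSketch
{
    // Returns null when the markup parses as XML,
    // otherwise the message of the XmlException the parser raised.
    public static string TryParse(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return null;
        }
        catch (XmlException ex)
        {
            // An unclosed <link ...> ends up mismatched with </head>.
            return ex.Message;
        }
    }
}
```

Passing markup with an unclosed link tag yields an error message, while the self-closed form (ending in "/>") parses cleanly. That is why XDocument is only an option for XHTML or otherwise well-formed input.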

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


