Hi Everyone,

I am posting this question again because the earlier suggestion to use an XML parser will not work for HTML files. I have just started using HtmlAgilityPack (HAP) and I am having some difficulty figuring out how to get some of my values.

I am using this file as an example and storing the returned values in a ListView. The problem is that I don't know how to get each individual value in each section.

HTML
<bookstore>
<book>
   <title lang="en">Harry Potter</title>
   <price>29.99</price>
   <available>In Stock</available>
</book>

<book>
   <title lang="en">Learning XML</title>
   <price>39.95</price>
   <available>In Stock</available>
</book>

<book>
   <title lang="en">Learning C#</title>
   <price>59.95</price>
   <available>Backorder</available>
</book>

<book>
   <title lang="en">Learning Java</title>
   <price>39.95</price>
   <available>In Stock</available>
</book>
</bookstore>

Can someone show me an example of how to traverse the tree and get each value for each book, one book at a time?

This is all I know how to do right now.

C#
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("sample.txt");

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//title"))
{
    ListViewItem lView = new ListViewItem();
    lView.Text = node.InnerText;
    lView.SubItems.Add("");
    lView.SubItems.Add("");
    listView1.Items.Add(lView);
}


Appreciate any help.

Hi theadmin,

From the question, I understand that you want to get the title, price, and availability of each book under bookstore. The code below does exactly that.

C#
// Requires: using System.Linq; using HtmlAgilityPack;
string input =
    "<bookstore><book><title>Harry Potter</title><price>29.99</price><available>In Stock</available></book><book>" +
    "<title>Learning XML</title><price>39.95</price><available>In Stock</available></book><book><title>Learning C#</title>" +
    "<price>59.95</price><available>Backorder</available></book><book><title>Learning Java</title><price>39.95</price>" +
    "<available>In Stock</available></book></bookstore>";

HtmlDocument html = new HtmlDocument();
html.LoadHtml(input);

// Note: SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection bookStore = html.DocumentNode.SelectNodes("//bookstore");
// "./book" is relative to this bookstore; a leading "//" would search the whole document.
HtmlNodeCollection books = bookStore[0].SelectNodes("./book");
foreach (HtmlNode book in books)
{
    // Text of every child of this book, in document order: title, price, available.
    var bookDetail = (from child in book.ChildNodes
                      select child.InnerText).ToList();
}


Happy coding. :)
 
Comments
theadmin 14-May-15 16:27pm    
Hi Mathi,

I am just using that file as an example because it's simple. Once I grab a real web page, it's going to be 1000 times more complex. I chose this bit of code as a test so that I can really understand how it's done.

The code you pasted returns all the values under each book together. I am trying to figure out how to retrieve each value individually so that I can add it to a ListView:

Listview.Text = Title;
Listview.Subitems.Add(Price);
Listview.Subitems.Add(Availability);

It seems like the Agility Pack is my best bet for parsing HTML. I am just trying to learn the library so that I understand what I am doing. I don't know whether all of this can be achieved with what you posted, but I really want to use the Agility Pack and get a good grip on it.

Hi theadmin,

The web page can be complex, but I am assuming you are interested in getting the details of all the <book></book> nodes under the <bookstore></bookstore> node of that page. You can achieve what you need with a little customization of the code I already shared. Here is how you can do it (assuming you have a ListView).

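C#
HtmlNodeCollection bookStore = html.DocumentNode.SelectNodes("//bookstore");
// "./book" stays inside this bookstore; a leading "//" would search the whole document.
HtmlNodeCollection books = bookStore[0].SelectNodes("./book");
foreach (HtmlNode book in books)
{
    // One ListView row per book: title in the first column, then price and availability.
    ListViewItem item = new ListViewItem(book.ChildNodes["title"].InnerText);
    item.SubItems.Add(book.ChildNodes["price"].InnerText);
    item.SubItems.Add(book.ChildNodes["available"].InnerText);
    listView1.Items.Add(item);
}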
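
If you want to try the per-field lookup outside a WinForms project first, here is a minimal console sketch. It assumes the HtmlAgilityPack NuGet package is installed and prints each row instead of adding it to a ListView. The class name BookDump and the helper TextOf are made-up names for illustration, not part of the library. It also guards against the two null cases that tend to bite on real pages: SelectNodes returns null when nothing matches, and SelectSingleNode returns null when a book has no such child.

```csharp
using System;
using HtmlAgilityPack;

class BookDump // hypothetical class name, for illustration only
{
    // Hypothetical helper: trimmed text of the first child element called
    // `name`, or an empty string when the element is missing.
    static string TextOf(HtmlNode parent, string name)
    {
        HtmlNode child = parent.SelectSingleNode("./" + name);
        return child == null ? "" : child.InnerText.Trim();
    }

    static void Main()
    {
        // Same shape as the sample file in the question (shortened to two
        // books), including the whitespace between tags.
        string input = @"<bookstore>
<book>
   <title lang=""en"">Harry Potter</title>
   <price>29.99</price>
   <available>In Stock</available>
</book>
<book>
   <title lang=""en"">Learning C#</title>
   <price>59.95</price>
   <available>Backorder</available>
</book>
</bookstore>";

        HtmlDocument html = new HtmlDocument();
        html.LoadHtml(input);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection books = html.DocumentNode.SelectNodes("//bookstore/book");
        if (books == null)
        {
            Console.WriteLine("No <book> elements found.");
            return;
        }

        foreach (HtmlNode book in books)
        {
            // Each lookup is scoped to the current <book> by the relative XPath.
            string title = TextOf(book, "title");
            string price = TextOf(book, "price");
            string available = TextOf(book, "available");
            Console.WriteLine(title + " | " + price + " | " + available);
        }
    }
}
```

In the WinForms version, the three strings would go into a ListViewItem and its SubItems instead of Console.WriteLine.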

Hope this helps.
 
Comments
theadmin 14-May-15 16:26pm    
Are you serious? That's it???? Let me give it a shot, if that works then I am set.
theadmin 14-May-15 18:14pm    
Is there another namespace that I need to add besides using System.Windows.Forms;? Somehow I can't even add this first line without the compiler complaining that it doesn't know what it is.

HtmlDocument html = new HtmlDocument();
Mathi Mani 14-May-15 18:38pm    
Hi, you have to install HtmlAgilityPack from NuGet. Run this command from the Package Manager Console to install the package.

Install-Package HtmlAgilityPack

Add "using HtmlAgilityPack;" to the top of your source file once the installation is complete.
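
One thing that may come up in a WinForms project (an assumption about the setup here, not something confirmed in this thread): System.Windows.Forms also defines a class called HtmlDocument, so with both using directives in place the bare name HtmlDocument is ambiguous. The code in the question avoids this by writing HtmlAgilityPack.HtmlDocument in full. A using alias is another way around it. The sketch below assumes a project that references both System.Windows.Forms and the HtmlAgilityPack package; AliasDemo is a made-up class name.

```csharp
using System;
using System.Windows.Forms;
using HtmlAgilityPack;
// Alias: from here on, the bare name HtmlDocument means the HtmlAgilityPack class.
using HtmlDocument = HtmlAgilityPack.HtmlDocument;

class AliasDemo // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Without the alias above, this line would be an ambiguous reference
        // between System.Windows.Forms.HtmlDocument and HtmlAgilityPack.HtmlDocument.
        HtmlDocument html = new HtmlDocument();
        html.LoadHtml("<bookstore><book><title>Harry Potter</title></book></bookstore>");

        HtmlNode title = html.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title == null ? "(no title found)" : title.InnerText);
    }
}
```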
theadmin 14-May-15 22:29pm    
Thanks, I just got it working. When I moved on to my more complex HTML file, though, none of this worked anymore because the file was so different. I appreciate all your help.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


