Hi, I am having trouble extracting data from the HTML of some websites, for example the page source at:

view-source:http://www.booksamillion.com/search?id=5910205702379&query=hunger+games&where=book_title&search.x=24&search.y=9&search=Search&affiliate=&sort=price_ascending

I am trying to get the link to the book as well as the price from this markup:

XML
<div class="meta">


        <span class="title"><a href="http://www.booksamillion.com/p/Hunger-Games-Sparknotes-Literature-Guide/SparkNotes/9781411470989?id=5910205702379" title="The Hunger Games Sparknotes Literature Guide"><img src="http://covers2.booksamillion.com/covers/bam/1/41/147/098/1411470982_t.jpg" width="60" alt="The Hunger Games Sparknotes Literature Guide">The Hunger Games Sparknotes Literature Guide</a> (Paperback)</span>

        <span class="byline">by <a href="search?type=author&query=SparkNotes&id=5910205702379" title="SparkNotes">SparkNotes</a>, <a href="search?type=author&query=Suzanne Collins&id=5910205702379" title="Suzanne Collins">Suzanne Collins</a>

        <br>ISBN 9781411470989 / February 2014</span>
        <br><br>
<span class="ebook-price">Online Price: $5.95</span>

<span class="ebook-price">Marketplace Price from: $6.39</span>

        <div class="availability_search_results">In Stock.</div>
    </div><!-- end meta -->


This is what I have as code at the moment:

C#
string getPrice = string.Empty;
string getUrl = string.Empty;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(responseData); // load html
HtmlAgilityPack.HtmlNode rootNode = htmlDoc.DocumentNode;
HtmlAgilityPack.HtmlNodeCollection allBookResults = rootNode.SelectNodes("//div[@class='meta']");

foreach (HtmlAgilityPack.HtmlNode node in allBookResults)
{
    getUrl = node.SelectSingleNode("//span[@class='byline']").GetAttributeValue("content", null).ToString();
    HtmlAgilityPack.HtmlNode dataNode = node.SelectSingleNode("//span[@class='ebook-price']");

    foreach (HtmlAgilityPack.HtmlNode bookPriceNode in dataNode.ChildNodes)
    {
        getPrice = bookPriceNode.SelectSingleNode("//span[@class='ebook-price']").GetAttributeValue("content", null).ToString();
    }
}


It seems the code is not written correctly, since I get a null reference error when debugging. Could I get a short explanation of how to select span elements by class and read their attributes and text, so I have a rough idea of how to capture the book link and price from other websites as well?

Thanks a bunch!
Updated 6-Feb-14 15:40pm

1 solution

I have no idea what you are trying to do to get the source of the HTML document. All you need is to download it as is, without any rendering or anything like that. You can use either the class System.Net.WebClient or, even better, System.Net.HttpWebRequest:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.110%29.aspx,
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest%28v=vs.110%29.aspx.

—SA
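
For readers who want to see the download step suggested above, here is a minimal sketch using System.Net.WebClient. The class name PageDownloader and the method name Download are made up for illustration, and the UTF-8 encoding is an assumption about the target page, not something stated in the thread. WebClient is used here only because it is the shorter of the two options named in the answer.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the original thread.
static class PageDownloader
{
    // Downloads the raw HTML of a page as a string, without rendering it.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is served as UTF-8.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

The returned string could then be passed to htmlDoc.LoadHtml in place of responseData from the question.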
 
 
Comments
slayasty 9-Feb-14 13:01pm    
Thanks for the help, I managed to solve it by getting the XPath right.
Sergey Alexandrovich Kryukov 9-Feb-14 14:01pm    
You are welcome.
—SA
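
The thread does not show the XPath that finally worked, so the following is only a sketch of one way it might look for the markup quoted in the question. It rests on three observations about that markup and about HtmlAgilityPack: the book link is the href attribute of the a element inside span class="title" (neither span has a "content" attribute, so GetAttributeValue("content", null) returns null and the following ToString() call throws); the price is the text inside span class="ebook-price"; and an XPath starting with "//" searches from the document root even when called on a child node, while ".//" stays inside that node. The names BookScraper and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, the same library used in the question

// Hypothetical helper, not part of the original thread.
static class BookScraper
{
    // Returns one (url, price text) pair per <div class="meta"> result block.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection blocks = doc.DocumentNode.SelectNodes("//div[@class='meta']");
        if (blocks == null)
        {
            return results;
        }

        foreach (HtmlNode block in blocks)
        {
            // The leading "." keeps each search inside the current block.
            HtmlNode link = block.SelectSingleNode(".//span[@class='title']/a");
            HtmlNode price = block.SelectSingleNode(".//span[@class='ebook-price']");
            if (link == null || price == null)
            {
                continue; // skip blocks that lack either piece
            }

            string url = link.GetAttributeValue("href", string.Empty);
            // The price is element text, not an attribute; this takes the first price span.
            string priceText = HtmlEntity.DeEntitize(price.InnerText).Trim();
            results.Add(new KeyValuePair<string, string>(url, priceText));
        }

        return results;
    }
}
```

Called as BookScraper.Extract(responseData), this should return the product URL together with the first price line of each block. Other sites will use different class names and nesting, so the two XPath strings are the part to adapt.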

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


