Getting the Page Source of a Website

Question

Hi, I am having trouble working with the page source of some websites, for example the page source at

view-source:http://www.booksamillion.com/search?id=5910205702379&query=hunger+games&where=book_title&search.x=24&search.y=9&search=Search&affiliate=&sort=price_ascending

I am trying to get the link to the book as well as the price from this:

XML

<div class="meta">


        <span class="title"><a href="http://www.booksamillion.com/p/Hunger-Games-Sparknotes-Literature-Guide/SparkNotes/9781411470989?id=5910205702379" title="The Hunger Games Sparknotes Literature Guide"><img src="http://covers2.booksamillion.com/covers/bam/1/41/147/098/1411470982_t.jpg" width="60" alt="The Hunger Games Sparknotes Literature Guide">The Hunger Games Sparknotes Literature Guide</a> (Paperback)</span>

        <span class="byline">by <a href="search?type=author&query=SparkNotes&id=5910205702379" title="SparkNotes">SparkNotes</a>, <a href="search?type=author&query=Suzanne Collins&id=5910205702379" title="Suzanne Collins">Suzanne Collins</a>

        <br>ISBN 9781411470989 / February 2014</span>
        <br><br>
<span class="ebook-price">Online Price: $5.95</span>

<span class="ebook-price">Marketplace Price from: $6.39</span>

        <div class="availability_search_results">In Stock.</div>
    </div><!-- end meta -->

This is what I have as code at the moment:

C#

string getPrice = string.Empty;
string getUrl = string.Empty;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(responseData); // load html
HtmlAgilityPack.HtmlNode rootNode = htmlDoc.DocumentNode;
HtmlAgilityPack.HtmlNodeCollection allBookResults = rootNode.SelectNodes("//div[@class='meta']");

foreach (HtmlAgilityPack.HtmlNode node in allBookResults)
{
    getUrl = node.SelectSingleNode("//span[@class='byline']").GetAttributeValue("content", null).ToString();
    HtmlAgilityPack.HtmlNode dataNode = node.SelectSingleNode("//span[@class='ebook-price']");

    foreach (HtmlAgilityPack.HtmlNode bookPriceNode in dataNode.ChildNodes)
    {
        getPrice = bookPriceNode.SelectSingleNode("//span[@class='ebook-price']").GetAttributeValue("content", null).ToString();
    }
}

My code does not seem to be written properly, since I get a null error when debugging. Could I get a short explanation of how span classes and attributes are used and captured, so that I have a rough idea of how to capture the book link and price from other websites as well?

Thanks a bunch!

Posted 6-Feb-14 7:27am

slayasty

Updated 6-Feb-14 15:40pm

joginder-banger

v2

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2014-02-06T15:20:00

Solution 1

I am not sure what you are trying to do to get the source of the HTML document. All you need is to download it as is, without any rendering or anything like that. You can use either the class System.Net.WebClient or, even better, System.Net.HttpWebRequest:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest%28v=vs.110%29.aspx
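
As a rough illustration of the "download it as is" approach, here is a minimal sketch using System.Net.WebClient. The class name PageSource and the method name Download are hypothetical, and the UTF-8 encoding is an assumption about the target page rather than something known about the site in the question.

```csharp
using System;
using System.Net;
using System.Text;

static class PageSource
{
    // Downloads the raw markup of a page as a string, with no rendering.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is served as UTF-8.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

A call such as string responseData = PageSource.Download(searchUrl); would then give the string that is passed to LoadHtml in the question. Note that the address should not carry the "view-source:" prefix, which is a browser feature and not part of the URL.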

—SA


Comments

slayasty 9-Feb-14 13:01pm

Thanks for the help. I managed to solve it by getting the XPath right.
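
The final code was not posted, so the following is only a sketch of what a corrected XPath version might look like, assuming the markup shown in the question. Two things in the original code stand out. The spans have no "content" attribute, so GetAttributeValue("content", null) returns the null default and the following .ToString() throws. Also, in HtmlAgilityPack an expression starting with "//" searches the whole document even when called on a child node, so a leading "." is needed to stay inside the current result block. The names BookScraper and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class BookScraper
{
    // Returns one (link, price text) pair per <div class="meta"> block.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection books = doc.DocumentNode.SelectNodes("//div[@class='meta']");
        if (books == null) return results;

        foreach (HtmlNode book in books)
        {
            // ".//" keeps the search relative to this result block.
            HtmlNode link = book.SelectSingleNode(".//span[@class='title']/a");
            HtmlNode price = book.SelectSingleNode(".//span[@class='ebook-price']");
            if (link == null || price == null) continue;

            // The link is the href attribute; the price is the element's text.
            string url = link.GetAttributeValue("href", string.Empty);
            string priceText = HtmlEntity.DeEntitize(price.InnerText).Trim();
            results.Add(new KeyValuePair<string, string>(url, priceText));
        }
        return results;
    }
}
```

SelectSingleNode returns only the first matching span, so for the sample markup this picks up the "Online Price" line and not the "Marketplace Price" one. The same pattern should carry over to other sites: find the repeating container, then use relative paths to reach an attribute (GetAttributeValue) or the text (InnerText) inside it.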

Sergey Alexandrovich Kryukov 9-Feb-14 14:01pm

You are welcome.
—SA