Hi to all,
I am new to ASP.NET applications. I want to scrape some details from a website. I can do this with VBA in MS Excel, but unfortunately my Internet Explorer browser is not working properly.

Hence, I have decided to do the web scraping from an ASP.NET web application.
I want to scrape details from the following HTML code of a website.

HTML
<div class="phone-number">
  (310) 703-7939
</div>


Here, I want to get the phone number shown, i.e. (310) 703-7939.

I have used the following code to get this.

C#
protected void btnInnerText_Click(object sender, EventArgs e)
{
    var document = new HtmlDocument();
    document.OptionReadEncoding = false;

    var url = new Uri("https://weedmaps.com/dispensaries/california/lax/true-healing-center?c=dispensaries");
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            document.Load(stream, Encoding.GetEncoding("iso-8859-9"));
        }
    }

    var node = document.DocumentNode.SelectSingleNode("//div[@class='phone-number']");
    if (node != null)
        TextBox1.Text = node.InnerHtml;
    else
        TextBox1.Text = "na";
}


But it shows "na" in TextBox1 instead of the required phone number.
Kindly help me get the correct result.
Note: I have used HtmlAgilityPack.
Thanks in advance.
Comments
Bernhard Hiller 8-May-14 2:24am    
Did you check that "document" really contains the expected html source?
Is the encoding correct (iso-8859-9 what's that?)?
hemal p.shah 8-May-14 4:09am    
Yes, I have checked the web page. It has the expected HTML source. Regarding the encoding, I don't have any idea what exactly it means, because I took it from another website. If my code is wrong, then please give me an idea of the actual code to write, considering that the expected HTML source is already present in the web page mentioned.
Joshi, Rushikesh 8-May-14 17:34pm    
What URL are you trying? If it is public, then give it to us so that someone can check. I have checked your code and it is working absolutely fine.

Thanks
Rushi
thursunamy 15-May-14 11:47am    
Check HtmlAgilityPack...

1 solution

Hello,

Simple solution that people don't really tell you.

Here is the code to do a web request:
C#
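using System;
using System.IO;
using System.Net;

public static class WebConnection
{
    // Downloads the body of the given URL as a string; returns "" if the request fails.
    public static string GetResponce(string url)
    {
        string data = "";
        try
        {
            WebRequest request = WebRequest.Create(url);
            request.Proxy = null; // skip proxy auto-detection
            request.Credentials = CredentialCache.DefaultCredentials;

            // using blocks dispose the response, stream and reader even if reading throws
            using (WebResponse response = request.GetResponse())
            using (Stream dataStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(dataStream))
            {
                Console.WriteLine(((HttpWebResponse)response).StatusDescription);
                data = reader.ReadToEnd();
            }
        }
        catch (Exception ex)
        {
            // Report the failure instead of silently swallowing it
            Console.WriteLine("Request failed: " + ex.Message);
        }

        return data;
    }
}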


And it works like a charm.

Then what you do is call it:

C#
string responce = WebConnection.GetResponce("myuri");
//tip here
int i = 0;//This is a breakpoint trick


So create a breakpoint on the int i = 0; line and run the app.

The app will break there. Hover over the variable responce, click the magnifying glass, and copy the text into a Notepad document so you can see exactly what the server returned.

Now, using string manipulation, extract the required string from the response text.
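
As a rough sketch of that string manipulation step, assuming the downloaded text really contains the exact `<div class="phone-number">` markup shown in the question: the `PhoneExtractor` class and `ExtractPhone` method below are hypothetical names made up for illustration, not part of any library, and they use only `IndexOf` and `Substring`. If the page builds that element with JavaScript, or the tag carries extra attributes or classes, this will not find it.

```csharp
using System;

// Hypothetical helper for illustration only; not part of HtmlAgilityPack or .NET.
public static class PhoneExtractor
{
    // Returns the trimmed text between the first <div class="phone-number">
    // and the next </div>, or null when the markup is not found.
    public static string ExtractPhone(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        // Assumption: the opening tag appears exactly as in the question.
        const string openTag = "<div class=\"phone-number\">";
        const string closeTag = "</div>";

        int start = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += openTag.Length; // move past the opening tag

        int end = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        // Trim removes the line breaks and indentation around the number.
        return html.Substring(start, end - start).Trim();
    }
}
```

You could then call it as `string phone = PhoneExtractor.ExtractPhone(responce) ?? "na";`. For the sample markup in the question it returns `(310) 703-7939`.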

Good luck!
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900