I want to scrape some pages using Html Agility Pack in C#, but the server sometimes returns Service Unavailable. Can anyone suggest the best way to scrape these pages? Thanks.

See this article:
Html Agility Pack - Massive information extraction from WWW pages

Quote:
but server is returning Service Unavailable sometimes

It's unlikely that this has something to do with HtmlAgilityPack; probably there's just a problem at the server.
 
Comments
anoopgoyal 9-Apr-15 2:42am    
If I use Amazon server URLs to scrape the pages, it returns the status ServiceUnavailable. I checked the status via HtmlWeb.StatusCode. Sometimes I am able to scrape, but most often the request is rejected. Are there any other properties I can set to spoof the request?
Thomas Daniels 9-Apr-15 11:21am    
Are you sure you are using the correct URL to fetch the pages?
anoopgoyal 10-Apr-15 1:07am    
Yes, that really is the correct URL :)
Thomas Daniels 10-Apr-15 1:49am    
Can you provide your code please?
Hi, I solved it myself by using slightly different code:

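using System.Net; // WebClient

// Download the raw HTML with WebClient instead of HtmlWeb.
WebClient client = new WebClient();
string downloadString = client.DownloadString("http://www.example.com/default.aspx");

// Parse the downloaded markup with Html Agility Pack.
HtmlAgilityPack.HtmlDocument htmlPage = new HtmlAgilityPack.HtmlDocument();
htmlPage.LoadHtml(downloadString);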

This works without any issues.
Cheers!
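
If the server still answers with Service Unavailable now and then, one option is to retry the download after a short pause. The following is only a rough sketch, not something tested against the site in question: the class name RetryingDownloader, the method DownloadWithRetry, the attempt limit, the delays and the User-Agent string are all made up for illustration, and whether sending a User-Agent header helps at all is an assumption.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of Html Agility Pack or the original post.
static class RetryingDownloader
{
    // Downloads a page as a string, retrying only when the server answers 503.
    public static string DownloadWithRetry(string url, int maxAttempts = 4)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                using (var client = new WebClient())
                {
                    // Assumption: some servers reject requests that carry no User-Agent.
                    client.Headers[HttpRequestHeader.UserAgent] =
                        "Mozilla/5.0 (compatible; ExampleScraper/1.0)";
                    return client.DownloadString(url);
                }
            }
            catch (WebException ex)
            {
                var response = ex.Response as HttpWebResponse;
                bool isServiceUnavailable = response != null
                    && response.StatusCode == HttpStatusCode.ServiceUnavailable;

                // Rethrow anything that is not a 503, or a 503 on the last attempt.
                if (!isServiceUnavailable || attempt >= maxAttempts)
                {
                    throw;
                }

                // Wait 2, 4, 8, ... seconds before the next attempt.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }
}
```

The returned string can be passed to htmlPage.LoadHtml(...) exactly as in the code above. The pause between attempts grows each time so that a busy server is not hit again immediately.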
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


