I need to screen-scrape a web page on an ASP.NET (.aspx) site, where the data I am interested in is the text in a grid view. I have no problem getting the text from the first page of the grid view.

But how do I get the text from the second page, the third page, and so on?

The problem is that the pages of the grid view do not have URLs of their own; instead, a JavaScript call is executed when I click a paging link button.

For example, the JavaScript behind a paging link is:
javascript:__doPostBack('ctl00$cphmaincontent$lbntNavigate2,3..up last page','')

How do I simulate this?
Updated 26-Jan-14 19:59pm
Comments
Sergey Alexandrovich Kryukov 27-Jan-14 2:11am    
Not everything can be scraped; for certain pages, the whole problem cannot even be formulated...
—SA

Please see my comment to the question… However, if this is paging that behaves in some regular way, you can set up an HTTP spy and see which HTTP requests are sent on each paging event. You cannot assume any particular server-side technology for the site being scraped, so you need to learn how it works and mimic the behavior of the client side.

One approach I have used is an HTTP spy application. I, for example, use one that is a plug-in for SeaMonkey/Firefox, called HttpFox, but I know for sure that there are similar tools for other browsers, as well as standalone ones. With such a tool, you can quite easily find out what is going on. Besides, all the JavaScript source code is always readable to you, so you can study it.

I want to emphasize that nothing can guarantee 100% success for all sites. However, you will probably find that most sites fall into a few general, commonly used classes of cases. For example, most ASP.NET pages with a grid view use pretty much the same paging mechanism.
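
To make "mimic the behavior of the client side" more concrete for the ASP.NET case: in Web Forms, `__doPostBack(eventTarget, eventArgument)` stores its two arguments in the hidden form fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form, so the request seen in an HTTP spy is normally a POST that also carries the other hidden fields (`__VIEWSTATE` and, when present, `__EVENTVALIDATION`). The following is only a rough sketch of replaying such a request, not a definitive implementation. The class and method names are made up for illustration, the control name is just one reading of the pattern in the question (take the real value from the paging link's `href`), and the regex assumes the attribute order that ASP.NET usually emits for hidden inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class PostBackPager
{
    // Collects the name/value pairs of hidden inputs such as __VIEWSTATE.
    // Assumption: attributes appear in the order type, name, ..., value.
    // A real HTML parser would be more robust than this regex.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        var pattern = new Regex(
            "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]+)\"[^>]*value=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
        {
            fields[m.Groups[1].Value] = WebUtility.HtmlDecode(m.Groups[2].Value);
        }
        return fields;
    }

    // Builds the form data a browser would send after
    // __doPostBack(eventTarget, eventArgument) has run.
    public static Dictionary<string, string> BuildPostBackForm(
        IDictionary<string, string> hiddenFields, string eventTarget, string eventArgument)
    {
        var form = new Dictionary<string, string>(hiddenFields);
        form["__EVENTTARGET"] = eventTarget;
        form["__EVENTARGUMENT"] = eventArgument;
        return form;
    }

    // Posts the form back to the same page and returns the HTML of the next grid page.
    // Reuse one HttpClient for all requests so that session cookies are kept.
    public static async Task<string> PostBackAsync(
        HttpClient client, string pageUrl, string currentHtml, string eventTarget)
    {
        var form = BuildPostBackForm(ExtractHiddenFields(currentHtml), eventTarget, "");
        using (var content = new FormUrlEncodedContent(form))
        using (var response = await client.PostAsync(pageUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Offline demonstration on a made-up HTML fragment; no network access.
    public static void Main()
    {
        string sampleHtml =
            "<form method=\"post\">" +
            "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"abc123\" />" +
            "<input type=\"hidden\" name=\"__EVENTVALIDATION\" id=\"__EVENTVALIDATION\" value=\"xyz\" />" +
            "</form>";

        // Assumed control name; copy the real one from the paging link.
        var form = BuildPostBackForm(
            ExtractHiddenFields(sampleHtml), "ctl00$cphmaincontent$lbntNavigate2", "");

        foreach (var pair in form)
        {
            Console.WriteLine(pair.Key + "=" + pair.Value);
        }
    }
}
```

In real use, you would fetch the first page with `HttpClient.GetStringAsync`, pass its HTML to `PostBackAsync`, and then feed each returned page into the next call, because the hidden field values usually change from page to page. If the grid is refreshed through partial (AJAX) postbacks, the request may carry extra fields, which is exactly what the HTTP spy will show.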

—SA
 
 
If you are using a WebBrowser control to load the pages so that you can scrape them, you can try invoking a click event on the paging button, like this:
C#
webbrowser.Document.GetElementById("PagingButtonID").InvokeMember("click");
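
The click only works once the document has finished loading, and the next page only becomes available after the postback has completed. Below is a minimal Windows Forms sketch of that loop, under several assumptions: the start URL, the page limit, and the form class are placeholders invented for illustration, `PagingButtonID` is the placeholder ID from the line above, and the link is assumed to behave like a "next page" button that causes a full page reload (with partial AJAX updates, `DocumentCompleted` may not fire again).

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form that scrapes a few grid pages by clicking the paging link.
public class PagingScraperForm : Form
{
    private const string StartUrl = "http://example.com/grid.aspx"; // placeholder URL
    private const string PagingButtonId = "PagingButtonID";         // placeholder ID
    private const int MaxPages = 3;                                  // arbitrary limit

    private readonly WebBrowser browser = new WebBrowser
    {
        Dock = DockStyle.Fill,
        ScriptErrorsSuppressed = true
    };
    private int pagesSeen;

    public PagingScraperForm()
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        Load += (sender, e) => browser.Navigate(StartUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event also fires for frames; react only to the top-level document.
        if (e.Url != browser.Url || browser.Document == null || browser.Document.Body == null)
        {
            return;
        }

        pagesSeen++;
        // Stand-in for the real scraping: dump the visible text of the page.
        Console.WriteLine(browser.Document.Body.InnerText);

        if (pagesSeen >= MaxPages)
        {
            return;
        }

        // Trigger the postback for the next page; DocumentCompleted fires again afterwards.
        HtmlElement next = browser.Document.GetElementById(PagingButtonId);
        if (next != null)
        {
            next.InvokeMember("click");
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new PagingScraperForm());
    }
}
```

If the paging link has no usable ID, `browser.Document.InvokeScript("__doPostBack", new object[] { target, "" })` is another option, where `target` is the first argument taken from the link's `javascript:__doPostBack(...)` href.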
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


