See more: C#
I need some help finding a C# .NET solution for scraping an AJAX website.
Anyone?
Posted 27-Nov-12 7:40am
Comments
Plyswthsqurles at 27-Nov-12 15:30pm
   
What exactly are you trying to grab from these websites? You have a number of options.
 
1) XPath queries against a loaded HTML document; this can be tricky with malformed HTML
2) Selenium (browser automation, but it has .NET capabilities)
3) Html Agility Pack to load a website; it also handles malformed HTML
 
A non-C# solution, still browser-automation related, is Watir; it's Ruby.
Paw Jershauge at 27-Nov-12 15:35pm
   
Well, I'm not big on website building anymore; I stopped at classic ASP ;) I'm more into WinForms. So let's see if I can explain myself, here goes:
I have a website that posts the status of some systems. The status message and associated information are posted back via AJAX, and therefore the normal HTMLElements won't hold the correct text in the innerText property. Hope that makes sense ;)
ryanb31 at 27-Nov-12 15:55pm
   
AJAX can easily return strings. What exactly is coming back from the AJAX call that can't go into the html elements? Something doesn't seem right here.
Paw Jershauge at 27-Nov-12 15:58pm
   
ryanb31, it's not that AJAX can't return the data; it does, and I can view the message in my browser. But when I look at the HTML source of the site, the message is not there; there's only a {{message}} variable or something in the place where the message text should be.
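To make the symptom concrete, here is a minimal sketch of how one might check whether downloaded HTML still contains unrendered {{...}} placeholders. The class and method names (TemplateCheck, FindPlaceholders) are made up for illustration, and the double-brace syntax is only assumed from the {{message}} example above; the html string would come from something like WebClient.DownloadString.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TemplateCheck
{
    // Hypothetical helper: returns the names found inside any {{...}}
    // placeholders that are still present in the given HTML text.
    public static List<string> FindPlaceholders(string html)
    {
        var names = new List<string>();
        // Match "{{", optional whitespace, a name without braces, optional whitespace, "}}".
        foreach (Match m in Regex.Matches(html, @"\{\{\s*([^{}]+?)\s*\}\}"))
            names.Add(m.Groups[1].Value);
        return names;
    }
}
```

If the returned list is not empty, the page was most likely fetched before its scripts filled in the values, which is what a plain HTTP download will always give you for this kind of site.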

1 solution


Solution 1

Well, I believe I found a workaround for this issue.
I just use the WebBrowser instead of the WebClient and have the WebBrowser render the whole site before extracting the HtmlDocument. It takes time, but it works.
 
Here's the code:
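        // Needs: using System; using System.Diagnostics; using System.Threading; using System.Windows.Forms;
        // Call this on an STA thread such as the WinForms UI thread.
        public HtmlDocument GetHtmlAjax(Uri uri, int AjaxTimeLoadTimeOut)
        {
            using (WebBrowser wb = new WebBrowser())
            {
                wb.Navigate(uri);
                // Pump messages until the initial page load has completed.
                while (wb.ReadyState != WebBrowserReadyState.Complete)
                    Application.DoEvents();
                // Keep pumping for AjaxTimeLoadTimeOut milliseconds so the page's
                // scripts can fill in the AJAX content while we wait.
                Stopwatch timer = Stopwatch.StartNew();
                while (timer.ElapsedMilliseconds < AjaxTimeLoadTimeOut)
                {
                    Application.DoEvents();
                    Thread.Sleep(10); // avoid spinning the CPU
                }
                // Caution: wb is disposed on return, so read what you need
                // from the document promptly.
                return wb.Document;
            }
        }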
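The fixed wait in the code above could probably be shortened by polling for the content instead. This is only a sketch under my own assumptions: Poll and WaitUntil are hypothetical names, and the pump argument stands in for Application.DoEvents so the helper itself does not depend on WinForms.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Poll
{
    // Hypothetical helper: calls 'pump' repeatedly until 'condition' returns
    // true or 'timeoutMs' milliseconds have elapsed.
    // Returns true if the condition was met in time, false on timeout.
    public static bool WaitUntil(Func<bool> condition, int timeoutMs, Action pump)
    {
        Stopwatch timer = Stopwatch.StartNew();
        while (!condition())
        {
            if (timer.ElapsedMilliseconds >= timeoutMs)
                return false;
            pump();           // e.g. Application.DoEvents in a WinForms app
            Thread.Sleep(10); // yield briefly between checks
        }
        return true;
    }
}
```

Inside GetHtmlAjax the call might look like Poll.WaitUntil(() => wb.Document.Body != null && !wb.Document.Body.InnerHtml.Contains("{{"), AjaxTimeLoadTimeOut, Application.DoEvents), assuming the page really does use {{...}} placeholders until the AJAX data arrives. That way the timeout becomes an upper bound rather than a delay you always pay.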

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 27 Nov 2012
Copyright © CodeProject, 1999-2014