Hi all,
I am trying to scrape this page:
http://www.webhostdir.com/search/profile.aspx?spid=19137

I am using code like this:
VB
Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create("http://www.webhostdir.com/search/profile.aspx?spid=19137"), HttpWebRequest)
myRequest.Method = "GET"
myRequest.KeepAlive = False
Dim webresponse As HttpWebResponse
Try
    webresponse = DirectCast(myRequest.GetResponse(), HttpWebResponse)
    Dim enc As Encoding = System.Text.Encoding.GetEncoding(1252)
    Dim loResponseStream As New StreamReader(webresponse.GetResponseStream(), enc)
    Dim r As String = loResponseStream.ReadToEnd()
    My.Computer.FileSystem.WriteAllText("C:\final.txt", r, True)
    loResponseStream.Close()
    webresponse.Close()
Catch
End Try


But this is not working: when I download the page manually, the file is 54 KB, but when I fetch it with the code above, the file is only 14 KB.

Need help.

Thanks

This online service grabs the page the way I need. Could someone help me understand the logic behind how it rips the page?
http://www.ex-designz.net/htmlviewer.asp
Updated 8-Dec-22 9:56am
Comments
TweakBird 5-Feb-11 7:54am    
Edited for formatting.
Sandeep Mewara 5-Feb-11 7:58am    
I couldn't understand what your issue is.
Archit9373284448 5-Feb-11 8:01am    
The issue is that I am missing 40 KB of data.
Sandeep Mewara 5-Feb-11 8:02am    
Are you sure you are missing it? Couldn't it be that all the data is there, just compressed?
Archit9373284448 5-Feb-11 8:04am    
No, I am sure of that.
If you look at the page I linked, I am interested only in the data in the middle of the page, and that is exactly the part I am missing.

What I found is not a complete answer yet, but it might help you to sort this out.

I tried the same using my own HTTP downloader and got exactly the same results.
But I also compared the saved files and saw one big difference, in the hidden input element named __VIEWSTATE:

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="... I skipped the content here ... " />


I did not show the content of the value attribute because it is pretty long.
So, here is the difference: in at least one case this value is much longer when the page is fetched with a Web browser. The application uses hidden elements to save the view state, which is a well-known ASP.NET technique.

I don't know yet how the requests differ, though. Maybe you can figure this out: it is possible to capture the HTTP traffic and see exactly what the Web browser sends.

—SA
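
To follow up on the __VIEWSTATE observation above, here is a rough C# sketch of one way to test whether the request headers matter. It is not a confirmed fix. The class BrowserLikeFetch, both method names, and every header value are assumptions made for illustration; the real values should be copied from a captured browser request. UTF-8 decoding is also an assumption (the question used code page 1252).

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class BrowserLikeFetch
{
    // Hypothetical helper: downloads a page while sending browser-like headers.
    public static string Download(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Assumed header values; replace them with what your browser really sends.
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        request.Accept = "text/html,application/xhtml+xml";
        request.Headers["Accept-Language"] = "en-US,en;q=0.8";
        // Keep cookies in case the site sets any on the response.
        request.CookieContainer = new CookieContainer();
        // Let the framework transparently unpack gzip/deflate responses.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the value of the hidden __VIEWSTATE input, or null if it is absent.
    public static string ExtractViewState(string html)
    {
        Match m = Regex.Match(
            html,
            "<input[^>]*name=\"__VIEWSTATE\"[^>]*value=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Comparing the length of ExtractViewState(...) for the downloaded text and for the file saved from the browser gives a quick check of whether a header change moved the result closer to the browser version.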
 
 
Comments
Espen Harlinn 6-Feb-11 6:23am    
Setting OP on the right path ..., my 5
Sergey Alexandrovich Kryukov 6-Feb-11 12:34pm    
Thank you very much, Espen.
Sorry I don't know further detail yet.
Maybe the utility you advised can help?
--SA
Sandeep Mewara 6-Feb-11 11:10am    
Righta! 5!
Sergey Alexandrovich Kryukov 6-Feb-11 12:35pm    
Thanka!
Sandeep Mewara 6-Feb-11 12:38pm    
:) :)
Here is a utility, wget, that performs the required operations. You can execute it from your code using the Process class. While wget is open source, it's not written in C#.

It's an easy solution to your problem; it will allow you to get just about anything available on the site.

Regards
Espen Harlinn
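
A minimal sketch of the wget suggestion above, in C#. It assumes wget is installed and reachable through PATH; the class WgetRunner and its method names are made up for this example. The only wget option used is -O, which writes the downloaded document to the given file.

```csharp
using System;
using System.Diagnostics;

public static class WgetRunner
{
    // Builds the wget command line: -O <file> <url>, with both values quoted.
    public static string BuildArguments(string url, string outputFile)
    {
        return "-O \"" + outputFile + "\" \"" + url + "\"";
    }

    // Runs wget and returns its exit code (0 means wget reported success).
    public static int Download(string url, string outputFile)
    {
        ProcessStartInfo info = new ProcessStartInfo
        {
            FileName = "wget",                 // assumed to be on PATH
            Arguments = BuildArguments(url, outputFile),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Usage would look like WgetRunner.Download("http://www.webhostdir.com/search/profile.aspx?spid=19137", "page.html"). Note that wget also sends its own request headers, so if the server varies its output by headers, the saved file may still match the HttpWebRequest download instead of the browser version.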
 
 
Comments
Sandeep Mewara 6-Feb-11 11:10am    
Nice utility, good to know! 5! :)
Espen Harlinn 6-Feb-11 11:21am    
Thanks Sandeep Mewara!
Sergey Alexandrovich Kryukov 6-Feb-11 12:32pm    
Great find, Espen, very useful. My 5.
--SA
Espen Harlinn 6-Feb-11 12:53pm    
Thanks SAKryukov!
Archit9373284448 22-Feb-11 2:37am    
LOL that was my 2nd option...
Have a look at this article:

http://www.4guysfromrolla.com/articles/122204-1.aspx#postadlink

I am not sure whether the WebClient class will help you in this scenario. If you have not tried it, take a look at this too:

http://www.4guysfromrolla.com/webtech/070601-1.shtml
 
Comments
Archit9373284448 5-Feb-11 11:13am    
Sorry, this is not working. Thanks for your efforts, though.
This may or may not apply to you.
See the discussion below:
Saving page source using webrequest

The recommendation there is to use the WebBrowser control if you want the browser version of the page.

Cheers
 
 
You could try this. It is in C#, so you would have to convert it to VB, but that should be fairly easy.

C#
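// Requires: using System.IO; using System.Net; using System.Text;
// Downloads the raw response body for the given URL and returns it as text.
public static string GetWebSource(string site)
{
    WebRequest request = WebRequest.Create(site);
    using (WebResponse response = request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (MemoryStream ms = new MemoryStream())
    {
        // Copy the whole response into memory before decoding it.
        responseStream.CopyTo(ms);
        // Note: ASCII decoding turns every non-ASCII byte into '?'.
        return Encoding.ASCII.GetString(ms.ToArray());
    }
}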
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900