View source of web page (.aspx pages)

Question

Hi to all,
I am trying to scrape this page:
http://www.webhostdir.com/search/profile.aspx?spid=19137

I am using code something like this:

VB

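' At the top of the file:
Imports System.IO
Imports System.Net
Imports System.Text

' Download the page and append its source to C:\final.txt.
Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create("http://www.webhostdir.com/search/profile.aspx?spid=19137"), HttpWebRequest)
myRequest.Method = "GET"
myRequest.KeepAlive = False
Try
    Using webresponse As HttpWebResponse = DirectCast(myRequest.GetResponse(), HttpWebResponse)
        ' Decode the response as Windows-1252.
        Dim enc As Encoding = Encoding.GetEncoding(1252)
        Using loResponseStream As New StreamReader(webresponse.GetResponseStream(), enc)
            Dim r As String = loResponseStream.ReadToEnd()
            ' True = append to the file rather than overwrite it.
            My.Computer.FileSystem.WriteAllText("C:\final.txt", r, True)
        End Using
    End Using
Catch ex As WebException
    ' Report the failure instead of silently swallowing it.
    Console.WriteLine("Request failed: " & ex.Message)
End Try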

But this is not working. When I download the page manually, the file is 54 KB, but when I fetch it with the method above, the file is only 14 KB.

Need help.

Thanks

This is an online service that grabs the page the way I need. Could someone help me with the logic behind how it retrieves the source?
http://www.ex-designz.net/htmlviewer.asp

Posted 5-Feb-11 1:50am

Archit9373284448

Updated 8-Dec-22 9:56am

Comments

TweakBird 5-Feb-11 7:54am

Edited for formatting.

Sandeep Mewara 5-Feb-11 7:58am

I couldn't understand what your issue is.

Archit9373284448 5-Feb-11 8:01am

The issue is that I am missing 40 KB of data.

Sandeep Mewara 5-Feb-11 8:02am

Are you sure you are missing it? Couldn't it be that all the data is there, just compressed?

Archit9373284448 5-Feb-11 8:04am

No, I am sure of that.
If you look at the page I linked, I am only interested in the middle section of the page, and that is exactly the part I am missing.
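
One way to test the compression theory raised above is to look at the raw response bytes, before any text decoding. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes you keep the response body as a byte array rather than a decoded string. It checks for the gzip magic bytes (0x1F 0x8B) and inflates the data if they are present.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the original question's code.
public static class CompressionCheck
{
    // True when the buffer starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksGzipped(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Returns the decompressed bytes if the buffer is gzip data,
    // otherwise returns the buffer unchanged.
    public static byte[] InflateIfGzipped(byte[] data)
    {
        if (!LooksGzipped(data)) return data;
        using (var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

If the inflated size matches what the browser saves, compression was the explanation. If the raw bytes are not gzip data at all, the missing content has another cause. Note that HttpWebRequest can also handle this itself through its AutomaticDecompression property.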

5 solutions

Solution 4

Here is a utility, wget, that performs the required operations. You can execute it from your code using the Process class. While wget is open source, it's not written in C#.

It's an easy solution to your problem; it will allow you to get just about anything available on the site.
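
A minimal sketch of launching wget through the Process class might look like this. The class name, method names, and the wgetPath parameter are assumptions made for illustration; the only wget option used is -O, which names the output file. The quoting is deliberately naive and would break on paths or URLs that themselves contain double quotes.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper; assumes a wget executable is installed at wgetPath.
public static class WgetRunner
{
    // Builds the start info for: wget -O "<outputPath>" "<url>"
    public static ProcessStartInfo BuildStartInfo(string wgetPath, string url, string outputPath)
    {
        return new ProcessStartInfo
        {
            FileName = wgetPath,
            Arguments = "-O \"" + outputPath + "\" \"" + url + "\"",
            UseShellExecute = false,   // start the executable directly, no shell
            CreateNoWindow = true      // do not pop up a console window
        };
    }

    // Runs wget, waits for it to finish, and returns its exit code (0 means success).
    public static int Download(string wgetPath, string url, string outputPath)
    {
        using (Process process = Process.Start(BuildStartInfo(wgetPath, url, outputPath)))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Calling WgetRunner.Download with the path to wget.exe, the page URL, and a target file name would then save the page to that file, provided wget is installed at that path.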

Regards
Espen Harlinn

Posted 6-Feb-11 0:31am

Espen Harlinn

Comments

Sandeep Mewara 6-Feb-11 11:10am

Nice utility, good to know! 5! :)

Espen Harlinn 6-Feb-11 11:21am

Thanks Sandeep Mewara!

Sergey Alexandrovich Kryukov 6-Feb-11 12:32pm

Great find, Espen, very useful. My 5.
--SA

Espen Harlinn 6-Feb-11 12:53pm

Thanks SAKryukov!

Archit9373284448 22-Feb-11 2:37am

LOL that was my 2nd option...

Archit9373284448 22-Feb-11 2:39am

And now the question is: how did wget do that?

Espen Harlinn 22-Feb-11 12:16pm

I think you'll find the answer here: http://downloads.sourceforge.net/gnuwin32/wget-1.11.4-1-src-setup.exe

Solution 1

Have a look at this article.

http://www.4guysfromrolla.com/articles/122204-1.aspx#postadlink

I am not sure whether the WebClient class will help you in this scenario. If you have not tried it, take a look at this too.

http://www.4guysfromrolla.com/webtech/070601-1.shtml

Posted 5-Feb-11 2:09am

Anupama Roy

Updated 5-Feb-11 2:18am

Comments

Archit9373284448 5-Feb-11 11:13am

Sorry, this is not working. Thanks for your efforts, though.

Solution 2

This may or may not apply to you.
See the discussion below:
Saving page source using webrequest

The recommendation there is to use the WebBrowser control if you want the browser's version of the page.

Cheers

Posted 5-Feb-11 2:18am

Estys

Solution 8

You could try this. It is in C#, so you would have to convert it to VB, but that should be fairly easy.

C#

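// Required namespaces (at the top of the file):
using System.IO;
using System.Net;
using System.Text;

// Downloads the raw response body for the given URL and returns it as a string.
public static string GetWebSource(string site)
{
    WebRequest request = WebRequest.Create(site);
    using (WebResponse response = request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (MemoryStream ms = new MemoryStream())
    {
        // Copy the whole response body into memory.
        responseStream.CopyTo(ms);
        // UTF-8 is an assumption here; Encoding.ASCII would turn every non-ASCII byte into '?'.
        return Encoding.UTF8.GetString(ms.ToArray());
    }
}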

Posted 8-Dec-22 9:56am

charles henington

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-02-05T11:06:00

What I found is not a complete answer yet, but it might help you to sort this out.

I tried the same using my own HTTP downloader and got exactly the same results.
But I also compared saved files and saw one big difference: there are hidden input elements with the name __VIEWSTATE:

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="... I skipped the content here ... " />

I did not show the content of the value attribute because it's pretty long.
So, here is the difference: in at least one case, this value is much longer when you use a Web browser. The application uses hidden elements to save the view state, which is a well-known technique.

I don't know yet how the requests differ, though. Maybe you can figure this out. It's possible to sniff the HTTP traffic to see exactly what the Web browser sends.
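
To put a number on that difference, the length of the __VIEWSTATE value can be measured in both saved files. The sketch below is only an illustration with made-up names. It assumes the name attribute comes before the value attribute, as in the element shown above, and it only looks at the first matching field.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing two saved copies of the page.
public static class ViewStateProbe
{
    // Matches the first hidden input named __VIEWSTATE and captures its value attribute.
    private static readonly Regex ViewStateInput = new Regex(
        "<input[^>]*name=\"__VIEWSTATE\"[^>]*value=\"(?<v>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Returns the length of the __VIEWSTATE value, or -1 if the field is not found.
    public static int ViewStateLength(string html)
    {
        Match m = ViewStateInput.Match(html);
        return m.Success ? m.Groups["v"].Value.Length : -1;
    }
}
```

Running ViewStateLength over the browser-saved file and the downloaded file shows how much of the size gap comes from the view state alone.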

—SA
