
Introduction

In this article, I describe the steps required to download files from a Web server efficiently. I assume that you are somewhat familiar with the general structure of C# and with the HTTP protocol, especially HTTP headers.

Let's Get Started

There are a few steps we need to take in order to download a file from a Website. From an abstract point of view, when you talk to an HTTP server you are doing one of two things: sending a request or receiving a response.

.NET References

First of all, you need to remember to reference System.Net to be able to use .NET's WebRequest and WebResponse classes.

using System.Net; 

The next thing we'll look at is HttpUserAgent, the string we send in the User-Agent header. It tells the destination server who we are. You usually want to set it if you're crawling a Website, because some sites look at this value and enable or disable certain features based on it.
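As a rough illustration of setting the user agent, here is a minimal sketch. The class name UserAgentSketch, the method BuildRequest and the agent string are made up for this example and are not part of the article's code. Creating the request does not open a connection yet; that only happens when GetResponse() is called.

```csharp
using System;
using System.Net;

static class UserAgentSketch
{
    // Hypothetical agent string, used for illustration only.
    public const string AgentString = "ExampleDownloader/1.0";

    // Creates a request for the given URL and sets its User-Agent header.
    // Nothing is sent over the network at this point.
    public static HttpWebRequest BuildRequest(string siteUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(siteUrl);
        request.UserAgent = AgentString;
        return request;
    }
}
```

Calling UserAgentSketch.BuildRequest("http://example.com/") returns a request whose UserAgent property holds the string above. Note that on recent .NET versions WebRequest.Create is marked obsolete in favor of HttpClient, so the compiler may emit a warning.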

Cookies

We need to look at the CookieContainer object. We use it so that multiple downloads share one session instead of starting from scratch on every request. The container stores any cookies the server sets and sends them back on later requests to the same site. If we already have a container, we reuse it; otherwise, we create a new one.

There are a number of items that we need to initialize before we establish a connection. The first item is the HttpWebRequest. We initialize this variable while passing it the URL that we're connecting to. This step can be done later as well.

httpRequest = (HttpWebRequest)WebRequest.Create(siteURL);   

The next item is the cookie container. We check a static Boolean flag, IsFirstConnection. If it is true, this is our first connection, so we create a new CookieContainer and clear the flag; otherwise we reuse the existing container.

if (Downloader.IsFirstConnection)
{
    httpCookie = new CookieContainer();
    Downloader.IsFirstConnection = false;
}  
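One possible variation, shown only as a sketch: instead of tracking a first-connection flag, each downloader instance can own a single CookieContainer that is created once and attached to every request it builds. The names SessionSketch, Cookies and BuildRequest are hypothetical and do not appear in the article's code.

```csharp
using System;
using System.Net;

class SessionSketch
{
    // One container per instance, created exactly once.
    // Every request built by this instance shares it.
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    // Creates a request that reads cookies from, and stores cookies into,
    // the shared container. No connection is opened here.
    public HttpWebRequest BuildRequest(string siteUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(siteUrl);
        request.CookieContainer = cookies;
        return request;
    }
}
```

With this layout, two SessionSketch objects keep separate cookies, while repeated requests from the same object reuse whatever the server set earlier.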

Similarly, we set UserAgent and other settings such as AllowAutoRedirect. Once everything is done, we're ready to connect to the Web server. That's done by:

httpResponse = (HttpWebResponse)httpRequest.GetResponse(); 

Upon connection, we can check the code returned from the Web server and deal with any kind of errors if any. Upon return code 200, we can go ahead and read the HTTP header as well as the body of the response. I have intentionally left these two sections blank since you can parse and format the data as it is downloaded.
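If you simply want the raw body as a byte array, one way to fill in the blank section is to copy the response stream into memory. The following is a minimal sketch; the class name ResponseBodySketch and the method ReadAllBytes are made up for this example.

```csharp
using System.IO;

static class ResponseBodySketch
{
    // Reads the stream to its end and returns everything as a byte array.
    public static byte[] ReadAllBytes(Stream source)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            // Read returns 0 once the end of the stream is reached.
            while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

Inside the block that handles the response stream, this could be used as httpData = ResponseBodySketch.ReadAllBytes(httpContentData). For very large files you would probably write the chunks straight to disk instead of holding everything in memory.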

Lastly, we need to close the connection. We put this in the finally section of the code so that even if there is an error, we still close the connection gracefully. Below is the sample code of the above put together.

using System.IO;
using System.Net;

namespace SimpleDownloader
{
    class Downloader
    {
        public const string HttpUserAgent = "Sean's Agent/1.0 " + 
        "(compatible; SA 1.0; Windows NT 6.0; SLCC1;" +
        " .NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 1.1.4322)";
        CookieContainer httpCookie;

        // Note: IsFirstConnection is not defined in this listing. It is
        // assumed to be a static bool on Downloader that starts out true.
 
        public byte[] ConnectAndDownloadURL(string siteURL)
        {
            HttpWebRequest httpRequest = null;
            HttpWebResponse httpResponse = null;
            byte[] httpHeaderData = null;
            byte[] httpData = null;

            httpRequest = (HttpWebRequest)WebRequest.Create(siteURL);

            // Check whether this is the first time we're connecting.
            // If so, create the cookie container;
            // otherwise reuse the existing one.
            if (Downloader.IsFirstConnection)
            {
                httpCookie = new CookieContainer();
                Downloader.IsFirstConnection = false;
            }
 
            httpRequest.CookieContainer = httpCookie;
            httpRequest.AllowAutoRedirect = true;
            httpRequest.UserAgent = Downloader.HttpUserAgent;

            try
            {
                httpResponse = (HttpWebResponse)httpRequest.GetResponse();
                if (httpResponse.StatusCode == HttpStatusCode.OK)
                {
                    httpCookie = httpRequest.CookieContainer;
                    httpHeaderData = httpResponse.Headers.ToByteArray();
                    Stream httpContentData = httpResponse.GetResponseStream();
                    using (httpContentData)
                    {
                        // Now you can do whatever you want with the data here,
                        // i.e. convert it, parse it etc. You can write stuff to httpData.
                    }
                    return httpData;
                }
                else
                {
                    // Report error
                    return null;
                }
            }
            catch (WebException we)
            {
                // Report error. Returning here keeps every code path
                // returning a value, which the compiler requires.
                return null;
            }
            finally
            {
                // Runs on success and on error, so the connection is always closed.
                if (httpResponse != null)
                {
                    httpResponse.Close();
                }
            }
        }
    }
}
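One detail worth knowing about the listing above: for 4xx and 5xx responses, GetResponse() throws a WebException instead of returning, so those status codes show up in the catch block and not in the else branch. The sketch below shows one way to recover the status code there. The names WebErrorSketch and StatusFrom are hypothetical.

```csharp
using System.Net;

static class WebErrorSketch
{
    // Returns the HTTP status code carried by the exception, or null when
    // the failure happened before any HTTP response arrived
    // (for example a DNS failure or a refused connection).
    public static HttpStatusCode? StatusFrom(WebException error)
    {
        HttpWebResponse response = error.Response as HttpWebResponse;
        if (response == null)
        {
            return null;
        }
        return response.StatusCode;
    }
}
```

In the catch block this could be called as WebErrorSketch.StatusFrom(we), for example to tell a 404 apart from a network failure when reporting the error.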

Please note that the above is only meant as a general guideline and a starting point for communicating with a Web server. You can then tweak the settings and variables to meet the needs of your particular application.

Happy coding!

General: How to deal with cookies?
lixingyi
17:27 10 Mar '09  
When a request is redirected to another URL, how should the cookie be handled?
For example:
First URL: http://rt.aa.com/tracker?action=click&tu=subscription.do
Redirect URL: http://my.bb.com/subscription.do?dispatch=start
Question: IsFirstConnection?
MaxGuernsey
12:16 7 May '08  
Where is the implementation of IsFirstConnection?

Max Guernsey, III
Managing Member, Hexagon Software LLC
http://www.hexsw.com
http://www.dataconstructor.com

Answer: Re: IsFirstConnection?
Sean Dastouri
8:43 8 May '08  
IsFirstConnection can be a simple static bool value that is defined in the body of the class. Here is its definition:


private bool isFirstConnection;

public bool IsFirstConnection{
get{ return isFirstConnection; }
set{ isFirstConnection = value; }
}

General: Re: IsFirstConnection?
MaxGuernsey
10:38 8 May '08  
Why static? The code snippet below is not static.


Answer: Re: IsFirstConnection?
Sean Dastouri
7:14 9 May '08  
You don't have to make it static. It can be a member variable. But if you need to check the status of the connection from other functions that may also be defined as static, it would make sense to make the variable static as well. If you're only using this function/class, you can simply declare the variable as a regular member.

This is why I didn't define the variable: it depends on your program and how you will be using it.
General: Re: IsFirstConnection?
MaxGuernsey
7:27 9 May '08  
If it is static and httpCookie is not, it seems like you would put yourself in a situation where httpCookie could be unintentionally null. The whole reason I started this thread was that you seemed to be implying that the field should be static by using the name of the type to address it, and I don't think that makes sense. Even if httpCookie were a static variable, that would rob you of a lot of the value of having an object: different clients could not isolate their behaviors. Client A would have Client B's cookies and vice versa. I would recommend getting rid of the static implication. Otherwise it's good.


General: Cool!
ciricivan
9:35 6 May '08  
Plain, simple, good :D
General: Great intro
Darchangel
8:54 6 May '08  
I'd like to have seen this go into more detail but it is a great intro (which I know was the point).

I hate that MS didn't make this process more intuitive. Especially: I think there's no good reason why we should have to have multiple casts just to perform many basic and common web-based procedures.

Seeing it all laid out in a functional barebones app like this will save a lot of people much frustration. Good job.
Question: Return type
bubbleHead
6:21 2 May '08  
Good article... but is it possible to return other than byte[], like string?

Thanks
Answer: Re: Return type
Darchangel
8:46 6 May '08  
http://www.google.com/search?q=byte+array+to+string
General: catch (WebException we)
Uwe Keim
19:49 1 May '08  
I prefer libraries that let me control how exceptions are handled.




Last Updated 1 May 2008 | Copyright © CodeProject, 1999-2010