Downloading Multiple Files over HTTP Connection

25 Mar 2009 · CPOL · 3 min read
An application that can download the files, as listed on an HTML page, over an HTTP connection.

Introduction

My goal was to download multiple files from a directory listed as an HTML page (see the directory index example in the figure below) over an HTTP connection. I was not able to find an application that could do both of the following:

  1. Parse the HTML page for all the files of interest, and 
  2. Download the files via HTTP

So I started to write this C# application.

FileIndexHtml.JPG

Reference

I found many online tutorials that helped me implement this application. For more information, refer to the articles "Fetching Web Pages with HTTP" by Joe Mayo and "Creating a download manager in C#" by Andrew Pociu.

How It Works

The download demo application downloads files from the target host to a local directory using the HTTP protocol. It is a very simple application that has one Windows form, as shown in the following screenshot.

DownloadMgmr001.JPG

The three main methods, which together provide the two functions described in the Introduction, are:

  • GetHTTPContent – Method to get (download) the HTML page content
  • ParseFileNamesFromWebPage – Method that parses the HTML page and returns a list of filenames
  • DownloadFile – Method that downloads a single file over HTTP

The application uses the System.Net.WebRequest and System.Net.HttpWebRequest classes to interact with the HTTP server, that is, to send requests for data and retrieve the server's responses. Note that the WebRequest class supports several uniform resource identifier (URI) schemes, including http, https, and file. Users may be able to modify the download demo code to work with the other schemes.
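As a minimal sketch of how the URI scheme selects the request type, the helper below asks WebRequest.Create for a request object and reports its concrete class. The class and method names (RequestSchemeDemo, RequestTypeFor) are hypothetical and are not part of the demo application. No connection is opened, because the request is only constructed. On .NET 6 and later, WebRequest is marked obsolete, so this code compiles there with a warning.

```csharp
using System;
using System.Net;

public static class RequestSchemeDemo
{
    // Hypothetical helper: returns the name of the concrete WebRequest
    // subclass that WebRequest.Create selects for the given URI.
    // The request is only constructed here; nothing is sent.
    public static string RequestTypeFor(string uri)
    {
        WebRequest request = WebRequest.Create(uri);
        return request.GetType().Name;
    }
}
```

Calling RequestTypeFor with an http URI should give "HttpWebRequest", while a file URI should give "FileWebRequest", which is why the cast to HttpWebRequest in the demo code is only valid for http and https addresses.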

The GetHTTPContent method uses the WebRequest.Create method to create a request for the URI. In this example, the URI is a Web page on my internal server (MyWebServer):

C#
HttpWebRequest request = 
	(HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");

The WebRequest object returned by WebRequest.Create is cast to an HttpWebRequest reference, so the communication takes place over the HTTP protocol. The method then obtains the server's response via the GetResponse method. The response is read as a data stream into a byte buffer, and each chunk is converted to text and appended to a string builder. Below is the complete code:

C#
// Requires: using System.IO; using System.Net; using System.Text;

// Buffer used on each read operation
byte[] buffer = new byte[8192];
int count = 0;

// Accumulates the page text (assumed to be a StringBuilder, as the
// Append call implies)
StringBuilder webPageString = new StringBuilder();

// Create the WebRequest instance
HttpWebRequest request = 
	(HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");

// Query for the response; the using blocks close the response and
// its stream even if a read fails
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    do
    {
        // Read the next chunk of the response stream
        count = responseStream.Read(buffer, 0, buffer.Length);

        if (count != 0)
        {
            // Convert from bytes to ASCII text
            webPageString.Append(Encoding.ASCII.GetString(buffer, 0, count));
        }
    }
    while (count > 0);
}
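The loop above decodes each chunk as ASCII, which is safe because every ASCII character is a single byte. If the page may contain multi-byte characters (UTF-8, for example), decoding chunk by chunk can split a character across two reads. One possible alternative, shown here only as a sketch, is to let a StreamReader do the buffering and decoding. The helper name StreamText.ReadAll is hypothetical, and the choice of encoding is an assumption that depends on what the server actually sends.

```csharp
using System.IO;
using System.Text;

public static class StreamText
{
    // Hypothetical helper: reads 'source' to the end and decodes it with
    // the supplied encoding. The StreamReader handles characters that
    // span buffer boundaries. Disposing the reader also closes 'source'.
    public static string ReadAll(Stream source, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(source, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this helper, the stream returned by GetResponseStream could be passed in directly, for example as StreamText.ReadAll(responseStream, Encoding.UTF8).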

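The ParseFileNamesFromWebPage method extracts all the filenames from the web page by searching for a known token. In the directory listing, each file appears as NAME="filename.extension", so the token is the text up to and including the opening quote, and the filename runs up to the closing quote. The code is shown below: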

C#
// Get the index of the first found token (-1 if there is none).
int tokenIndex = webPageContent.IndexOf(knownToken);

// Parse the page to get all the file names that follow the token.
while (tokenIndex >= 0)
{
    // The file name starts right after the token
    // and ends at the closing quote.
    int nameStart = tokenIndex + knownToken.Length;
    int nameEnd = webPageContent.IndexOf('"', nameStart);
    if (nameEnd < 0)
    {
        // No closing quote: stop parsing.
        break;
    }

    String fileName = webPageContent.Substring(nameStart, nameEnd - nameStart);
    fileNames.Add(fileName);

    // Find the next token after the closing quote.
    tokenIndex = webPageContent.IndexOf(knownToken, nameEnd + 1);
}
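To make the token search concrete, here is a self-contained sketch of the same idea applied to a small sample page. The class and method names (ListingParser, ParseFileNames) and the sample markup in the usage note are assumptions for illustration; a real directory listing may use different markup, in which case the token would need to change.

```csharp
using System;
using System.Collections.Generic;

public static class ListingParser
{
    // Hypothetical helper: returns every value that follows 'knownToken'
    // (for example NAME=") up to the next double quote.
    public static List<string> ParseFileNames(string webPageContent, string knownToken)
    {
        List<string> fileNames = new List<string>();
        int tokenIndex = webPageContent.IndexOf(knownToken, StringComparison.Ordinal);

        while (tokenIndex >= 0)
        {
            int nameStart = tokenIndex + knownToken.Length;
            int nameEnd = webPageContent.IndexOf('"', nameStart);
            if (nameEnd < 0)
            {
                break; // unterminated value: stop parsing
            }

            fileNames.Add(webPageContent.Substring(nameStart, nameEnd - nameStart));

            // Continue searching after the closing quote.
            tokenIndex = webPageContent.IndexOf(knownToken, nameEnd + 1, StringComparison.Ordinal);
        }

        return fileNames;
    }
}
```

For a page such as <A NAME="a.txt">a</A><A NAME="b.zip">b</A> and the token NAME=" the method returns the two names a.txt and b.zip. Searching by index, rather than repeatedly removing the front of the string, avoids copying the page on every match and also handles a token at position 0.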

Finally, the application downloads all the files via the DownloadFile method, using the file names returned by the ParseFileNamesFromWebPage method. The DownloadFile method is similar to the GetHTTPContent method, but this time the code uses the System.Net.WebClient and System.Net.WebRequest classes together.

The WebClient class is a higher-level wrapper and is easier to use, but the WebRequest class provides additional information that I need to perform the download and update the download progress control. In this download demo application, I have used a WebRequest to get the size of the file and a WebClient to download the file as a stream. You could use the stream returned by the response's GetResponseStream method instead; either approach works.

C#
// Open the URL for download
WebClient wcDownload = new WebClient();
streamResponse = wcDownload.OpenRead(downloadFileName);

try
{
    // Read until the response stream is exhausted (Read returns 0)
    while ((bytesSize = streamResponse.Read(
        downBuffer, 0, downBuffer.Length)) > 0)
    {
        // Write the data from the buffer to the local hard drive
        fileStream.Write(downBuffer, 0, bytesSize);
        totalSize += bytesSize;
    }
}
finally
{
    // Close the response stream even if a read or write fails
    streamResponse.Close();
}
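The running total kept in the loop above is what makes a progress display possible once the file size is known. The sketch below shows one way the two could be combined. It is not the demo application's code: the names ProgressCopy.Copy, expectedSize, and reportPercent are hypothetical, and the expected size is assumed to come from a value such as the ContentLength property of the response.

```csharp
using System;
using System.IO;

public static class ProgressCopy
{
    // Hypothetical helper: copies 'source' to 'destination' in chunks and
    // reports the percentage completed after each chunk. When
    // 'expectedSize' is not positive (size unknown), nothing is reported.
    // Returns the number of bytes copied.
    public static long Copy(Stream source, Stream destination,
                            long expectedSize, Action<int> reportPercent)
    {
        byte[] buffer = new byte[2048];
        long totalSize = 0;
        int bytesRead;

        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, bytesRead);
            totalSize += bytesRead;

            if (expectedSize > 0 && reportPercent != null)
            {
                // Clamp to 100 in case the server sends more than announced.
                int percent = (int)Math.Min(100, totalSize * 100 / expectedSize);
                reportPercent(percent);
            }
        }

        return totalSize;
    }
}
```

In a Windows Forms application, the reportPercent callback would be the place to update a progress bar, subject to the thread-safety requirement discussed in this article.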

Note that this download demo application runs the entire download step in a separate thread so that the user interface does not freeze while the download is in progress. Thread-safe calls to Windows Forms controls must be used so that the download thread can safely update the controls that show the download status. This topic is outside the scope of this article, but details can be found in the download demo application package.

The download demo application lets users include or exclude files from the download. It currently accepts only one inclusion string and one exclusion string, as shown in the following screenshots:

DownloadMgmr002.JPG

DownloadMgmr003.JPG
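The filtering step can be pictured with the sketch below. The article does not state how the demo application matches the two strings, so this example assumes plain, case-insensitive substring matching and treats an empty string as "no filter". The names FileNameFilter and Apply are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FileNameFilter
{
    // Hypothetical helper: keeps names that contain 'include' (if given)
    // and do not contain 'exclude' (if given). Matching is assumed to be
    // a case-insensitive substring test.
    public static List<string> Apply(IEnumerable<string> fileNames,
                                     string include, string exclude)
    {
        List<string> selected = new List<string>();

        foreach (string name in fileNames)
        {
            bool included = string.IsNullOrEmpty(include)
                || name.IndexOf(include, StringComparison.OrdinalIgnoreCase) >= 0;
            bool excluded = !string.IsNullOrEmpty(exclude)
                && name.IndexOf(exclude, StringComparison.OrdinalIgnoreCase) >= 0;

            if (included && !excluded)
            {
                selected.Add(name);
            }
        }

        return selected;
    }
}
```

For example, with the names a.zip, b.txt, and readme.zip, an inclusion string of .zip and an exclusion string of readme would leave only a.zip.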

Note

This download demo application can only download the files within a single directory. To download files from sub-directories as well, you need to modify the source code. Alternatively, you can run multiple instances of the application, one for each sub-directory.

History

  • March 2009 - First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Systems Engineer
Canada

