Multi-threaded file download manager

5 Jun 2006
A fully working multi-threaded file downloader application.

Introduction

A few months ago, my lovely wife was downloading her lecture notes from her university website, and I noticed that she had to manually click on every file to save it to the hard disk.

I also noticed that all the hyperlinks were on the same page, and the documents she was downloading were either Word documents or PowerPoint presentations.

Right then, a little light bulb went on in my head, and I decided to build her a file download utility. The requirements were simple:

  • I should be able to point to a web page and filter URLs on it. For example, "*.doc" should give me a collection of URLs that have .doc at the end of the link.
  • From the list of available files, I should be able to select the files I want to download.
  • I should be able to download my selected files simultaneously.
  • I should be able to nominate the number of simultaneous threads for download.
  • I should be able to cancel a download at any point in time.
  • I should be informed of the download status of each selected file.
  • I do not want to re-download files that I have downloaded before.

Using the code

The code is divided into three major sections.

  • FileDownloader.cs - accepts a URL and starts downloading it.
  • WebPageInterrogater.cs - accepts a filter string and a URL, and returns a list of hyperlinks that match the filter.
  • Main.cs - contains the UI code.

Let's have a look at the WebPageInterrogater class.

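The regular expression below finds all HREFs in a web page. It captures the whole anchor tag, the link target, and the link text in the named groups HTML, HRef, and Title: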

C#
const string _findAllHrefsPattern = "(?<HTML><a[^>]" + 
      "*href\\s*=\\s*[\\\"\\']?(?<HRef>[^\"'>\\s]*)" + 
      "[\\\"\\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

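As a rough illustration of what this pattern captures, here is a minimal sketch that runs it over an HTML string and collects the HRef group. The class name HrefDemo and the method ExtractHrefs are hypothetical and are not part of the downloadable source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the article's source code.
public static class HrefDemo
{
    // Same pattern as above, copied so that the sketch is self-contained.
    const string FindAllHrefsPattern = "(?<HTML><a[^>]" +
          "*href\\s*=\\s*[\\\"\\']?(?<HRef>[^\"'>\\s]*)" +
          "[\\\"\\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

    // Returns the value of every href attribute found in the given HTML.
    public static List<string> ExtractHrefs(string html)
    {
        List<string> hrefs = new List<string>();
        Regex regEx = new Regex(FindAllHrefsPattern, RegexOptions.IgnoreCase);
        foreach (Match match in regEx.Matches(html))
        {
            // The named group HRef holds the link target without its quotes.
            hrefs.Add(match.Groups["HRef"].Value);
        }
        return hrefs;
    }
}
```

Reading the group by name rather than by index keeps the code correct even if the pattern is later extended with more groups.
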
The WebPageInterrogater class constructor takes the page URL and a string of filter expressions separated by semicolons (for example, *.doc;*.ppt), and builds a regular expression from the filters.

C#
/// <summary>
/// Crawls the given url looking for hyperlinks
/// and extracts all hyperlinks that match the filter.
/// For example *.doc will return hyperlinks for word documents.
/// </summary>
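/// <param name="url">The web page to search for hyperlinks.</param>
/// <param name="sFilters">Semicolon-separated filters, for example *.doc;*.ppt.</param>
public WebPageInterrogater(string url, string sFilters)
{
    _url = url;
    string[] filters = sFilters.Split(new char[] { ';' },
                       StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < filters.Length; i++)
    {
        // "*.doc" becomes "\.doc$": drop the wildcard, escape the rest,
        // and anchor the match at the end of the link.
        filters[i] = Regex.Escape(filters[i].Trim().Replace("*", 
                     string.Empty)) + "$";
    }
    // Any one of the filters may match, so join them with "|".
    _filters = string.Join("|", filters);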
}

ListFiles() will search for the requested patterns in the target web page and return a collection of matching URLs.

C#
/// <summary>
/// Returns a collection of documents that are eligible to download.
/// </summary>
/// <returns>The URLs of the hyperlinks that match the filter.</returns>
public StringCollection ListFiles()
{
    StringCollection sCol = new StringCollection();
    string webPage = GetWebPage();

    Regex regEx = new Regex(_findAllHrefsPattern, 
                  RegexOptions.Compiled | RegexOptions.IgnoreCase);
    Regex regEx2 = new Regex(_filters, 
                   RegexOptions.Compiled | RegexOptions.IgnoreCase);

    MatchCollection matches = regEx.Matches(webPage);
    foreach (Match match in matches)
    {
        // The named group HRef (group 2) holds the link target.
        string value = match.Groups["HRef"].Value;

        if (regEx2.IsMatch(value))
        {
            // The link is appended to TopLevelUrl, so absolute
            // hrefs are not handled here.
            sCol.Add(TopLevelUrl + "/" + value);
        }
    }
    return sCol;
}
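
ListFiles() builds each URL by appending the link to TopLevelUrl, which only suits relative links. One possible alternative, sketched below, is to let the framework's Uri class resolve each link against the page URL. The class name LinkFilter, the method Select, and the example.edu addresses in the usage note are made up for illustration, and the filter pattern is passed in ready-made.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the article's source code.
public static class LinkFilter
{
    // Keeps the links that match the filter pattern and resolves each one
    // against the page URL, so relative and absolute hrefs both work.
    public static List<string> Select(string pageUrl,
        IEnumerable<string> hrefs, string filterPattern)
    {
        List<string> result = new List<string>();
        Uri baseUri = new Uri(pageUrl);
        Regex filter = new Regex(filterPattern, RegexOptions.IgnoreCase);
        foreach (string href in hrefs)
        {
            if (filter.IsMatch(href))
            {
                // Uri(Uri, string) combines a base URI with a relative one;
                // an absolute href simply replaces the base.
                result.Add(new Uri(baseUri, href).AbsoluteUri);
            }
        }
        return result;
    }
}
```

For a page at http://example.edu/course/index.html and the pattern \.doc$|\.ppt$, the link notes/week1.doc would resolve to http://example.edu/course/notes/week1.doc, while about.html would be filtered out.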

Now we have a collection of URLs that we want to download. All we have to do is create an instance of the FileDownloader class for each URL, passing in the URL and the save location.

C#
public FileDownloader(string documentUrl, string directory)
{
    _DocumentUrl = documentUrl;
    _DirectoryPath = directory;
}

Once the class is initialized, the download can begin. This method can be called asynchronously. It will raise the DownloadStarting and DownloadCompleted events when the file starts and stops downloading, respectively.

C#
/// <summary>
/// Starts the download of the attached url into the given directory.
/// </summary>
public void StartDownload()
{
    /* IsNullOrEmpty also covers a null url or directory. */
    if (string.IsNullOrEmpty(_DocumentUrl))
    {
        throw new ArgumentException("Please supply a document url.");
    }
    if (string.IsNullOrEmpty(_DirectoryPath))
    {
        throw new ArgumentException("Please supply a directory.");
    }
    _IsStarted = true;
    /* raise the download starting event. */
    DownloadStarting(this);
    _IsDownloading = true;
    _IsDownloadSuccessful = false;
    Stream stream = null;
    FileStream fstream = null;

    try
    {
        string destFileName = _DirectoryPath + "\\" + FileName;
        destFileName = destFileName.Replace("/", 
                          " ").Replace("%20", " ");

        if (File.Exists(destFileName) == false)
        {
            HttpWebRequest request = 
               (HttpWebRequest)WebRequest.Create(_DocumentUrl);
            HttpWebResponse response = 
               (HttpWebResponse)request.GetResponse();
            stream = response.GetResponseStream();

            byte[] inBuffer = ReadFully(stream, 32768);

            fstream = new FileStream(destFileName, 
                      FileMode.OpenOrCreate, FileAccess.Write);
            /* write the whole buffer; Length - 1 would drop the last byte. */
            fstream.Write(inBuffer, 0, inBuffer.Length);


            fstream.Close();
            stream.Close();
        }
        _IsDownloadSuccessful = true;
        _IsDownloading = false;
        /* raise a download completed event. */
        DownloadCompleted(this, _IsDownloadSuccessful);
    }
    catch
    {
        /* the download failed; clear the downloading flag as well. */
        _IsDownloadSuccessful = false;
        _IsDownloading = false;
    }
    finally
    {
        if (fstream != null)
        {
            fstream.Close();
        }
        if (stream != null)
        {
            stream.Close();
        }
    }
}
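
StartDownload() relies on a ReadFully helper that is not listed here. The sketch below shows one way such a helper might be written, assuming that the second argument is the size of the read chunk and that the method returns exactly the bytes read from the stream. The class name StreamHelper is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper class; only the ReadFully name comes from the article.
public static class StreamHelper
{
    // Reads the stream to its end in chunks of chunkSize bytes and
    // returns exactly the bytes that were read.
    public static byte[] ReadFully(Stream stream, int chunkSize)
    {
        byte[] chunk = new byte[chunkSize];
        using (MemoryStream buffer = new MemoryStream())
        {
            int read;
            // Stream.Read returns 0 once the end of the stream is reached.
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            // ToArray copies only the bytes written, without any padding.
            return buffer.ToArray();
        }
    }
}
```

Because the returned array holds no padding, the caller can write all of it to disk with its full Length. Note that this approach keeps the whole file in memory, which may not suit very large downloads.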

Let's see how this all hangs together from the UI perspective. The Get Files button takes in the given URL and retrieves all files that match the Target Filter.

The available files are displayed in the list box below. The user then selects the files to download and clicks the Download button. Each selected file is downloaded on its own thread until the maximum number of threads is in use. Any remaining files are queued until a thread finishes downloading and becomes available again.
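
The queueing behaviour described above can be sketched roughly as follows. This is not the code from Main.cs: the DownloadQueue class and the DownloadAction delegate are hypothetical names, and the sketch uses a fixed set of worker threads that pull URLs from a shared queue, which is one of several ways to cap the number of simultaneous downloads.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical delegate: the work to perform for one URL.
public delegate void DownloadAction(string url);

// Hypothetical class, not part of the article's source code.
public class DownloadQueue
{
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly object _sync = new object();
    private readonly DownloadAction _download;

    public DownloadQueue(IEnumerable<string> urls, DownloadAction download)
    {
        foreach (string url in urls)
        {
            _pending.Enqueue(url);
        }
        _download = download;
    }

    // Starts maxThreads workers and blocks until the queue is empty.
    public void Run(int maxThreads)
    {
        List<Thread> workers = new List<Thread>();
        for (int i = 0; i < maxThreads; i++)
        {
            Thread worker = new Thread(Work);
            worker.Start();
            workers.Add(worker);
        }
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
    }

    // Each worker keeps taking the next pending URL until none are left.
    private void Work()
    {
        while (true)
        {
            string url;
            lock (_sync)
            {
                if (_pending.Count == 0)
                {
                    return;
                }
                url = _pending.Dequeue();
            }
            // The download itself runs outside the lock.
            _download(url);
        }
    }
}
```

Run blocks until every file is done, so in a Windows Forms application it would have to be called from a background thread rather than from the UI thread.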

Image 1

The overall download progress is displayed in the progress bar, and each file's download status is displayed in the status column next to it. The entire download operation can be cancelled by clicking on the Cancel All button.

Image 2

One last point. You will notice that the URL textbox "remembers" your last URL between application restarts.

This feature depends on a persistence library that I wrote and described in a separate article. If you need persistence for any other textbox, just put the text "persist" in its Tag property.

Points of Interest

I learnt a lot about multi-threaded UI programming while writing this article. Updating the UI from several worker threads is not easy unless multi-threaded access has been planned for. This was especially painful when programming the progress bar, and placing locks at the right points in the code helped a great deal. I did not want unnecessary locks to slow down my code, so I added them cautiously and improved the code incrementally until the UI behaved consistently.
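
To give an idea of the kind of lock involved, here is a minimal sketch of a progress counter that several download threads can update safely. The ProgressTracker class is hypothetical and is not taken from the application. In a Windows Forms program, the returned percentage would still have to be passed to the UI thread, for example with Control.Invoke or Control.BeginInvoke, before it is assigned to the progress bar.

```csharp
using System;

// Hypothetical class, not part of the article's source code.
public class ProgressTracker
{
    private readonly object _sync = new object();
    private readonly int _total;
    private int _completed;

    public ProgressTracker(int total)
    {
        _total = total;
    }

    // Called by a download thread when its file has finished.
    // Returns the overall progress as a percentage from 0 to 100.
    public int ReportCompleted()
    {
        // The lock makes the increment and the calculation one atomic step,
        // so two threads can never report from the same stale count.
        lock (_sync)
        {
            _completed++;
            return _total == 0 ? 100 : (_completed * 100) / _total;
        }
    }
}
```

The lock is held only for the increment and the division, which keeps the time that threads spend waiting on each other short.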

Revisions

Thank you everyone for your suggestions. I have taken your advice on board and released a new version with the following additions and fixes:

  • Fixed the missing byte problem (tested).
  • Added proxy server authentication (untested). It would be great if readers with a proxy server could test it and post comments in the discussion section.
  • To do: allow FTP downloads (next version).

History

  • 21/May/2006 - Initial version.
  • 04/June/2006 - Revision 1.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
Australia

