
Multi-threaded file download manager

Shailen Sukul, 5 Jun 2006
A fully working multi-threaded file downloader application.

Introduction

A few months ago, my lovely wife was downloading her lecture notes from her university website, and I noticed that she had to manually click on every file to save it to the hard disk.

I also noticed that all the hyperlinks were on the same page, and the documents she was downloading were either Word documents or PowerPoint presentations.

Right then, a little light bulb went on in my head, and I decided to build her a file download utility. The requirements were simple:

  • I should be able to point to a web page and filter URLs on it. For example, "*.doc" should give me a collection of URLs that have .doc at the end of the link.
  • From the list of available files, I should be able to select the files I want to download.
  • I should be able to download my selected files simultaneously.
  • I should be able to nominate the number of simultaneous threads for download.
  • I should be able to cancel a download at any point in time.
  • I should be informed of the download status of each selected file.
  • I do not want to re-download files that I have downloaded before.

Using the code

The code is divided into three major sections.

  • FileDownloader.cs - accepts a URL and starts downloading it.
  • WebPageInterrogater.cs - accepts a filter string and a URL, and returns a list of hyperlinks that match the filter.
  • Main.cs - contains the UI code.

Let's have a look at the WebPageInterrogater class.

The regular expression below finds all HREFs in a web page:

const string _findAllHrefsPattern = "(?<HTML><a[^>]" + 
      "*href\\s*=\\s*[\\\"\\']?(?<HRef>[^\"'>\\s]*)" + 
      "[\\\"\\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

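As a rough illustration of what the pattern captures, the sketch below runs it against an inline HTML string. The HrefPatternDemo class, the ExtractHrefs helper and the sample markup are made up for this example; only the pattern itself comes from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefPatternDemo
{
    // Same pattern as in the article, split at a different point for readability.
    const string FindAllHrefsPattern =
        "(?<HTML><a[^>]*href\\s*=\\s*[\\\"\\']?(?<HRef>[^\"'>\\s]*)" +
        "[\\\"\\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

    // Hypothetical helper: returns the raw href value of every anchor found.
    public static List<string> ExtractHrefs(string html)
    {
        List<string> hrefs = new List<string>();
        Regex regEx = new Regex(FindAllHrefsPattern, RegexOptions.IgnoreCase);
        foreach (Match match in regEx.Matches(html))
        {
            // Reading the group by name avoids depending on group numbering.
            hrefs.Add(match.Groups["HRef"].Value);
        }
        return hrefs;
    }
}
```

Calling ExtractHrefs on a string containing two anchors, one with a double-quoted and one with a single-quoted href, should return the two href values in document order.
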
The WebPageInterrogater class constructor takes a string of semicolon-separated filter expressions (for example, *.doc;*.ppt) and builds a regular expression from it.

/// <summary>
/// Crawls the given url looking for hyperlinks
/// and extracts all hyperlinks that match the filter.
/// For example *.doc will return hyperlinks for word documents.
/// </summary>
/// <param name="url"></param>
/// <param name="sFilters"></param>
public WebPageInterrogater(string url, string sFilters)
{
    _url = url;
    string[] filters = sFilters.Split(';');
    string pattern = string.Empty;
    for (int i = 0; i < filters.Length; i++ )
    {
        pattern = "\\" + filters[i].Replace("*", 
                  string.Empty) + "$" + "|";
        _filters += pattern;
    }
    _filters = _filters.Substring(0, _filters.Length-1);
}
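
To make the result concrete: for the filter string *.doc;*.ppt the constructor above ends up with the pattern \.doc$|\.ppt$. The sketch below builds the same kind of pattern in a standalone helper. It is a variation, not the article's code: the FilterPatternDemo class and BuildFilterPattern method are hypothetical names, and it swaps the hard-coded leading backslash for Regex.Escape so that filters not starting with a dot are escaped correctly too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FilterPatternDemo
{
    // Hypothetical helper: turns "*.doc;*.ppt" into "\.doc$|\.ppt$".
    public static string BuildFilterPattern(string sFilters)
    {
        // Skip empty entries so a trailing ';' does not add an empty alternative.
        string[] filters = sFilters.Split(
            new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        string[] parts = new string[filters.Length];
        for (int i = 0; i < filters.Length; i++)
        {
            // Drop the wildcard, then escape regex metacharacters such as '.'.
            string suffix = filters[i].Trim().Replace("*", string.Empty);
            parts[i] = Regex.Escape(suffix) + "$";
        }
        return string.Join("|", parts);
    }
}
```

Note that an empty filter string yields an empty pattern, which matches every link, so a caller may want to treat that case separately.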

ListFiles() will search for the requested patterns in the target web page and return a collection of matching URLs.

/// <summary>
/// Returns a collection of documents that are eligible to download.
/// </summary>
/// <returns>A collection of matching URLs.</returns>
public StringCollection ListFiles()
{
    StringCollection sCol = new StringCollection();
    string webPage = GetWebPage();

    string ahref = string.Empty;
    string title = string.Empty;
    string value = string.Empty;
    string fileName = string.Empty;


    Regex regEx = new Regex(_findAllHrefsPattern, 
                  RegexOptions.Compiled | RegexOptions.IgnoreCase);
    Regex regEx2 = new Regex(_filters, 
                   RegexOptions.Compiled | RegexOptions.IgnoreCase);

    MatchCollection matches = regEx.Matches(webPage);
    foreach (Match match in matches)
    {
        int iCount = match.Groups.Count;
        ahref = match.Groups[0].Value;
        value = match.Groups[2].Value;
        title = match.Groups[3].Value;

        if (regEx2.IsMatch(value))
        {
            sCol.Add(TopLevelUrl + "/" + value);
        }
    }
    return sCol;
}
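
ListFiles() builds each result by joining TopLevelUrl and the href with a slash, which assumes every link is relative to that top-level URL. As a possible alternative, the sketch below lets System.Uri resolve the href against the page address, which also copes with root-relative and absolute links. The LinkResolverDemo class, the Resolve method and the example.edu addresses are assumptions made for illustration and are not part of the article's source.

```csharp
using System;

public static class LinkResolverDemo
{
    // Hypothetical helper: resolves an href found on pageUrl to an absolute URL.
    public static string Resolve(string pageUrl, string href)
    {
        Uri pageUri = new Uri(pageUrl);
        // Uri(Uri, string) applies standard relative-reference resolution;
        // an href that is already absolute is returned unchanged.
        Uri resolved = new Uri(pageUri, href);
        return resolved.AbsoluteUri;
    }
}
```

With a page at http://example.edu/course/index.html, the href notes/week1.doc resolves under /course/, while /files/a.ppt resolves from the site root.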

Now we have a collection of URLs that we want to download. For each one, all we have to do is create an instance of the FileDownloader class, passing in the file's URL and the save location.

public FileDownloader(string documentUrl, string directory)
{
    _DocumentUrl = documentUrl;
    _DirectoryPath = directory;
}

Once the class is initialized, the download can begin. This method can be called asynchronously. It will raise the DownloadStarting and DownloadCompleted events when the file starts and stops downloading, respectively.

/// <summary>
/// Starts the download of the attached url into the given directory.
/// </summary>
public void StartDownload()
{
    if (_DocumentUrl.Equals(string.Empty))
    {
        throw new ArgumentException("Please supply a document url.");
    }
    if (_DirectoryPath.Equals(string.Empty))
    {
        throw new ArgumentException("Please supply a directory.");
    }
    _IsStarted = true;
    /* raise the download starting event. */
    DownloadStarting(this);
    _IsDownloading = true;
    _IsDownloadSuccessful = false;
    Stream stream = null;
    FileStream fstream = null;

    try
    {
        string destFileName = _DirectoryPath + "\\" + FileName;
        destFileName = destFileName.Replace("/", 
                          " ").Replace("%20", " ");

        if (File.Exists(destFileName) == false)
        {
            HttpWebRequest request = 
               (HttpWebRequest)WebRequest.Create(_DocumentUrl);
            HttpWebResponse response = 
               (HttpWebResponse)request.GetResponse();
            stream = response.GetResponseStream();

            byte[] inBuffer = ReadFully(stream, 32768);

            fstream = new FileStream(destFileName, 
                      FileMode.OpenOrCreate, FileAccess.Write);
            // Write the whole buffer; Length - 1 would drop the last byte.
            fstream.Write(inBuffer, 0, inBuffer.Length);


            fstream.Close();
            stream.Close();
        }
        _IsDownloadSuccessful = true;
        _IsDownloading = false;
        /* raise a download completed event. */
        DownloadCompleted(this, _IsDownloadSuccessful);
    }
    catch
    {
        _IsDownloadSuccessful = false;
    }
    finally
    {
        if (fstream != null)
        {
            fstream.Close();
        }
        if (stream != null)
        {
            stream.Close();
        }
    }
}
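
StartDownload() reads the whole response into memory through ReadFully (not shown in the article) before writing it to disk, which can be costly for large files. A minimal sketch of an alternative, copying the response to the file in fixed-size chunks, is shown below. The StreamCopyDemo class and CopyStream method are hypothetical names and this is not the author's implementation.

```csharp
using System;
using System.IO;

public static class StreamCopyDemo
{
    // Hypothetical helper: copies input to output in chunks of bufferSize bytes
    // and returns the total number of bytes copied.
    public static long CopyStream(Stream input, Stream output, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Read returns 0 at end of stream; it may return fewer bytes than requested.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write exactly the bytes that were read, no more and no fewer.
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Inside StartDownload(), such a helper could be called as CopyStream(stream, fstream, 32768) in place of the ReadFully and Write pair, so memory use stays at one buffer regardless of file size.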

Let's see how this all hangs together from the UI perspective. The Get Files button takes in the given URL and retrieves all files that match the Target Filter.

The available files are displayed in the listbox below. The user then selects the files to download and clicks the Download button. A new download thread is spawned for each file until the maximum number of threads is in use. Any remaining files are queued until a thread finishes its download and becomes available again.
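
The idea of capping the number of simultaneous downloads can be sketched as follows. This is a simplified stand-in, not the code from Main.cs: instead of spawning a thread per file, it starts a fixed number of worker threads that each pull the next URL from a shared queue. BoundedDownloadDemo, RunAll and the work delegate are hypothetical names; in the real application the work item would call FileDownloader.StartDownload().

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class BoundedDownloadDemo
{
    // Runs 'work' once per url on at most maxThreads threads.
    // Returns the highest number of work items seen running at the same time.
    public static int RunAll(IList<string> urls, int maxThreads, Action<string> work)
    {
        Queue<string> pending = new Queue<string>(urls);
        object gate = new object();
        int running = 0;
        int peak = 0;

        ThreadStart worker = delegate
        {
            while (true)
            {
                string url;
                lock (gate)
                {
                    // No more queued files: this worker is finished.
                    if (pending.Count == 0) return;
                    url = pending.Dequeue();
                    running++;
                    if (running > peak) peak = running;
                }
                // An exception thrown by 'work' is not handled in this sketch.
                try { work(url); }
                finally { lock (gate) { running--; } }
            }
        };

        List<Thread> threads = new List<Thread>();
        int count = Math.Min(maxThreads, urls.Count);
        for (int i = 0; i < count; i++)
        {
            Thread t = new Thread(worker);
            t.Start();
            threads.Add(t);
        }
        // Block until every worker has drained the queue.
        foreach (Thread t in threads) t.Join();
        return peak;
    }
}
```

Because the queue and the counters are only touched inside the lock, the number of concurrent downloads never exceeds maxThreads. A UI application would run this off the UI thread rather than blocking on Join.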

The overall download progress is displayed in the progress bar, and each file's download status is displayed in the status column next to it. The entire download operation can be cancelled by clicking on the Cancel All button.

One last point. You will notice that the URL textbox "remembers" your last URL between application restarts.

This feature depends on a persistence library that I described in a separate article. If you need persistence for any other textbox, just put the text "persist" in its Tag property.

Points of Interest

I learnt a lot about multi-threaded UI programming while writing this article. Updating the UI from multiple worker threads is not easy unless multi-threaded access has been planned for from the start. This was especially painful when programming the progress bar, and placing locks at the right points in the code helped a great deal. I did not want unnecessary locks to slow the code down, so I started cautiously and improved the code incrementally until the UI behaved consistently.

Revisions

Thank you, everyone, for your suggestions. I have taken your advice on board and released a new version with the following additions and fixes:

  • Fixed the missing byte problem (tested).
  • Added proxy server authentication (untested). It would be great if readers with a proxy server could test it and post comments in the discussion section.
  • To do: allow FTP downloads (next version).

History

  • 21/May/2006 - Initial version.
  • 04/June/2006 - Revision 1.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Shailen Sukul
Web Developer
Australia

Last Updated 5 Jun 2006
Article Copyright 2006 by Shailen Sukul