BCBCurl, a LibCurl based download manager
How to embed LibCurl to create a download manager.
Introduction
Curl is a great command line tool for data transfer with URL syntax. It supports FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, and many more. Its library (LibCurl) is widely used in many projects.
In this article I present my method of embedding LibCurl in a simple download manager application named BCBCurl.
Compiler used: Embarcadero C++ Builder XE3.
Background
Previously I used the curl.exe command line tool and FreeDownloadManager (FDM) for my downloading, each with its own strengths and weaknesses.
The main reason I developed this program is that I needed a download manager with the features shown in this table:
Feature supported | curl | FDM | BCBCurl |
---|---|---|---|
Set URL from command line | YES | YES | YES |
Set output file from command line | YES | NO | YES |
Set referer from command line | YES | NO | YES |
Set cookie from command line | YES | NO | YES |
Auto resume broken download | NO | YES | YES |
Download queuing | NO | YES | YES |
Multi-threaded download | NO | YES | TODO |
SOCKS protocol | YES | NO | TODO |
Using the code
The program has two threads: the main thread is responsible for the user interface, and the worker thread does the downloading. I use the "easy" LibCurl API for downloading. It uses a blocking socket, so I moved it into a separate thread. Messages from the downloader thread are passed to the main thread through a FIFO list. LibCurl also has non-blocking functions, but I haven't tried them yet.
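The FIFO hand-off between the worker thread and the UI thread can be sketched in portable C++. This is not the article's actual implementation (BCBCurl uses a VCL `TList`); `MsgQueue` is a hypothetical stand-in that shows the locking pattern such a message list needs:

```cpp
#include <deque>
#include <mutex>
#include <string>

// Minimal thread-safe FIFO for messages from the downloader thread to the
// UI thread; a portable stand-in for the msglist TList used in the article.
class MsgQueue {
public:
    // Called from the worker thread: append a message at the back.
    void Add(const std::string &msg) {
        std::lock_guard<std::mutex> lock(mtx_);
        items_.push_back(msg);
    }
    // Called from the UI thread: returns true and fills 'out'
    // if a message was waiting, false if the queue was empty.
    bool TryPop(std::string &out) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (items_.empty())
            return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }
private:
    std::mutex mtx_;
    std::deque<std::string> items_;
};
```

The UI thread would poll `TryPop()` from a timer, so the worker never touches VCL controls directly.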
BCBCurl can run in two different modes: first, as a server that does the actual downloading and stays active; second, as a client that only receives a task from the command line, sends it to the server's download queue, and exits. Commands are sent from client to server via shared memory. This option is useful when you call BCBCurl from a script and want to choose between waiting for the download to finish or simply putting it in a queue.
Here's the function that invokes LibCurl's downloading mechanism (it's based on LibCurl's download-to-memory sample code):
```cpp
int do_curl(TThreadCurl *chunk, unsigned long range_from,
            unsigned long range_to, int headeronly)
{
    chunk->status = CURL_STARTED;
    CURL *curl_handle;
    CURLcode res;

    curl_handle = curl_easy_init();

    /* specify URL to get */
    curl_easy_setopt(curl_handle, CURLOPT_URL, chunk->url.c_str());
    if (chunk->referer != "") {
        curl_easy_setopt(curl_handle, CURLOPT_REFERER, chunk->referer.c_str());
    }
    if (chunk->cookie != "") {
        curl_easy_setopt(curl_handle, CURLOPT_COOKIE, chunk->cookie.c_str());
    }
    // curl_easy_setopt(curl_handle, CURLOPT_TIMEOUT, 60);
    curl_easy_setopt(curl_handle, CURLOPT_LOW_SPEED_LIMIT, 1);

    if (chunk->invalid) {
        chunk->status = CURL_TERMINATED;
        curl_easy_cleanup(curl_handle);
        return 0;
    }

    int len;
    char *unescaped = curl_easy_unescape(curl_handle, chunk->url.c_str(),
                                         chunk->url.Length(), &len);
    chunk->unescaped = unescaped;
    curl_free(unescaped);

    /* send header data to this function */
    curl_easy_setopt(curl_handle, CURLOPT_HEADERFUNCTION, header_callback);
    curl_easy_setopt(curl_handle, CURLOPT_HEADERDATA, (void *) chunk);

    curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYPEER, 0L);
    curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYHOST, 0L);

    /* send content to this function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
    /* we pass our 'chunk' struct to the callback function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *) chunk);

    if (headeronly > 0) {
        curl_easy_setopt(curl_handle, CURLOPT_HEADER, 1);
        curl_easy_setopt(curl_handle, CURLOPT_NOBODY, 1);
    }

    /* range download, FROM-TO bytes */
    if ((range_from + range_to) > 0) {
        AnsiString srange = AnsiString().sprintf("%lu-%lu", range_from, range_to);
        msglist->Add("\r\n---\r\nDownload range: " + srange);
        curl_easy_setopt(curl_handle, CURLOPT_RANGE, srange.c_str());
        chunk->start_byte = range_from;
        chunk->last_byte = range_from;
        chunk->stop_byte = range_to;
        chunk->size_downloaded = range_from;
    }
    else if (chunk->stop_byte > chunk->last_byte) {
        /* resume: request the remaining bytes */
        AnsiString autorange = AnsiString().sprintf("%lu-%lu",
            chunk->last_byte, chunk->stop_byte);
        msglist->Add("Autorange: " + AnsiString(autorange));
        curl_easy_setopt(curl_handle, CURLOPT_RANGE, autorange.c_str());
        chunk->size_downloaded = chunk->last_byte;
    }
    else if (chunk->stop_byte > 0 && chunk->last_byte > 0) {
        msglist->Add("nothing to do");
        chunk->status = CURL_TERMINATED;
        curl_easy_cleanup(curl_handle);
        return 0;
    }

    /* some servers don't like requests that are made without
       a user-agent field, so we provide one */
    curl_easy_setopt(curl_handle, CURLOPT_USERAGENT,
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) "
        "Gecko/20050511 Firefox/1.0.4");

    /* perform curl download */
    res = curl_easy_perform(curl_handle);

    /* check for errors */
    if (res != CURLE_OK) {
        msglist->Add("Err: " + AnsiString(curl_easy_strerror(res)));
        chunk->status = CURL_TERMINATED;
    }

    /* cleanup curl stuff */
    curl_easy_cleanup(curl_handle);
    chunk->status = CURL_TERMINATED;
    return res;
}
```
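The resume logic above boils down to building the `CURLOPT_RANGE` string from the last byte already received and the known end of the file. A portable sketch of that string building (`make_range` is a hypothetical helper, not part of BCBCurl):

```cpp
#include <cstdio>
#include <string>

// Builds the "from-to" byte-range string that do_curl() passes
// to CURLOPT_RANGE when resuming a partial download.
std::string make_range(unsigned long last_byte, unsigned long stop_byte) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%lu-%lu", last_byte, stop_byte);
    return buf;
}
```

For example, if 1000 bytes are already on hand and the Content-Length says the file ends at byte 9999, the request carries the range `1000-9999`.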
`msglist` is a `TList` object used as a FIFO buffer to send messages to the server. `chunk` (a `TThreadCurl` object) holds the chunk of data downloaded from the web server along with some download parameters. Its `memory` buffer is allocated dynamically according to the Content-Length header.
```cpp
class TThreadCurl : public TThread {
public:
    char *memory;                  // stores the downloaded data
    size_t size_downloaded;        // bytes downloaded by a single do_curl() call
    size_t size_memory;            // total buffer size (across all do_curl() calls)
    unsigned long content_length;  // content length reported in the header
    unsigned long start_byte;      // starting point of the download
    unsigned long stop_byte;       // ending point of the download
    unsigned long last_byte;       // last position of the download
    int can_resume;                // set if the download can be resumed
    int invalid;                   // set if an error was encountered during download
    AnsiString url;                // download URL
    AnsiString unescaped;          // unescaped form of the URL
    AnsiString referer;            // referer from the browser
    AnsiString cookie;             // cookie from the browser
    int status;                    // download status
    int http_code;                 // HTTP response code
    int download_step;             // progress of the download
    ...
};
```
`referer` and `cookie` can be set from command line parameters when BCBCurl is called from a browser. I use the FlashGot addon to call BCBCurl from within Firefox.
The LibCurl download process uses two callback functions:
1. `header_callback()` to process headers from the web server.
2. `WriteMemoryCallback()` to capture downloaded content from the web server.
The header_callback function:
This function is called whenever LibCurl receives header data from the web server.
```cpp
size_t header_callback(char *buffer, size_t size, size_t nmemb, void *userdata)
{
    TPerlRegEx *pcre = new TPerlRegEx();
    TThreadCurl *curl = (TThreadCurl *) userdata;

    /* does the server accept range requests (i.e. support resume)? */
    pcre->RegEx = "(ACCEPT-RANGE)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        msglist->Add(buffer);
        curl->can_resume = 1;
    }

    /* HTTP response code */
    pcre->RegEx = "HTTP\\/.*\\s*(\\d\\d\\d)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        curl->http_code = StrToInt(pcre->Groups[1]);
        if (curl->http_code > 206) {
            curl->invalid = 1; // cancel download
        }
    }

    /* suggested file name, quoted or unquoted */
    pcre->RegEx = "Content-Disposition:";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        pcre->RegEx = "Content-Disposition:.*filename=[\"\'](.*)[\"\']";
        pcre->Options = TPerlRegExOptions() << preCaseLess;
        pcre->Subject = AnsiString(buffer);
        if (pcre->Match()) {
            curl->header_filename = pcre->Groups[1];
        } else {
            pcre->RegEx = "Content-Disposition:.*filename=\\s*([^\\s]+)[\\s$]";
            pcre->Options = TPerlRegExOptions() << preCaseLess;
            pcre->Subject = AnsiString(buffer);
            if (pcre->Match()) {
                curl->header_filename = pcre->Groups[1];
            }
        }
    }

    /* content length: grow the buffer to fit the whole file */
    pcre->RegEx = "CONTENT-LENGTH\\s*:\\s*(\\d+)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        msglist->Add(" --- CONTENT LENGTH: " + pcre->Groups[1]);
        unsigned long ctlen = StrToInt(pcre->Groups[1]);
        // CHECK HTTP PROTOCOL: on a resumed request the Content-Length
        // is reduced by the size already downloaded
        if (ctlen > curl->content_length) {
            curl->content_length = ctlen;
        }
        size_t total_chunk_size = curl->content_length + 1;
        curl->memory = (char *) realloc(curl->memory, total_chunk_size);
        curl->size_memory = total_chunk_size;
        curl->stop_byte = curl->content_length;
        if (curl->memory == NULL) {
            /* out of memory! */
            msglist->Add("not enough memory (realloc returned NULL)\n");
            delete pcre;
            return 0;
        }
    }

    delete pcre; // TPerlRegEx was allocated with new; free it before returning
    size_t realsize = size * nmemb;
    return realsize;
}
```
I need to parse the header data from the web server to get the Content-Length, (optionally) the file name, and the HTTP response code, and to check whether the server supports resuming downloads. Notice how I use Perl-style regular expressions to parse the header data. I know I could parse the text by other means, but having once been a Perl programmer I just can't live without regex :) .
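TPerlRegEx is specific to C++ Builder; the same Content-Length parsing can be sketched with the standard library's std::regex. Here `parse_content_length` is a hypothetical helper, not part of BCBCurl, matched case-insensitively just as the article's patterns are:

```cpp
#include <regex>
#include <string>

// Extracts the Content-Length value from one header line, mirroring the
// "CONTENT-LENGTH\s*:\s*(\d+)" pattern used in header_callback().
// Returns 0 when the line is not a Content-Length header.
unsigned long parse_content_length(const std::string &line) {
    static const std::regex re("content-length\\s*:\\s*(\\d+)",
                               std::regex::icase);
    std::smatch m;
    if (std::regex_search(line, m, re))
        return std::stoul(m[1].str());
    return 0;
}
```

The other patterns (HTTP status code, Content-Disposition file name, Accept-Ranges) translate the same way.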
The WriteMemoryCallback function:
This function is called while LibCurl downloads the actual data from the web server. It mainly captures the data passed in through the function parameters and stores it in memory at the right position.
```cpp
size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    size_t realsize = size * nmemb;
    TThreadCurl *curl = (TThreadCurl *) userp;
    curl->status = CURL_RECEIVING;

    /* grow the buffer if the received data does not fit */
    size_t total_chunk_size = curl->size_downloaded + realsize + 1;
    if (curl->size_memory < total_chunk_size) {
        curl->memory = (char *) realloc(curl->memory, total_chunk_size);
        curl->size_memory = total_chunk_size;
    }
    if (curl->memory == NULL) {
        /* out of memory! */
        msglist->Add("not enough memory (realloc returned NULL)\n");
        return 0;
    }

    /* append the new data at the current download position */
    memcpy(&(curl->memory[curl->size_downloaded]), contents, realsize);
    curl->last_byte = curl->last_byte + realsize;
    curl->size_downloaded += realsize;

    Application->ProcessMessages();
    return realsize;
}
```
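The same callback can be sketched without VCL types by accumulating into a std::string; the signature is libcurl's real CURLOPT_WRITEFUNCTION contract, but `Buffer` here is a hypothetical stand-in for `TThreadCurl`:

```cpp
#include <cstddef>
#include <string>

// Stand-in for TThreadCurl: just an in-memory buffer.
struct Buffer {
    std::string data;
};

// Matches libcurl's CURLOPT_WRITEFUNCTION signature: append the received
// bytes and return the number consumed (returning less aborts the transfer).
std::size_t WriteToBuffer(void *contents, std::size_t size,
                          std::size_t nmemb, void *userp) {
    std::size_t realsize = size * nmemb;
    Buffer *buf = static_cast<Buffer *>(userp);
    buf->data.append(static_cast<const char *>(contents), realsize);
    return realsize;
}
```

Letting std::string manage the growth avoids the manual realloc bookkeeping, at the cost of not writing resumed ranges at an absolute offset the way BCBCurl does.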
Command line parameters
List of command line parameters processed by BCBCurl:
- -o [explicit output file name]
- -f [suggested filename]
- -d [output folder]
- -k [cookie]
- -r [referer]
- -m [comment]
- -c (no value; set it to run BCBCurl as a client only)
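A minimal sketch of how such `-x value` pairs can be walked from argv (the `Options` struct and `parse_args` are hypothetical illustrations covering only a subset of the switches above, not BCBCurl's actual parser):

```cpp
#include <string>

// Holds a subset of the BCBCurl switches listed above.
struct Options {
    std::string url, output, referer, cookie;
    bool client_mode = false;
};

// Walks argv: "-x value" pairs fill fields, "-c" is a bare flag,
// and any other argument is taken as the URL.
Options parse_args(int argc, const char *argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        if (a == "-c")                      opt.client_mode = true;
        else if (a == "-o" && i + 1 < argc) opt.output  = argv[++i];
        else if (a == "-r" && i + 1 < argc) opt.referer = argv[++i];
        else if (a == "-k" && i + 1 < argc) opt.cookie  = argv[++i];
        else                                opt.url     = a;
    }
    return opt;
}
```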
Browser integration
To integrate BCBCurl with a browser, I use the FlashGot addon with the following parameters:
-c [URL] [-r REFERER] [-k COOKIE] [-m COMMENT] [-f FNAME]
Points of Interest
Things I learned by making this program:
- How to use LibCurl in a desktop application.
- How to parse command line parameters.
- How to use regular expressions in a C++ application.
- How to pass data via shared memory.
- How to call BCBCurl from a browser.
I realize I haven't explained all of these completely, to keep this article short, but I will gladly explain if anyone asks.
The BCBCurl executable is included with this article. The source code is available at https://bitbucket.org/pamungkas5/bcbcurl/