BCBCurl, a LibCurl based download manager
How to embed LibCurl to create a download manager.
Introduction
Curl is a great command line tool for data transfer with URL syntax. It supports FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, and many more. Its library (LibCurl) is widely used in many projects.
In this article I present my method of embedding LibCurl in a simple download manager application named BCBCurl.
Compiler used: Embarcadero C++ Builder XE3.
Background
Previously I used the curl.exe command line tool and FreeDownloadManager (FDM) for my downloading, each with its own strengths and weaknesses.
The main reason I developed this program is that I needed a download manager with the features shown in this table:
Feature supported | curl | FDM | BCBCurl |
---|---|---|---|
Set URL from command line | YES | YES | YES |
Set output file from command line | YES | NO | YES |
Set referer from command line | YES | NO | YES |
Set cookie from command line | YES | NO | YES |
Auto resume broken download | NO | YES | YES |
Download queuing | NO | YES | YES |
Multi-threaded download | NO | YES | TODO |
SOCKS protocol | YES | NO | TODO |
Using the code
The program has two threads: the main thread is responsible for the user interface, and the worker thread does the downloading. I use the "easy" LibCurl API for downloading. It uses a blocking socket, so I moved it into a separate thread. Messages from the downloader thread are passed to the main thread through a FIFO list. LibCurl also has non-blocking functions, but I haven't tried them yet.
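The FIFO hand-off between the worker thread and the UI thread can be sketched in portable C++. This is not the article's actual implementation (BCBCurl uses a VCL `TList`); `MsgQueue` is a hypothetical stand-in that shows the locking pattern such a message list needs:

```cpp
#include <deque>
#include <mutex>
#include <string>

// Minimal thread-safe FIFO for messages from the downloader thread to the
// UI thread; a portable stand-in for the msglist TList used in the article.
class MsgQueue {
public:
    // Called from the worker thread: append a message at the back.
    void Add(const std::string &msg) {
        std::lock_guard<std::mutex> lock(mtx_);
        items_.push_back(msg);
    }
    // Called from the UI thread: returns true and fills 'out'
    // if a message was waiting, false if the queue was empty.
    bool TryPop(std::string &out) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (items_.empty())
            return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }
private:
    std::mutex mtx_;
    std::deque<std::string> items_;
};
```

The UI thread would poll `TryPop()` from a timer, so the worker never touches VCL controls directly.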
BCBCurl can run in two different modes: first, as a server that does the actual downloading and stays active; second, as a client that only receives a task from the command line, sends it to the server's download queue, and exits. Commands are sent from client to server via shared memory. This option is useful when you call BCBCurl from a script and want to choose between waiting for the download to finish or simply putting it in a queue.
Here's the function that invokes LibCurl's downloading mechanism (it's based on LibCurl's download-to-memory sample code):
```cpp
int do_curl(TThreadCurl *chunk, unsigned long range_from,
            unsigned long range_to, int headeronly)
{
    chunk->status = CURL_STARTED;
    CURL *curl_handle;
    CURLcode res;

    curl_handle = curl_easy_init();

    /* specify URL to get */
    curl_easy_setopt(curl_handle, CURLOPT_URL, chunk->url.c_str());
    if (chunk->referer != "") {
        curl_easy_setopt(curl_handle, CURLOPT_REFERER, chunk->referer.c_str());
    }
    if (chunk->cookie != "") {
        curl_easy_setopt(curl_handle, CURLOPT_COOKIE, chunk->cookie.c_str());
    }
    // curl_easy_setopt(curl_handle, CURLOPT_TIMEOUT, 60);
    curl_easy_setopt(curl_handle, CURLOPT_LOW_SPEED_LIMIT, 1);

    if (chunk->invalid) {
        chunk->status = CURL_TERMINATED;
        curl_easy_cleanup(curl_handle);
        return 0;
    }

    int len;
    char *unescaped = curl_easy_unescape(curl_handle, chunk->url.c_str(),
                                         chunk->url.Length(), &len);
    chunk->unescaped = unescaped;
    curl_free(unescaped);

    /* send header data to this function */
    curl_easy_setopt(curl_handle, CURLOPT_HEADERFUNCTION, header_callback);
    curl_easy_setopt(curl_handle, CURLOPT_HEADERDATA, (void *) chunk);

    curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYPEER, 0L);
    curl_easy_setopt(curl_handle, CURLOPT_SSL_VERIFYHOST, 0L);

    /* send content to this function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
    /* we pass our 'chunk' struct to the callback function */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *) chunk);

    if (headeronly > 0) {
        curl_easy_setopt(curl_handle, CURLOPT_HEADER, 1);
        curl_easy_setopt(curl_handle, CURLOPT_NOBODY, 1);
    }

    /* range download, FROM-TO bytes */
    if ((range_from + range_to) > 0) {
        AnsiString srange = AnsiString().sprintf("%lu-%lu", range_from, range_to);
        msglist->Add("\r\n---\r\nDownload range: " + srange);
        curl_easy_setopt(curl_handle, CURLOPT_RANGE, srange.c_str());
        chunk->start_byte = range_from;
        chunk->last_byte = range_from;
        chunk->stop_byte = range_to;
        chunk->size_downloaded = range_from;
    }
    else if (chunk->stop_byte > chunk->last_byte) {
        /* resume: request the remaining bytes */
        AnsiString autorange = AnsiString().sprintf("%lu-%lu",
            chunk->last_byte, chunk->stop_byte);
        msglist->Add("Autorange: " + AnsiString(autorange));
        curl_easy_setopt(curl_handle, CURLOPT_RANGE, autorange.c_str());
        chunk->size_downloaded = chunk->last_byte;
    }
    else if (chunk->stop_byte > 0 && chunk->last_byte > 0) {
        msglist->Add("nothing to do");
        chunk->status = CURL_TERMINATED;
        curl_easy_cleanup(curl_handle);
        return 0;
    }

    /* some servers don't like requests that are made without
       a user-agent field, so we provide one */
    curl_easy_setopt(curl_handle, CURLOPT_USERAGENT,
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) "
        "Gecko/20050511 Firefox/1.0.4");

    /* perform curl download */
    res = curl_easy_perform(curl_handle);

    /* check for errors */
    if (res != CURLE_OK) {
        msglist->Add("Err: " + AnsiString(curl_easy_strerror(res)));
        chunk->status = CURL_TERMINATED;
    }

    /* cleanup curl stuff */
    curl_easy_cleanup(curl_handle);
    chunk->status = CURL_TERMINATED;
    return res;
}
```
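The resume logic above boils down to building the `CURLOPT_RANGE` string from the last byte already received and the known end of the file. A portable sketch of that string building (`make_range` is a hypothetical helper, not part of BCBCurl):

```cpp
#include <cstdio>
#include <string>

// Builds the "from-to" byte-range string that do_curl() passes
// to CURLOPT_RANGE when resuming a partial download.
std::string make_range(unsigned long last_byte, unsigned long stop_byte) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%lu-%lu", last_byte, stop_byte);
    return buf;
}
```

For example, if 1000 bytes are already on hand and the Content-Length says the file ends at byte 9999, the request carries the range `1000-9999`.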
`msglist` is a `TList` object used as a FIFO buffer to send messages to the server. `chunk` (a `TThreadCurl` object) holds the chunk of data downloaded from the web server along with some download parameters. Its `memory` buffer is allocated dynamically according to the Content-Length header.
```cpp
class TThreadCurl : public TThread {
public:
    char *memory;                  // stores the downloaded data
    size_t size_downloaded;        // bytes downloaded by a single do_curl() call
    size_t size_memory;            // total buffer size (across all do_curl() calls)
    unsigned long content_length;  // content length reported in the header
    unsigned long start_byte;      // starting point of the download
    unsigned long stop_byte;       // ending point of the download
    unsigned long last_byte;       // last position of the download
    int can_resume;                // set if the download can be resumed
    int invalid;                   // set if an error was encountered during download
    AnsiString url;                // download URL
    AnsiString unescaped;          // unescaped form of the URL
    AnsiString referer;            // referer from the browser
    AnsiString cookie;             // cookie from the browser
    int status;                    // download status
    int http_code;                 // HTTP response code
    int download_step;             // progress of the download
    ...
};
```
`referer` and `cookie` can be set from command line parameters when BCBCurl is called from a browser. I use the FlashGot addon to call BCBCurl from within Firefox.
The LibCurl download process uses two callback functions:
1. `header_callback()` to process headers from the web server.
2. `WriteMemoryCallback()` to capture downloaded content from the web server.
The header_callback function:
This function is called whenever LibCurl receives header data from the web server.
```cpp
size_t header_callback(char *buffer, size_t size, size_t nmemb, void *userdata)
{
    TPerlRegEx *pcre = new TPerlRegEx();
    TThreadCurl *curl = (TThreadCurl *) userdata;

    /* does the server accept range requests (i.e. support resume)? */
    pcre->RegEx = "(ACCEPT-RANGE)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        msglist->Add(buffer);
        curl->can_resume = 1;
    }

    /* HTTP response code */
    pcre->RegEx = "HTTP\\/.*\\s*(\\d\\d\\d)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        curl->http_code = StrToInt(pcre->Groups[1]);
        if (curl->http_code > 206) {
            curl->invalid = 1; // cancel download
        }
    }

    /* suggested file name, quoted or unquoted */
    pcre->RegEx = "Content-Disposition:";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        pcre->RegEx = "Content-Disposition:.*filename=[\"\'](.*)[\"\']";
        pcre->Options = TPerlRegExOptions() << preCaseLess;
        pcre->Subject = AnsiString(buffer);
        if (pcre->Match()) {
            curl->header_filename = pcre->Groups[1];
        } else {
            pcre->RegEx = "Content-Disposition:.*filename=\\s*([^\\s]+)[\\s$]";
            pcre->Options = TPerlRegExOptions() << preCaseLess;
            pcre->Subject = AnsiString(buffer);
            if (pcre->Match()) {
                curl->header_filename = pcre->Groups[1];
            }
        }
    }

    /* content length: grow the buffer to fit the whole file */
    pcre->RegEx = "CONTENT-LENGTH\\s*:\\s*(\\d+)";
    pcre->Options = TPerlRegExOptions() << preCaseLess;
    pcre->Subject = AnsiString(buffer);
    if (pcre->Match()) {
        msglist->Add(" --- CONTENT LENGTH: " + pcre->Groups[1]);
        unsigned long ctlen = StrToInt(pcre->Groups[1]);
        // CHECK HTTP PROTOCOL: on a resumed request the Content-Length
        // is reduced by the size already downloaded
        if (ctlen > curl->content_length) {
            curl->content_length = ctlen;
        }
        size_t total_chunk_size = curl->content_length + 1;
        curl->memory = (char *) realloc(curl->memory, total_chunk_size);
        curl->size_memory = total_chunk_size;
        curl->stop_byte = curl->content_length;
        if (curl->memory == NULL) {
            /* out of memory! */
            msglist->Add("not enough memory (realloc returned NULL)\n");
            delete pcre;
            return 0;
        }
    }

    delete pcre; // TPerlRegEx was allocated with new; free it before returning
    size_t realsize = size * nmemb;
    return realsize;
}
```
I need to parse the header data from the web server to get the Content-Length, (optionally) the file name, and the HTTP response code, and to check whether the server supports resuming downloads. Notice how I use Perl-style regular expressions to parse the header data. I know I could parse the text by other means, but having once been a Perl programmer I just can't live without regex :) .
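TPerlRegEx is specific to C++ Builder; the same Content-Length parsing can be sketched with the standard library's std::regex. Here `parse_content_length` is a hypothetical helper, not part of BCBCurl, matched case-insensitively just as the article's patterns are:

```cpp
#include <regex>
#include <string>

// Extracts the Content-Length value from one header line, mirroring the
// "CONTENT-LENGTH\s*:\s*(\d+)" pattern used in header_callback().
// Returns 0 when the line is not a Content-Length header.
unsigned long parse_content_length(const std::string &line) {
    static const std::regex re("content-length\\s*:\\s*(\\d+)",
                               std::regex::icase);
    std::smatch m;
    if (std::regex_search(line, m, re))
        return std::stoul(m[1].str());
    return 0;
}
```

The other patterns (HTTP status code, Content-Disposition file name, Accept-Ranges) translate the same way.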
The WriteMemoryCallback function:
This function is called while LibCurl downloads the actual data from the web server. It mainly captures the data passed in through the function parameters and stores it in memory at the right position.
```cpp
size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
    size_t realsize = size * nmemb;
    TThreadCurl *curl = (TThreadCurl *) userp;
    curl->status = CURL_RECEIVING;

    /* grow the buffer if the received data does not fit */
    size_t total_chunk_size = curl->size_downloaded + realsize + 1;
    if (curl->size_memory < total_chunk_size) {
        curl->memory = (char *) realloc(curl->memory, total_chunk_size);
        curl->size_memory = total_chunk_size;
    }
    if (curl->memory == NULL) {
        /* out of memory! */
        msglist->Add("not enough memory (realloc returned NULL)\n");
        return 0;
    }

    /* append the new data at the current download position */
    memcpy(&(curl->memory[curl->size_downloaded]), contents, realsize);
    curl->last_byte = curl->last_byte + realsize;
    curl->size_downloaded += realsize;

    Application->ProcessMessages();
    return realsize;
}
```
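The same callback can be sketched without VCL types by accumulating into a std::string; the signature is libcurl's real CURLOPT_WRITEFUNCTION contract, but `Buffer` here is a hypothetical stand-in for `TThreadCurl`:

```cpp
#include <cstddef>
#include <string>

// Stand-in for TThreadCurl: just an in-memory buffer.
struct Buffer {
    std::string data;
};

// Matches libcurl's CURLOPT_WRITEFUNCTION signature: append the received
// bytes and return the number consumed (returning less aborts the transfer).
std::size_t WriteToBuffer(void *contents, std::size_t size,
                          std::size_t nmemb, void *userp) {
    std::size_t realsize = size * nmemb;
    Buffer *buf = static_cast<Buffer *>(userp);
    buf->data.append(static_cast<const char *>(contents), realsize);
    return realsize;
}
```

Letting std::string manage the growth avoids the manual realloc bookkeeping, at the cost of not writing resumed ranges at an absolute offset the way BCBCurl does.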
Command line parameters
List of command line parameters processed by BCBCurl:
- -o [explicit output file name]
- -f [suggested filename]
- -d [output folder]
- -k [cookie]
- -r [referer]
- -m [comment]
- -c (no value; set it to run BCBCurl as a client only)
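A minimal sketch of how such `-x value` pairs can be walked from argv (the `Options` struct and `parse_args` are hypothetical illustrations covering only a subset of the switches above, not BCBCurl's actual parser):

```cpp
#include <string>

// Holds a subset of the BCBCurl switches listed above.
struct Options {
    std::string url, output, referer, cookie;
    bool client_mode = false;
};

// Walks argv: "-x value" pairs fill fields, "-c" is a bare flag,
// and any other argument is taken as the URL.
Options parse_args(int argc, const char *argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        if (a == "-c")                      opt.client_mode = true;
        else if (a == "-o" && i + 1 < argc) opt.output  = argv[++i];
        else if (a == "-r" && i + 1 < argc) opt.referer = argv[++i];
        else if (a == "-k" && i + 1 < argc) opt.cookie  = argv[++i];
        else                                opt.url     = a;
    }
    return opt;
}
```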
Browser integration
To integrate BCBCurl with a browser, I use the FlashGot addon with the following parameters:
-c [URL] [-r REFERER] [-k COOKIE] [-m COMMENT] [-f FNAME]
Points of Interest
Things I learned by making this program:
- How to use LibCurl in a desktop application.
- How to parse command line parameters.
- How to use regular expressions in a C++ application.
- How to pass data via shared memory.
- How to call BCBCurl from a browser.
I realize I haven't explained all of these completely, to keep this article short, but I will gladly explain if anyone asks.
The BCBCurl executable is included with this article. The source code is available at https://bitbucket.org/pamungkas5/bcbcurl/