Introduction
In this article we'll see how to download files off the web. This takes very
little effort using the WebRequest and WebResponse classes. These classes
offer methods that let us access the data from the web as a stream, so we can
use any of the various reader/writer classes available for handling streams.
There are two mechanisms we can use for downloading files. For small files we
can use the synchronous mechanism, and for large files, or files downloaded
from servers whose response times cannot be predicted, we can use the
asynchronous mechanism. I'll demonstrate both methods in this article.
Synchronous download
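void DownloadFile(String* url, String* fpath)
{
    // Request the resource and block until the server responds
    WebRequest* wrq = WebRequest::Create(url);
    HttpWebResponse* hwr = static_cast<HttpWebResponse*>(wrq->GetResponse());
    Stream* strm = hwr->GetResponseStream();

    // Create (or overwrite) the destination file
    FileStream* fs = new FileStream(fpath,FileMode::Create,FileAccess::Write);
    BinaryWriter* br = new BinaryWriter(fs);

    // Copy one byte at a time; ReadByte returns -1 at the end of the stream
    int b;
    while((b=strm->ReadByte()) != -1)
    {
        br->Write(Convert::ToByte(b));
    }

    br->Close();    // also closes the underlying FileStream
    strm->Close();  // releases the response stream
}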
I've used five classes there in quick succession. I guess that's just what
the BCL is all about, a lavish abundance of classes.
WebRequest is an abstract class that allows a user to request internet
data in a protocol-independent manner. We use the
static method Create to create a request for our file. The WebRequest class
has a method called GetResponse which returns a WebResponse
object. Since in our particular case we have requested a file over HTTP, we
cast our WebResponse object to an HttpWebResponse
object. One big advantage of using these classes is that they all give us
stream access. In our case the HttpWebResponse class has a
GetResponseStream method that returns a Stream object that
encapsulates the requested file from the web. The rest of it is simple if you
have used streams before. If not, you can read my article on files and streams
here on CP. We simply read from the stream returned by the HttpWebResponse
object and write the data to a file.
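The listing above copies the response one byte at a time, which is easy to follow but not the fastest way to move data. As a rough sketch of the same read-then-write loop working in blocks, here is a version in standard C++ using iostreams rather than the .NET stream classes. That substitution is my assumption for illustration, and copy_stream and read_all are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies everything from 'in' to 'out' in fixed-size
// chunks and returns the number of bytes copied.
std::size_t copy_stream(std::istream& in, std::ostream& out,
                        std::size_t chunk_size = 4096)
{
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    for (;;)
    {
        // read() may deliver fewer bytes than requested on the last chunk;
        // gcount() reports how many bytes actually arrived.
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();
        if (got <= 0)
            break;  // nothing more to read
        out.write(buf.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Hypothetical convenience wrapper: drains a stream into a string.
std::string read_all(std::istream& in)
{
    std::ostringstream out;
    copy_stream(in, out, 512);
    return out.str();
}
```

The same shape should carry over to the .NET classes by reading into a byte array and writing out only as many bytes as the read call reports.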
Asynchronous download
This is a little more complicated than a synchronous download, but as you
might expect, it is the more efficient method when you are downloading several
large files. I vaguely remember someone from MS saying that
asynchronous methods use high performance techniques like I/O completion ports
internally.
We create our WebRequest object just as we did above, but
instead of calling GetResponse, we call BeginGetResponse,
which begins an asynchronous request for an Internet resource. We specify a
response callback function as one of the arguments and pass our WebRequest
object as the state object for that callback. We then wait on a
ManualResetEvent object, which is set by the read callback once the entire
response has been read and stored, so that our function blocks until the
download is complete.
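void DownloadFileAsync(String* url, String* fpath)
{
    WebRequest* wrq = WebRequest::Create(url);

    // Class members shared with the callbacks
    finished = new ManualResetEvent(false);   // set when the download is done
    m_writeEvent = new AutoResetEvent(true);  // serializes the write callbacks
    buffer = new unsigned char __gc[512];     // read buffer
    OutFile = new FileStream(fpath,
        FileMode::Create,FileAccess::Write);

    // Start the request; wrq is passed along as the state object
    wrq->BeginGetResponse(
        new AsyncCallback(this,&WebStuffDemo::ResponseCallback),
        wrq);

    // Block until the read callback signals that all the data has arrived
    finished->WaitOne();
    OutFile->Close();
}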
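If the start-the-work-then-block pattern used in DownloadFileAsync is new to you, here is a small sketch of the idea in standard C++. It is only an analogue: ManualResetEventSketch and run_and_wait are hypothetical names of mine, and a std::thread stands in for the worker thread on which the framework would invoke our callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical analogue of a manual-reset event: once set, it stays set
// (and lets every waiter through) until Reset is called.
class ManualResetEventSketch
{
public:
    explicit ManualResetEventSketch(bool initially_set = false)
        : set_(initially_set) {}

    void Set()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            set_ = true;
        }
        cv_.notify_all();
    }

    void Reset()
    {
        std::lock_guard<std::mutex> lock(m_);
        set_ = false;
    }

    void WaitOne()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_;
};

// Hypothetical stand-in for "begin an asynchronous operation, then block":
// the worker runs 'work', stores the result and signals the event.
int run_and_wait(const std::function<int()>& work)
{
    ManualResetEventSketch finished;
    int result = 0;
    std::thread worker([&] {
        result = work();
        finished.Set();  // plays the role of finished->Set() in the callback
    });
    finished.WaitOne();  // plays the role of finished->WaitOne()
    worker.join();
    return result;
}
```

The calling thread does nothing useful while it waits, which is fine for a demo like ours. In a real application you would probably let the caller carry on and be notified on completion instead.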
Response callback
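void ResponseCallback(IAsyncResult* ar)
{
    // The state object is the WebRequest we passed to BeginGetResponse
    WebRequest* wrq = static_cast<WebRequest*>(ar->AsyncState);
    WebResponse* wrp = wrq->EndGetResponse(ar);
    Stream* strm = wrp->GetResponseStream();

    // Start the first asynchronous read; the stream is the state object
    strm->BeginRead(buffer,0,512,
        new AsyncCallback(this,&WebStuffDemo::ReadCallBack),strm);
}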
The EndGetResponse method concludes the asynchronous request
that was initiated using the BeginGetResponse method and returns a
WebResponse object from which we can use GetResponseStream
to get the underlying stream object. Now we begin our next asynchronous
operation on the stream: an asynchronous read using BeginRead. If you are
wondering why we do this, here is a snippet from MSDN.
"Using synchronous calls in asynchronous callback methods may result in
severe performance penalties. Internet requests made with WebRequest and its
descendents must use Stream.BeginRead to read the stream returned by the
WebResponse.GetResponseStream method"
Read callback
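void ReadCallBack(IAsyncResult* ar)
{
    Stream* strm = static_cast<Stream*>(ar->AsyncState);
    int count = strm->EndRead(ar);  // number of bytes actually read
    if(count > 0)
    {
        // Diagnostic output only: decode the bytes just read and print how
        // many characters they contain (only meaningful for text downloads)
        __wchar_t Temp __gc[] = new __wchar_t __gc[512];
        Decoder* d = Encoding::UTF8->GetDecoder();
        int chars = d->GetChars(buffer,0,count,Temp,0);
        String* s = new String(Temp,0,chars);
        Console::WriteLine(s->Length);

        // Copy the data so the shared buffer can be reused by the next read
        unsigned char wbuff __gc[] = new unsigned char __gc[512];
        buffer->CopyTo(wbuff,0);
        OutFile->BeginWrite(wbuff,0,count,
            new AsyncCallback(this,&WebStuffDemo::WriteCallBack),OutFile);

        // Start the next asynchronous read
        strm->BeginRead(buffer,0,512,
            new AsyncCallback(this,&WebStuffDemo::ReadCallBack),strm);
    }
    else
    {
        // Zero bytes read means we have reached the end of the stream
        strm->Close();
        finished->Set();
    }
}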
We call EndRead on the stream and get back the count of bytes
that were read from the stream. EndRead is a blocking call and must
be called exactly once for each BeginRead call we have issued. If
the count is greater than zero, we have received data and there may be more to
come. Otherwise we know that all the data has arrived, so we close the stream
and set the event on which our main function is waiting. Just as we had to use
asynchronous methods to read the data, we must use asynchronous methods
for writing the data to our file; otherwise we'll have blocking calls inside
the asynchronous callback functions, which is highly inefficient.
So what we do is call the BeginWrite method on our output
stream object. We pass our write-callback function as the callback, and pass the
output stream object as the callback function's state object. Once we do this we
call BeginRead on our input stream object to start another
asynchronous read, as there may still be more data left to be retrieved.
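The chaining described above, where every read callback starts the next read until a read reports zero bytes, can be sketched in standard C++ as follows. This is a simplified model under stated assumptions: FakeStream and ChainedReader are hypothetical names, and begin_read invokes its callback immediately on the calling thread, whereas a real network stream would call back later from a worker thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a network stream. begin_read plays the role of
// BeginRead plus EndRead: it fills 'buf' and reports the byte count.
// Assumption: the callback runs synchronously here for simplicity.
class FakeStream
{
public:
    explicit FakeStream(std::string data) : data_(std::move(data)) {}

    void begin_read(char* buf, std::size_t n,
                    const std::function<void(std::size_t)>& on_done)
    {
        std::size_t count = std::min(n, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, count);
        pos_ += count;
        on_done(count);  // zero means there is no data left
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Hypothetical reader: each completed read stores its data and then issues
// the next read, until a read reports zero bytes.
class ChainedReader
{
public:
    explicit ChainedReader(FakeStream& s) : stream_(s) {}

    void start() { issue_read(); }
    bool finished() const { return finished_; }
    const std::string& received() const { return received_; }

private:
    void issue_read()
    {
        stream_.begin_read(buffer_, sizeof(buffer_),
                           [this](std::size_t count) { on_read(count); });
    }

    void on_read(std::size_t count)
    {
        if (count > 0)
        {
            // Copy the data out before the shared buffer is reused
            received_.append(buffer_, count);
            issue_read();
        }
        else
        {
            finished_ = true;  // corresponds to finished->Set()
        }
    }

    FakeStream& stream_;
    char buffer_[8];
    std::string received_;
    bool finished_ = false;
};
```

Because the fake stream calls back synchronously, the calls nest one level per chunk. That is harmless for a toy example but is not how the real asynchronous calls behave.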
Write callback
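void WriteCallBack(IAsyncResult* ar)
{
    m_writeEvent->WaitOne();  // let only one write callback through at a time
    FileStream* out = static_cast<FileStream*>(ar->AsyncState);
    out->EndWrite(ar);        // blocks until this write has completed
    m_writeEvent->Set();      // release the next waiting callback
}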
We call EndWrite on our output stream, which ends the asynchronous
write operation started by BeginWrite. EndWrite blocks
until all the data has been written to the file, so we are saved the bother of
making sure that all the data got written. As you can see, I have used an
AutoResetEvent object to make sure that two write callbacks don't run in
parallel. If multiple write callbacks are invoked, they'll all wait at the
WaitOne call and go through one at a time. Keep in mind that the event only
serializes the callbacks; the order in which waiting threads are released is
up to the operating system, so it is safer not to depend on them getting
through in the exact order in which they called WaitOne.
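If the order of the chunks really matters, one option is to number them and let the writer sort things out. The following is a sketch of that alternative in standard C++. It is not what the code in this article does, SequencedWriter is a hypothetical class, and a std::string stands in for the output file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical writer that accepts numbered chunks in any order and emits
// them strictly in sequence. Chunks that arrive early are held back.
class SequencedWriter
{
public:
    void submit(std::size_t seq, const std::string& chunk)
    {
        std::lock_guard<std::mutex> lock(m_);
        pending_[seq] = chunk;
        // Flush every chunk that is now contiguous with what was written
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_))
        {
            out_ += it->second;
            pending_.erase(it);
            ++next_;
        }
    }

    std::string written() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return out_;
    }

    std::size_t pending_count() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return pending_.size();
    }

private:
    mutable std::mutex m_;
    std::map<std::size_t, std::string> pending_;  // chunks waiting their turn
    std::size_t next_ = 0;                        // next sequence number to write
    std::string out_;                             // stands in for the file
};
```

The mutex keeps two callers from interleaving, and the sequence numbers take care of the ordering, so nothing depends on which thread happens to get there first.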
Nish is a real nice guy who has been writing code since 1990 when he first got his hands on an 8088 with 640 KB RAM. Originally from sunny Trivandrum in India, he has been living in various places over the past few years and often thinks it’s time he settled down somewhere.
Nish has been a Microsoft Visual C++ MVP since October, 2002 - awfully nice of Microsoft, he thinks. He maintains an MVP tips and tricks web site - www.voidnish.com - where you can find a consolidated list of his articles, writings and ideas on VC++, MFC, .NET and C++/CLI. Oh, and you might want to check out his blog on C++/CLI, MFC, .NET and a lot of other stuff - blog.voidnish.com.