Copy a Stream with Progress Reporting






This article describes how to copy one stream into another and how to track detailed progress information for the copy operation.
- Download console demo project - 25.5 KB
- Download WPF demo project (asynchronous progress report) - 42.8 KB
- Download source - 4.8 KB
Introduction
Copying streams is a common task. Unfortunately, the .NET Framework itself does not come with an easy-to-use class or method for copying data from one stream to another, let alone one with comprehensive progress reporting. Especially when copying takes a while, for example when downloading or uploading data, detailed progress feedback such as average copy speed, current copy speed, or estimated duration is required.
This article describes how to solve these issues and introduces the reusable StreamHelper and ProgressStatistic classes.
Copy Data from one Stream to another Stream
The first part of this article is about copying binary data without progress tracking.
To avoid loading all data from the source stream into RAM, a small buffer (typically 4096 bytes) is used, which moves the data chunk by chunk to the target stream. If all the data were copied into RAM at once, an OutOfMemoryException could be thrown.
In a loop, this buffer is filled with data from the source stream and then written to the target stream until all data is copied. An activity diagram of this procedure is shown on the right.
This task can be implemented as an extension method, so you can call targetStream.CopyFrom(sourceStream).
Here is the listing for this extension method, called CopyFrom:
/// <summary>
/// Copies the source stream into the current stream.
/// </summary>
/// <param name="stream">The current (target) stream</param>
/// <param name="source">The source stream</param>
/// <param name="bufferSize">Optional: the size of the buffer used for copying bytes</param>
/// <returns>The number of bytes actually copied.</returns>
public static long CopyFrom(this Stream stream, Stream source, int bufferSize = 4096)
{
    int count = 0;
    byte[] buffer = new byte[bufferSize];
    long length = 0;
    // Read returns 0 once the end of the source stream has been reached.
    while ((count = source.Read(buffer, 0, bufferSize)) != 0)
    {
        length += count;
        stream.Write(buffer, 0, count);
    }
    return length;
}
This is the basic technique for copying streams, which will be used in the next section.
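As a minimal usage sketch, the extension method can be exercised with two in-memory streams. The StreamExtensions and CopyDemo class names are assumptions made for this illustration only; the method body repeats the listing above.

```csharp
using System;
using System.IO;

// Hypothetical container class; the article only shows the method itself.
public static class StreamExtensions
{
    public static long CopyFrom(this Stream stream, Stream source, int bufferSize = 4096)
    {
        byte[] buffer = new byte[bufferSize];
        long length = 0;
        int count;
        while ((count = source.Read(buffer, 0, bufferSize)) != 0)
        {
            stream.Write(buffer, 0, count);
            length += count;
        }
        return length;
    }
}

public static class CopyDemo
{
    // Copies 10,000 bytes between two in-memory streams
    // and returns the number of bytes copied.
    public static long Run()
    {
        byte[] data = new byte[10000];
        new Random(42).NextBytes(data);

        using (var source = new MemoryStream(data))
        using (var target = new MemoryStream())
        {
            long copied = target.CopyFrom(source, bufferSize: 1024);
            Console.WriteLine("Copied {0} bytes, target length {1}", copied, target.Length);
            return copied;
        }
    }
}
```

Because the buffer (1024 bytes here) is much smaller than the data, the loop runs several times, and the last chunk is shorter than the buffer.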
Copy Data with Progress Reporting
The Progress Change Callback
To report progress, a simple delegate can be used as a callback:
/// <summary>
/// A delegate for reporting binary progress
/// </summary>
/// <param name="bytesRead">The number of bytes already read</param>
/// <param name="totalBytesToRead">The total number of bytes to read. Can be -1 if unknown.</param>
public delegate void ProgressChange(long bytesRead, long totalBytesToRead);
A callback means that a reference to a method is passed as an argument, so that the called method can invoke it.
Some streams are not seekable, and their lengths are unknown until they have been read to the end, e.g. NetworkStream.
For this reason, the parameter totalBytesToRead can be -1 to indicate that the length of the source stream cannot be determined.
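A handler attached to this delegate has to cope with the unknown-length case. The following is a small sketch of such a handler; the ProgressFormatter class and its members are hypothetical names chosen for this example, and the delegate declaration is repeated only to keep the snippet self-contained.

```csharp
using System;
using System.Globalization;

public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

// Hypothetical helper, not part of the article's source code.
public static class ProgressFormatter
{
    // Formats one progress line. A total of -1 (unknown) or 0 is treated
    // as "no percentage available", which also avoids a division by zero.
    public static string Format(long bytesRead, long totalBytesToRead)
    {
        if (totalBytesToRead <= 0)
            return string.Format("{0} bytes copied (total unknown)", bytesRead);

        double percent = 100.0 * bytesRead / totalBytesToRead;
        return string.Format(CultureInfo.InvariantCulture,
            "{0} of {1} bytes ({2:F1} %)", bytesRead, totalBytesToRead, percent);
    }

    // A callback matching the ProgressChange signature that prints to the console.
    public static readonly ProgressChange PrintProgress =
        (read, total) => Console.WriteLine(Format(read, total));
}
```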
The Progress Reporting
To report progress, the progress change callback has to be called at regular intervals. Since the progress report is meant for humans, who want continuous feedback, the interval should be time-based, i.e. a timespan. Using a number of copied bytes as the interval makes little sense, because uploading data is a lot slower than copying a local file into RAM: in the first case, progress would be reported very rarely, whereas in the second case it would be reported very often (per timespan).
Because no data can be copied while a progress change is being reported, the copying should run asynchronously to the progress reporting.
The following activity diagram visualizes this idea:
It is important that the progress change callback is called in the context of the calling thread, as this is the expected behavior when using this method. Both threads are synchronized (in fact, they share only one resource), but some libraries do not tolerate being accessed from different threads (e.g. WPF).
The only resource shared between both threads is the number of bytes already copied, which is needed for the progress calculations. Since this variable is 64 bits wide (as there are files larger than 2^32 bytes = 4 GiB) and C# guarantees atomic reads and writes only for types of up to 32 bits, the static Interlocked class has to be used to synchronize access to this variable between both threads.
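The access pattern for this shared counter can be sketched in isolation. In the following example, one thread publishes a 64-bit value with Interlocked.Exchange while another thread takes snapshots with Interlocked.Read. The SharedCounterDemo class and the simulated chunk size are assumptions made for this illustration.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not part of the article's source code.
public static class SharedCounterDemo
{
    // Returns the final counter value after the writer thread has finished.
    public static long Run()
    {
        long bytesCopied = 0; // shared between the writer and the reader

        var writer = new Thread(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                // Only this thread writes, so the plain read of bytesCopied
                // inside the expression is safe; the write itself is atomic.
                Interlocked.Exchange(ref bytesCopied, bytesCopied + 4096);
            }
        });

        writer.Start();
        long lastSnapshot = 0;
        while (writer.IsAlive)
        {
            // Atomic 64-bit read, also in a 32-bit process.
            lastSnapshot = Interlocked.Read(ref bytesCopied);
            Thread.Sleep(1);
        }
        writer.Join();

        Console.WriteLine("Last snapshot while running: {0}", lastSnapshot);
        return Interlocked.Read(ref bytesCopied);
    }
}
```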
To abort the copy operation, the progress reporting thread checks a WaitHandle passed by the caller and, if the WaitHandle is set, notifies the copying thread through a flag. As accesses to bool values in C# are atomic, this variable does not have to be synchronized.
If the length of the stream is known from additional meta information but the stream itself is not seekable, the total length can be passed to the CopyFrom method to allow proper progress calculations.
Because the current implementation of CopyFrom with progress reporting accepts a lot of optional arguments (total length, buffer size, progress change callback, stop event (wait handle), and progress change callback interval), they have been moved into their own CopyFromArguments class.
So the method consists of the following:
public static long CopyFrom(this Stream target, Stream source, CopyFromArguments arguments)
{
    if (target == null)
        throw new ArgumentNullException("target");
    if (source == null)
        throw new ArgumentNullException("source");
    if (arguments == null)
        throw new ArgumentNullException("arguments");
    if (arguments.BufferSize < 128)
        throw new ArgumentOutOfRangeException("arguments.BufferSize",
            arguments.BufferSize, "BufferSize has to be greater than or equal to 128.");
    if (arguments.ProgressChangeCallbackInterval.TotalSeconds < 0)
        throw new ArgumentOutOfRangeException("arguments.ProgressChangeCallbackInterval",
            arguments.ProgressChangeCallbackInterval,
            "ProgressChangeCallbackInterval has to be greater than or equal to 0.");

    long length = 0;
    bool runningFlag = true;

    //Raw copy-operation, "length" and "runningFlag" are enclosed as closure
    Action<Stream, Stream, int> copyMemory = (Stream _target, Stream _source, int bufferSize) =>
    {
        int count;
        byte[] buffer = new byte[bufferSize];
        while ((count = _source.Read(buffer, 0, bufferSize)) != 0 && runningFlag)
        {
            _target.Write(buffer, 0, count);
            long newLength = length + count;
            //"length" can be read as this is the only thread which writes to "length"
            Interlocked.Exchange(ref length, newLength);
        }
    };

    //Start the copy-operation on a thread pool thread
    IAsyncResult asyncResult = copyMemory.BeginInvoke(target, source, arguments.BufferSize, null, null);

    long totalLength = arguments.TotalLength;
    if (totalLength == -1 && source.CanSeek)
        totalLength = (long)source.Length;

    DateTime lastCallback = DateTime.Now;
    long lastLength = 0;

    //Progress reporting loop, runs on the calling thread
    while (!asyncResult.IsCompleted)
    {
        if (arguments.StopEvent != null && arguments.StopEvent.WaitOne(0))
            runningFlag = false; //to indicate that the copy-operation has to abort

        Thread.Sleep((int)(arguments.ProgressChangeCallbackInterval.TotalMilliseconds / 10));

        if (arguments.ProgressChangeCallback != null
            && DateTime.Now - lastCallback > arguments.ProgressChangeCallbackInterval)
        {
            long currentLength = Interlocked.Read(ref length); //Since length is 64 bit, reading is not an atomic operation.
            if (currentLength != lastLength)
            {
                lastLength = currentLength;
                lastCallback = DateTime.Now;
                arguments.ProgressChangeCallback(currentLength, totalLength);
            }
        }
    }

    if (arguments.ProgressChangeCallback != null && lastLength != length)
        //to ensure that the callback is called once with maximum progress
        arguments.ProgressChangeCallback(length, totalLength);

    copyMemory.EndInvoke(asyncResult);

    return length;
}
Note that this implementation uses the .NET thread pool for threading (copyMemory.BeginInvoke).
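Delegate.BeginInvoke is only available on the .NET Framework; on .NET Core and later runtimes it throws a PlatformNotSupportedException. The same structure (copy on a thread pool thread, report on the calling thread) can be sketched with Task.Run instead. This is a simplified variant under stated assumptions, not the article's implementation: the TaskCopySketch class, the method name, and the parameter list are hypothetical, and abort handling and argument validation are omitted.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical variant of CopyFrom based on Task.Run.
public static class TaskCopySketch
{
    public static long CopyWithProgress(Stream target, Stream source,
        Action<long, long> progress, TimeSpan interval, int bufferSize = 4096)
    {
        long length = 0;
        long totalLength = source.CanSeek ? source.Length : -1;

        // The raw copy runs on a thread pool thread.
        Task copyTask = Task.Run(() =>
        {
            byte[] buffer = new byte[bufferSize];
            int count;
            while ((count = source.Read(buffer, 0, bufferSize)) != 0)
            {
                target.Write(buffer, 0, count);
                Interlocked.Add(ref length, count);
            }
        });

        long lastLength = 0;
        // Wait returns false when the interval elapsed before the task finished.
        // If the copy failed, Wait rethrows the error as an AggregateException.
        while (!copyTask.Wait(interval))
        {
            long current = Interlocked.Read(ref length);
            if (progress != null && current != lastLength)
            {
                lastLength = current;
                progress(current, totalLength); // called on the calling thread
            }
        }

        long finalLength = Interlocked.Read(ref length);
        if (progress != null && finalLength != lastLength)
            progress(finalLength, totalLength); // report the final state once
        return finalLength;
    }
}
```

Using Task.Wait with a timeout replaces both the polling of IsCompleted and the Thread.Sleep call, so the reporting loop wakes up immediately when the copy finishes.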
Why Not Use a Class for That?
While writing this article, I wondered why I had not followed an object-oriented design but had used a functional approach for the CopyFrom method. In my opinion, an object-oriented solution would only be more complex, both to write and to use. As copying streams is more an action than an object, and the action does not need to be extended in an object-oriented way, I think the functional approach is more natural.
Progress statistics
Now the progress can be tracked, but displaying only the number of bytes copied so far next to the total number of bytes to copy is not enough for most users. Interesting details such as the current bytes per second or the estimated duration require more calculations.
The good news is that these calculations need nothing more than the number of bytes copied so far, the total number of bytes to copy, and the elapsed time. So the previous CopyFrom method with progress reporting can be used as is, since the callback delivers all the information needed for further calculations.
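To illustrate how little input these calculations need, here is a sketch of three derived values: the progress fraction, the average speed, and the estimated remaining time. The ProgressMath class and its method names are hypothetical; the members of the actual statistics class may be named and implemented differently.

```csharp
using System;

// Hypothetical helper, not the article's statistics class.
public static class ProgressMath
{
    // Fraction between 0 and 1, or -1 if the total is unknown (total <= 0).
    public static double Progress(long bytesRead, long totalBytesToRead)
    {
        return totalBytesToRead > 0 ? (double)bytesRead / totalBytesToRead : -1;
    }

    // Average speed since the start of the copy operation; 0 if no time has elapsed.
    public static double AverageBytesPerSecond(long bytesRead, TimeSpan elapsed)
    {
        return elapsed.TotalSeconds > 0 ? bytesRead / elapsed.TotalSeconds : 0;
    }

    // Remaining time based on the average speed; null if it cannot be estimated.
    public static TimeSpan? EstimatedRemaining(long bytesRead, long totalBytesToRead, TimeSpan elapsed)
    {
        double speed = AverageBytesPerSecond(bytesRead, elapsed);
        if (totalBytesToRead <= 0 || speed <= 0)
            return null;
        return TimeSpan.FromSeconds((totalBytesToRead - bytesRead) / speed);
    }
}
```

For example, 2000 of 10000 bytes after 4 seconds gives an average of 500 bytes per second and an estimated 16 seconds for the remaining 8000 bytes.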
The ProgressStatistic class
Only the callback of the CopyFrom method has to be attached to the class that performs the progress calculations, and that class does not even need to know the delegate type. Both classes (StreamHelper and ProgressStatistic) are therefore independent of each other and highly reusable.
So the only interface the ProgressStatistic class has to provide is a method whose signature matches the progress change delegate.
A class diagram of the statistic class with all its properties is shown on the right.
Since most of the statistic class is boilerplate (StartingTime, FinishingTime, Progress, AverageBytesPerSecond, etc.), I will go into detail only on the more difficult part: the calculation of the current bytes per second.
Current bytes per second
In contrast to the (global) average bytes per second, the current bytes per second approximates the local average, i.e. the number of bytes that would be copied per second at the rate observed over a very small interval. In principle, it is the derivative of the function n(t), the number of bytes copied up to time t. The derivative can be expressed as:
$$n'(t) = \lim_{\Delta t \to 0} \frac{n(t) - n(t - \Delta t)}{\Delta t}$$
But since n(t) is neither differentiable nor continuous, this limit can only be approximated by simple gradient triangles with a finite Δt:
$$n'(t) \approx \frac{n(t) - n(t - \Delta t)}{\Delta t}$$
As n'(t) changes very quickly, especially when downloading data from the Internet, the length of the time interval (Δt) should be neither too short, to reduce random deviation, nor too long, to stay as up to date as possible. And because only n(t) is given, n(t - Δt) has to be stored beforehand. When the callback is called, the current bytes per second of the last interval can then be calculated (if a stored value is available), and the current byte count can be stored for the next interval.
To avoid storing too many of these samples, which would slow down the copy operation considerably, not every n(t) should be stored.
For this reason, the property CurrentBytesSampleCount specifies how many samples are stored within an interval.
The following illustration shows two samples with a time distance of interval / 2, so CurrentBytesSampleCount = 2.
The first sample was taken at SS1, the second at SS2; the third would be taken at SE1, the fourth at SE2, etc.
The current bytes per second between SE1 and SE2 are: (SE1.Y - SS1.Y) / Δt. Before SE1, the current bytes per second cannot be determined.
Until that point, the ProgressStatistic class uses the average bytes per second to provide reasonable values.
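The sampling scheme can be sketched as follows. This is not the article's ProgressStatistic class but a hypothetical, simplified sampler: the CurrentSpeedSampler name and its members are assumptions, the elapsed time is passed in explicitly to keep the example deterministic, and the latest reported value is used as the right end of the gradient triangle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sampler illustrating the idea of stored samples.
public class CurrentSpeedSampler
{
    private struct Sample
    {
        public TimeSpan Time;
        public long Bytes;
    }

    private readonly TimeSpan interval;      // the interval length (delta t)
    private readonly TimeSpan sampleSpacing; // interval / sample count
    private readonly List<Sample> samples = new List<Sample>();
    private TimeSpan lastSampleTime = TimeSpan.Zero;

    public double CurrentBytesPerSecond { get; private set; }

    public CurrentSpeedSampler(TimeSpan interval, int sampleCount)
    {
        this.interval = interval;
        this.sampleSpacing = TimeSpan.FromTicks(interval.Ticks / sampleCount);
        samples.Add(new Sample { Time = TimeSpan.Zero, Bytes = 0 }); // n(0) = 0
    }

    // elapsed: time since the copy started; bytesRead: n(elapsed).
    public void Update(TimeSpan elapsed, long bytesRead)
    {
        // Drop old samples, but keep the newest one that is at least one interval old.
        while (samples.Count >= 2 && samples[1].Time <= elapsed - interval)
            samples.RemoveAt(0);

        Sample oldest = samples[0];
        TimeSpan span = elapsed - oldest.Time;
        if (span >= interval)
            // A full interval is available: use the gradient triangle.
            CurrentBytesPerSecond = (bytesRead - oldest.Bytes) / span.TotalSeconds;
        else if (elapsed > TimeSpan.Zero)
            // No full interval yet: fall back to the average speed.
            CurrentBytesPerSecond = bytesRead / elapsed.TotalSeconds;

        // Store a new sample only once per sample spacing.
        if (elapsed - lastSampleTime >= sampleSpacing)
        {
            samples.Add(new Sample { Time = elapsed, Bytes = bytesRead });
            lastSampleTime = elapsed;
        }
    }
}
```

With an interval of one second and two samples per interval, updates of (0.5 s, 500 bytes), (1.0 s, 1000 bytes), and (1.5 s, 3000 bytes) yield 1000, 1000, and 2500 bytes per second. The last value reflects the recent speed-up, whereas the average over the whole run would be 2000 bytes per second.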
Progress Reporting within a GUI
Although the copying itself is already done in a separate thread, the CopyFrom method with progress reporting is nevertheless synchronous: it blocks the caller until the copy operation has finished.
So this method should be called asynchronously (see the WPF demo).
Because the ProgressStatistic class is not thread-safe, the progress change callback of the CopyFrom method has to be synchronized with the GUI thread. As this is not the topic of this article, I will not go into details. In short: Form.BeginInvoke can be used in Windows Forms and Dispatcher.BeginInvoke in WPF to synchronize the callback.
History
- Version 1.5 - Threaded Copying, Improved examples, Changing "Momentary" to "Current"
- Version 1.0 - Initial article.