
Copy a Stream with Progress Reporting

Henning Dieterichs, 10 Apr 2012
This article describes how to copy one stream into another and how to track comprehensive progress information during the copy operation.

Introduction 

Copying streams is a common task. Unfortunately, the .NET Framework offered no easy-to-use method for copying data directly from one stream to another until version 4.0 added Stream.CopyTo, and even that method offers no progress reporting at all. But especially when copying takes a while, for example when downloading or uploading data, detailed progress feedback such as the average copy speed, the current copy speed, or the estimated duration is required.

This article describes how to solve these issues and introduces the reusable StreamHelper and ProgressStatistic classes.

Copy Data from one Stream to another Stream

The first part of this article is about copying binary data without progress tracking. 

To avoid loading all data from the source stream into RAM at once, a small buffer (typically 4096 bytes) is used to move the data chunk by chunk to the target stream. If all the data were loaded into RAM at once, an OutOfMemoryException could be thrown for large streams.
In a loop, this buffer is filled with data from the source stream and then written to the target stream until all data has been copied. An activity diagram of this procedure is shown on the right.

This task can be implemented with an extension method, so you can call targetStream.CopyFrom(sourceStream). Here is the listing for this extension method called CopyFrom:

/// <summary>
/// Copies the source stream into the current stream.
/// </summary>
/// <param name="stream">The current (target) stream</param>
/// <param name="source">The source stream</param>
/// <param name="bufferSize">Optional size of the buffer used for copying bytes</param>
/// <returns>The number of bytes actually copied.</returns>
public static long CopyFrom(this Stream stream, Stream source, int bufferSize = 4096)
{
    int count = 0;
    byte[] buffer = new byte[bufferSize];
    long length = 0;

    while ((count = source.Read(buffer, 0, bufferSize)) != 0)
    {
        length += count;
        stream.Write(buffer, 0, count);
    }

    return length;
}

This is the basic technique for copying streams, which will be used in the next section.
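As a usage sketch: an extension method has to be declared in a non-generic static class, so the listing above is wrapped in one here. The names StreamExtensions and CopyFromUsage are assumptions for illustration. The demo also calls the built-in Stream.CopyTo(Stream, int), available since .NET Framework 4.0, which runs the same kind of buffered loop but returns nothing and reports no progress.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical container class: extension methods must live in a non-generic static class.
public static class StreamExtensions
{
    // Same buffered loop as the CopyFrom listing above.
    public static long CopyFrom(this Stream stream, Stream source, int bufferSize = 4096)
    {
        int count;
        byte[] buffer = new byte[bufferSize];
        long length = 0;

        while ((count = source.Read(buffer, 0, bufferSize)) != 0)
        {
            length += count;
            stream.Write(buffer, 0, count);
        }

        return length;
    }
}

public static class CopyFromUsage
{
    // Copies 10,000 bytes (chunks of 4096 + 4096 + 1808 bytes) with CopyFrom and with
    // the built-in Stream.CopyTo, and checks that both targets hold the same data.
    public static bool Demo()
    {
        byte[] data = new byte[10000];
        new Random(42).NextBytes(data);

        var viaCopyFrom = new MemoryStream();
        long copied = viaCopyFrom.CopyFrom(new MemoryStream(data));

        var viaCopyTo = new MemoryStream();
        new MemoryStream(data).CopyTo(viaCopyTo, 4096); // returns void, no progress information

        return copied == data.Length
            && viaCopyFrom.ToArray().SequenceEqual(viaCopyTo.ToArray());
    }
}
```

Because MemoryStream has no CopyFrom member of its own, target.CopyFrom(source) binds to the extension method, just like the targetStream.CopyFrom(sourceStream) call described earlier.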

Copy Data with Progress Reporting

The Progress Change Callback

To report progress, a simple delegate can be used as a callback: 

/// <summary>
/// A delegate for reporting binary progress
/// </summary>
/// <param name="bytesRead">The number of bytes already read</param>
/// <param name="totalBytesToRead">The total number of bytes to read, or -1 if unknown.</param>
public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

A callback is a reference to a method (in C#, a delegate instance) that is passed as an argument, so that the called method can invoke it, here to report progress.

Some streams, e.g., NetworkStream, are not seekable, and their length is unknown until they have been read to the end. For such streams, the parameter totalBytesToRead is -1 to indicate that the length of the source stream cannot be determined.
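A callback therefore has to cope with both cases. The following sketch is a hypothetical formatting helper (ProgressText.Describe is not part of the article's code) that shows a percentage only when the total is known:

```csharp
using System;
using System.Globalization;

// The delegate from the article.
public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

public static class ProgressText
{
    // Hypothetical helper: formats the two callback arguments for display.
    // A negative total (-1) means the length is unknown, so no percentage is shown.
    public static string Describe(long bytesRead, long totalBytesToRead)
    {
        if (totalBytesToRead < 0)
            return bytesRead.ToString(CultureInfo.InvariantCulture) + " bytes";

        // An empty source counts as complete.
        double percent = totalBytesToRead == 0 ? 100.0 : 100.0 * bytesRead / totalBytesToRead;
        return string.Format(CultureInfo.InvariantCulture,
            "{0} of {1} bytes ({2:F1} %)", bytesRead, totalBytesToRead, percent);
    }
}
```

A matching callback is then simply ProgressChange callback = (read, total) => label = ProgressText.Describe(read, total);, where label stands for whatever displays the text.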

The Progress Reporting

To report progress, the progress change callback has to be called at regular intervals. Since progress reports are meant for humans, who want continuous feedback, the interval should be based on time, i.e., a TimeSpan. It does not make sense to use a number of copied bytes as the interval, because uploading data is much slower than copying a local file into RAM: per unit of time, progress would be reported very rarely in the first case and very often in the second.
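A time-based interval can be sketched as a small throttle that answers whether the next report is due. This is an illustration only: the class name ProgressThrottle is hypothetical, and it measures time with System.Diagnostics.Stopwatch, which is monotonic, whereas the article's CopyFrom uses DateTime.Now.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: decides whether the next progress report is due.
// The interval is a TimeSpan, not a byte count, so a slow upload and a fast
// local copy both produce reports at roughly the same rate.
public sealed class ProgressThrottle
{
    private readonly TimeSpan interval;
    private readonly Stopwatch watch = Stopwatch.StartNew();
    private TimeSpan lastReport;
    private bool reportedOnce;

    public ProgressThrottle(TimeSpan interval)
    {
        if (interval < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("interval");
        this.interval = interval;
    }

    // Returns true for the first call and afterwards at most once per interval.
    public bool ShouldReport()
    {
        TimeSpan now = watch.Elapsed;
        if (reportedOnce && now - lastReport < interval)
            return false;

        reportedOnce = true;
        lastReport = now;
        return true;
    }
}
```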

Because no data can be copied while progress is being reported, the copying should run asynchronously to the progress reporting.

The following activity diagram visualizes this idea:

 

It is important that the progress change callback is invoked on the calling thread, as this is the behavior callers expect from this method. Although the two threads are synchronized (in fact, they share only a single resource), some libraries, such as WPF, must not be accessed from different threads.

The only shared resource between the two threads is the number of bytes copied so far, which is needed for the progress calculations. This variable is 64 bits wide (as there are files larger than 2^32 bytes = 4 GiB), but C# only guarantees atomic reads and writes for variables of up to 32 bits (and for references), so on a 32-bit platform a long could be read while it is only half updated. Therefore the static Interlocked class is used to synchronize this resource between the two threads.
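The pattern of one writer publishing a 64-bit counter with Interlocked.Exchange and one reader using Interlocked.Read can be reproduced in isolation. In this sketch (SharedCounterDemo is a hypothetical name), a Task stands in for the copying thread and the byte counts are simulated rather than copied:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SharedCounterDemo
{
    // A worker task plays the copying thread and publishes a 64-bit byte count;
    // the calling thread plays the progress reporter and reads it with Interlocked.Read,
    // because a plain read of a long may be torn on a 32-bit platform.
    public static long Run(int chunks, int chunkSize)
    {
        long length = 0;

        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < chunks; i++)
            {
                // Only this task writes "length", so reading it here without Interlocked is safe.
                Interlocked.Exchange(ref length, length + chunkSize);
            }
        });

        long lastSeen = 0;
        while (!writer.IsCompleted)
        {
            long current = Interlocked.Read(ref length);
            // With a single writer that only adds, the observed value can never decrease.
            if (current < lastSeen)
                throw new InvalidOperationException("Counter appeared to decrease.");
            lastSeen = current;
        }

        writer.Wait(); // rethrows if the writer failed
        return Interlocked.Read(ref length);
    }
}
```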

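To abort the copy operation, the progress reporting thread checks a passed WaitHandle and, if it is set, notifies the copying thread by setting a flag. Since reads and writes of a bool are atomic in C#, the flag needs no lock. Strictly speaking, atomicity alone does not guarantee that the copying thread sees the change promptly; Volatile.Read and Volatile.Write (.NET 4.5 and later) would make this visibility explicit.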
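A reduced sketch of this abort mechanism follows, assuming .NET 4.5 or later for Task.Run and Volatile. StoppableCopy is a hypothetical name, and unlike the article's method it reports no progress. The flag is only checked between chunks, so a read that has already started is always completed.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StoppableCopy
{
    // Hypothetical sketch: the copy loop runs on a worker task and checks a flag before
    // every read; the calling thread polls the WaitHandle and clears the flag once it is set.
    public static long Copy(Stream target, Stream source, WaitHandle stopEvent, int bufferSize = 4096)
    {
        bool runningFlag = true;
        long length = 0;

        Task copy = Task.Run(() =>
        {
            byte[] buffer = new byte[bufferSize];
            int count;
            // Volatile.Read makes sure the worker observes the flag written by the other thread.
            while (Volatile.Read(ref runningFlag) && (count = source.Read(buffer, 0, bufferSize)) != 0)
            {
                target.Write(buffer, 0, count);
                Interlocked.Add(ref length, count);
            }
        });

        // Poll every 10 ms until the worker has finished, normally or after being stopped.
        while (!copy.Wait(10))
        {
            if (stopEvent.WaitOne(0))
                Volatile.Write(ref runningFlag, false);
        }

        return Interlocked.Read(ref length);
    }
}
```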

If the length of the stream is known from additional metadata but the stream itself is not seekable, the total length can be passed to the CopyFrom method to allow proper progress calculations.

Because the CopyFrom method with progress reporting accepts many optional arguments (total length, buffer size, progress change callback, stop event (a WaitHandle), and progress change callback interval), they have been moved into a separate CopyFromArguments class.
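The article does not list CopyFromArguments itself. A plausible shape, reconstructed from the properties the CopyFrom method accesses, could look like this; the default values (in particular the 0.2 s callback interval) are assumptions:

```csharp
using System;
using System.Threading;

// The delegate from the article.
public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

// Hypothetical reconstruction: property names follow the CopyFrom listing,
// the default values are assumptions.
public class CopyFromArguments
{
    public CopyFromArguments()
    {
        TotalLength = -1;  // unknown: CopyFrom falls back to source.Length if seekable
        BufferSize = 4096; // CopyFrom rejects values below 128
        ProgressChangeCallbackInterval = TimeSpan.FromSeconds(0.2);
    }

    // Total number of bytes to copy, or -1 if unknown.
    public long TotalLength { get; set; }

    // Size of the copy buffer in bytes.
    public int BufferSize { get; set; }

    // Optional; called on the calling thread with (bytesRead, totalBytesToRead).
    public ProgressChange ProgressChangeCallback { get; set; }

    // Optional; once signaled, the copy operation is aborted.
    public WaitHandle StopEvent { get; set; }

    // Minimum time between two progress callbacks.
    public TimeSpan ProgressChangeCallbackInterval { get; set; }
}
```

With an object initializer, a caller sets only what it needs, e.g. new CopyFromArguments { ProgressChangeCallback = OnProgress, StopEvent = cancelEvent }, where OnProgress and cancelEvent are placeholders.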

The method itself looks as follows:

public static long CopyFrom(this Stream target, Stream source, CopyFromArguments arguments)
{
    if (target == null)
        throw new ArgumentNullException("target");
    if (source == null)
        throw new ArgumentNullException("source");
    if (arguments == null)
        throw new ArgumentNullException("arguments");
    if (arguments.BufferSize < 128)
        throw new ArgumentOutOfRangeException("arguments.BufferSize",
            arguments.BufferSize, "BufferSize has to be greater or equal than 128.");
    if (arguments.ProgressChangeCallbackInterval.TotalSeconds < 0)
        throw new ArgumentOutOfRangeException("arguments.ProgressChangeCallbackInterval",
            arguments.ProgressChangeCallbackInterval,
            "ProgressChangeCallbackInterval has to be greater or equal than 0.");

    long length = 0;

    bool runningFlag = true;

    Action<Stream, Stream, int> copyMemory = (Stream _target, Stream _source, int bufferSize) =>
        //Raw copy-operation, "length" and "runningFlag" are enclosed as closure
        {
            int count;
            byte[] buffer = new byte[bufferSize];

            //check the abort flag before reading, so that no chunk is read and then discarded
            while (runningFlag && (count = _source.Read(buffer, 0, bufferSize)) != 0)
            {
                _target.Write(buffer, 0, count);
                long newLength = length + count;
                //"length" can be read as this is the only thread which writes to "length"
                Interlocked.Exchange(ref length, newLength);
            }
        };

    IAsyncResult asyncResult = copyMemory.BeginInvoke(target, source, arguments.BufferSize, null, null);

    long totalLength = arguments.TotalLength;
    if (totalLength == -1 && source.CanSeek)
        totalLength = source.Length;

    DateTime lastCallback = DateTime.Now;
    long lastLength = 0;

    while (!asyncResult.IsCompleted)
    {
        if (arguments.StopEvent != null && arguments.StopEvent.WaitOne(0))
            runningFlag = false; //to indicate that the copy-operation has to abort

        Thread.Sleep((int)(arguments.ProgressChangeCallbackInterval.TotalMilliseconds / 10));

        if (arguments.ProgressChangeCallback != null
            && DateTime.Now - lastCallback > arguments.ProgressChangeCallbackInterval)
        {
            long currentLength = Interlocked.Read(ref length); //Since length is 64 bit, reading is not an atomic operation.

            if (currentLength != lastLength)
            {
                lastLength = currentLength;
                lastCallback = DateTime.Now;
                arguments.ProgressChangeCallback(currentLength, totalLength);
            }
        }
    }

    //EndInvoke rethrows any exception of the copy operation, so call it before the final report
    copyMemory.EndInvoke(asyncResult);

    if (arguments.ProgressChangeCallback != null && lastLength != length)
        //to ensure that the callback is called once with maximum progress
        arguments.ProgressChangeCallback(length, totalLength);

    return length;
}

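Note that this implementation uses the .NET thread pool for threading (via copyMemory.BeginInvoke). Asynchronous delegate invocation with BeginInvoke/EndInvoke is only supported on the .NET Framework; on .NET Core and .NET 5 or later it throws a PlatformNotSupportedException, and Task.Run is the usual replacement there.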
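For runtimes without asynchronous delegate invocation, the same structure can be sketched with Task.Run. This is a hypothetical variant, not the article's code: it takes the callback and interval directly instead of a CopyFromArguments instance, omits the stop event, and uses Stopwatch instead of DateTime.Now. The calling thread still waits in small steps and invokes the callback itself, so the callback keeps running on the caller's thread.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

public static class TaskBasedCopy
{
    // Hypothetical Task-based variant of CopyFrom with progress reporting.
    public static long CopyFrom(this Stream target, Stream source, ProgressChange callback,
        TimeSpan interval, int bufferSize = 4096)
    {
        long length = 0;
        long totalLength = source.CanSeek ? source.Length : -1;

        // Copy loop on a thread-pool thread; only this task writes "length".
        Task copy = Task.Run(() =>
        {
            byte[] buffer = new byte[bufferSize];
            int count;
            while ((count = source.Read(buffer, 0, bufferSize)) != 0)
            {
                target.Write(buffer, 0, count);
                Interlocked.Add(ref length, count);
            }
        });

        Stopwatch sinceLastCallback = Stopwatch.StartNew();
        long lastLength = 0;
        int pollMilliseconds = Math.Max(1, (int)(interval.TotalMilliseconds / 10));

        // Wait(int) returns true once the task has finished and throws if it failed.
        while (!copy.Wait(pollMilliseconds))
        {
            if (callback != null && sinceLastCallback.Elapsed >= interval)
            {
                long current = Interlocked.Read(ref length);
                if (current != lastLength)
                {
                    lastLength = current;
                    sinceLastCallback.Restart();
                    callback(current, totalLength);
                }
            }
        }

        long finalLength = Interlocked.Read(ref length);
        if (callback != null && finalLength != lastLength)
            callback(finalLength, totalLength); // final report with maximum progress
        return finalLength;
    }
}
```

Waiting with Task.Wait(int) replaces both the IsCompleted polling and the Thread.Sleep call, and it rethrows an exception from the copy loop (wrapped in an AggregateException) before the final progress report is made.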

Why Not Use a Class?

While writing this article, I wondered why I had not followed an object-oriented design and had used a functional approach for the CopyFrom method instead. In my opinion, an object-oriented solution would only be more complex, both to write and to use. As copying streams is more an action than an object, and this action does not need to be extended through inheritance, I think the functional approach is more natural.

Class Diagram

Progress Statistics

Progress can now be tracked, but displaying only the number of bytes copied versus the total number of bytes to copy is not enough for most users. Interesting details such as the current bytes per second or the estimated duration require further calculations. Fortunately, these only need an analysis of the number of copied bytes relative to the total number of bytes over time. The CopyFrom method with progress reporting can therefore be reused, as its callback supplies all the information needed for these calculations.

The ProgressStatistic class

Because only the callback of the CopyFrom method has to be attached to the class that does the progress calculations, and that class does not even need to know the delegate type, the two classes (StreamHelper and ProgressStatistic) are independent of each other and highly reusable. The only interface the ProgressStatistic class has to provide is a method whose signature matches the progress change delegate.
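To illustrate this decoupling, here is a deliberately small stand-in for ProgressStatistic. The class name and the members shown are a hypothetical subset, not the article's actual implementation. Its ProgressChanged method can be handed to CopyFrom as a method group because its parameters match the ProgressChange delegate:

```csharp
using System;
using System.Diagnostics;

// Hypothetical, heavily reduced stand-in for the article's ProgressStatistic class.
public class SimpleProgressStatistic
{
    // Started on construction: create the instance right before the copy begins.
    private readonly Stopwatch watch = Stopwatch.StartNew();

    public SimpleProgressStatistic()
    {
        TotalBytesToRead = -1; // unknown until the first callback
    }

    public long BytesRead { get; private set; }
    public long TotalBytesToRead { get; private set; }
    public TimeSpan Elapsed { get; private set; }

    // Fraction between 0 and 1, or -1 if the total is unknown or zero.
    public double Progress
    {
        get { return TotalBytesToRead > 0 ? (double)BytesRead / TotalBytesToRead : -1; }
    }

    public double AverageBytesPerSecond
    {
        get { return Elapsed.TotalSeconds > 0 ? BytesRead / Elapsed.TotalSeconds : 0; }
    }

    // Estimated total duration (total bytes / average speed), or null if not computable.
    public TimeSpan? EstimatedDuration
    {
        get
        {
            double speed = AverageBytesPerSecond;
            if (TotalBytesToRead < 0 || speed <= 0)
                return null;
            return TimeSpan.FromSeconds(TotalBytesToRead / speed);
        }
    }

    // Signature matches ProgressChange(long bytesRead, long totalBytesToRead).
    public void ProgressChanged(long bytesRead, long totalBytesToRead)
    {
        Update(bytesRead, totalBytesToRead, watch.Elapsed);
    }

    // Split out so the arithmetic can be checked with a fixed elapsed time.
    public void Update(long bytesRead, long totalBytesToRead, TimeSpan elapsed)
    {
        BytesRead = bytesRead;
        TotalBytesToRead = totalBytesToRead;
        Elapsed = elapsed;
    }
}
```

The estimated duration here is simply the total size divided by the average speed, with the elapsed time taken from a Stopwatch started when the instance is created.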

A class diagram of the statistic class with all its properties is shown on the right. 

Since most of the statistic class is boilerplate (StartingTime, FinishingTime, Progress, AverageBytesPerSecond, etc.), I will only go into detail on the more difficult part: the calculation of the current bytes per second.

Current bytes per second

In contrast to the (global) average bytes per second, the current bytes per second approximates the local average, i.e., the number of bytes per second copied within a very short interval. In principle, it is the derivative of the function n(t), the number of bytes copied up to time t. The derivative can be expressed as:

$$n'(t) = \lim_{\Delta t \to 0} \frac{n(t) - n(t - \Delta t)}{\Delta t}$$

But since n(t) is a step function that is neither continuous nor differentiable, this derivative can only be approximated by a simple slope triangle, i.e., a difference quotient over a finite interval Δt:

$$n'(t) \approx \frac{n(t) - n(t - \Delta t)}{\Delta t}$$
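A worked example with made-up numbers: if n(t - Δt) = 1,000,000 bytes, n(t) = 3,000,000 bytes and Δt = 2 s, the difference quotient yields

```latex
n'(t) \approx \frac{n(t) - n(t - \Delta t)}{\Delta t}
      = \frac{3\,000\,000\ \mathrm{B} - 1\,000\,000\ \mathrm{B}}{2\ \mathrm{s}}
      = 1\,000\,000\ \mathrm{B/s}
```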

As n'(t) changes very quickly, especially when downloading data from the Internet, the interval length Δt should be neither too short (to reduce random fluctuations) nor too long (to stay as up to date as possible). And because only n(t) is given, n(t - Δt) has to be stored beforehand: when the callback is called, the current bytes per second of the last interval can be calculated (if such a sample exists), and the current byte count is stored for the next interval. To avoid storing too many of these samples, which would slow down the copy operation considerably, not every n(t) is stored. For this reason, the property CurrentBytesSampleCount specifies how many samples are stored within one interval. The following illustration shows two samples per interval, taken Δt / 2 apart, i.e., CurrentBytesSampleCount = 2.


The first sample was taken at SS1, the second at SS2, the third would be taken at SE1, the fourth at SE2, and so on. Between SE1 and SE2, the current bytes per second are (SE1.Y - SS1.Y) / Δt. Before SE1, the current bytes per second cannot be determined; until then, the ProgressStatistic class uses the average bytes per second to provide reasonable values.
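The sampling scheme can be sketched as follows. This is a hypothetical illustration, not the ProgressStatistic source: CurrentSpeedEstimator and its members are assumed names, samples are kept in a queue, and the caller passes the average bytes per second as the fallback value for the time before SE1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps (time, n(t)) samples taken interval / sampleCount apart and
// computes the slope against the most recent sample that is at least one interval old.
public sealed class CurrentSpeedEstimator
{
    private readonly Queue<KeyValuePair<TimeSpan, long>> samples = new Queue<KeyValuePair<TimeSpan, long>>();
    private readonly TimeSpan interval;
    private readonly TimeSpan sampleDistance;
    private TimeSpan lastSampleTime;
    private double currentBytesPerSecond = double.NaN; // NaN: no full interval yet

    public CurrentSpeedEstimator(TimeSpan interval, int sampleCount)
    {
        if (interval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("interval");
        if (sampleCount < 1)
            throw new ArgumentOutOfRangeException("sampleCount");
        this.interval = interval;
        sampleDistance = TimeSpan.FromTicks(interval.Ticks / sampleCount);
    }

    // Called from the progress callback with the elapsed time and n(t). Returns the current
    // bytes per second, or the fallback while no sample is at least one interval old.
    public double Update(TimeSpan time, long bytes, double fallback)
    {
        // Only store a new sample if the previous one is at least sampleDistance old.
        if (samples.Count == 0 || time - lastSampleTime >= sampleDistance)
        {
            // Dequeue all samples that are at least one interval old; the last one dequeued
            // is the closest to n(t - interval). Divide by the actual time between samples.
            while (samples.Count > 0 && time - samples.Peek().Key >= interval)
            {
                KeyValuePair<TimeSpan, long> start = samples.Dequeue();
                currentBytesPerSecond = (bytes - start.Value) / (time - start.Key).TotalSeconds;
            }
            samples.Enqueue(new KeyValuePair<TimeSpan, long>(time, bytes));
            lastSampleTime = time;
        }
        return double.IsNaN(currentBytesPerSecond) ? fallback : currentBytesPerSecond;
    }
}
```

With Δt = 2 s and CurrentBytesSampleCount = 2, samples are taken at 0 s (SS1), 1 s (SS2), 2 s (SE1) and 3 s (SE2). If 0, 100, 300 and 600 bytes have been copied at those times, Update returns the fallback until SE1, then (300 - 0) / 2 = 150 bytes/s at SE1 and (600 - 100) / 2 = 250 bytes/s at SE2.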

Progress Reporting within a GUI

Although the copying itself is already done in a separate thread, the CopyFrom method with progress reporting is nevertheless synchronous.

So, in a GUI application, this method should itself be called asynchronously (see the WPF demo); otherwise it blocks the UI thread until the copy has finished.

Because the ProgressStatistic class is not thread-safe, the progress change callback of the CopyFrom method has to be synchronized with the GUI thread. As this is not the topic of this article, I will not go into details. In short: in Windows Forms, Form.BeginInvoke can be used to synchronize the callback, and in WPF, Dispatcher.BeginInvoke.
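Both BeginInvoke variants post work to the UI thread. A framework-neutral alternative, sketched here as an assumption rather than the article's approach, is System.Threading.SynchronizationContext, which Windows Forms and WPF install on their UI threads. The helper name ProgressMarshaling.Wrap is hypothetical:

```csharp
using System;
using System.Threading;

// The delegate from the article.
public delegate void ProgressChange(long bytesRead, long totalBytesToRead);

public static class ProgressMarshaling
{
    // Hypothetical helper: wraps a callback so that it runs on the SynchronizationContext
    // that is current when Wrap is called (the UI thread in Windows Forms and WPF).
    // Post is asynchronous, like Control.BeginInvoke and Dispatcher.BeginInvoke.
    public static ProgressChange Wrap(ProgressChange callback)
    {
        if (callback == null)
            throw new ArgumentNullException("callback");

        SynchronizationContext context = SynchronizationContext.Current;
        if (context == null)
            return callback; // no UI context (e.g. a console application): call directly

        return (bytesRead, totalBytesToRead) =>
            context.Post(state => callback(bytesRead, totalBytesToRead), null);
    }
}
```

Wrap has to be called on the UI thread (for example in a button click handler) before CopyFrom is started on a background thread; each progress report is then posted back asynchronously, like BeginInvoke.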

History 

  • Version 1.5 - Threaded copying, improved examples, renamed "Momentary" to "Current"
  • Version 1.0 - Initial article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Henning Dieterichs
Student
Germany
Presently I am a student of computer science at the Karlsruhe Institute of Technology in Germany.


Last Updated 10 Apr 2012
Article Copyright 2012 by Henning Dieterichs
Everything else Copyright © CodeProject, 1999-2014