
Parallel fast compression unleashing the power of multi-core machines using the .NET TPL (Task Parallel Library)

8 Jan 2010, CPOL
Parallel fast compression uses the TPL to make full use of a multi-core system. Fast compression compresses files roughly n times faster, where n is the number of processors in the machine.

Introduction

Today, most applications, libraries, and servers are designed for single-core CPU machines. These software systems will not automatically gain performance by simply running on multi-core machines. Consider a web server running on a single-core machine and serving a thousand clients. This web server simply performs a compression operation on a given file and stores it. Moving this web server to a multi-core machine will not, by itself, increase its performance or throughput. The only way to achieve the improvement is to write software systems that target multiple cores. Unfortunately, it is not easy to write software systems that can take advantage of multi-core processors. The Microsoft .NET Framework Task Parallel Library (TPL) is designed to ease this job. This article is about a FastCompress library that uses the TPL to achieve impressive performance improvements on multi-core machines. How impressive? That depends on the number of processors in your system: with 4 processors you should see approximately a 4X (4 times) improvement, 8X with 8 processors, and so on.

For those who are new to TPL, here's a brief overview from MSDN (Reference 1):

"TPL is being created as a collaborative effort by Microsoft® Research, the Microsoft Common Language Runtime (CLR) team, and the Parallel Computing Platform team. TPL is a major component of the Parallel FX library, the next generation of concurrency support for the Microsoft .NET Framework. TPL does not require any language extensions and works with the .NET Framework 3.5 and higher.

The Task Parallel Library (TPL) is designed to make it much easier to write managed code that can automatically use multiple processors. Using the library, you can conveniently express potential parallelism in existing sequential code, where the exposed parallel tasks will be run concurrently on all available processors. Usually this results in significant speedups".
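To make the idea concrete before diving into FastCompress, here is a minimal, self-contained sketch (not part of the FastCompress library) that splits a sum-of-squares computation into independent tasks with Task.Factory.StartNew and joins them with Task.WaitAll:

```csharp
using System;
using System.Threading.Tasks;

class TplSketch
{
    // Sum of squares over [0, n), computed by independent tasks, one per chunk.
    public static long ParallelSumOfSquares(int n, int chunks)
    {
        var tasks = new Task<long>[chunks];
        int chunkSize = (n + chunks - 1) / chunks;
        for (int c = 0; c < chunks; c++)
        {
            int start = c * chunkSize;              // per-iteration locals: safe to capture
            int end = Math.Min(start + chunkSize, n);
            tasks[c] = Task.Factory.StartNew(() =>
            {
                long sum = 0;
                for (int i = start; i < end; i++) sum += (long)i * i;
                return sum;
            });
        }
        Task.WaitAll(tasks);                        // block until every task finishes
        long total = 0;
        foreach (var t in tasks) total += t.Result;
        return total;
    }

    static void Main()
    {
        // Same answer as the sequential loop, regardless of chunk count.
        Console.WriteLine(ParallelSumOfSquares(1000, 4)); // 332833500
    }
}
```

The TPL schedules these tasks across the available processors; the caller only expresses the potential parallelism.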

The Fast Compress application below was tested on a multi-core machine (4 processors). Both cases - using TPL and not using TPL - are captured below. You can observe that with the TPL, compression is nearly 4 times faster than without it.

Why multi-core apps, libraries, and servers are useful

An easy way to answer this question is to Google 'The Free Lunch Is Over'. The first result shows why a fundamental turn towards concurrency in software systems is required (Reference 2). The major processor manufacturers are unable to increase single-CPU performance (clock speed) much further, so hardware manufacturers are moving towards multi-core architectures. What does this mean? Suppose it is 1998 and I am selling App_Sweet, an application used by millions of users. Intel releases a new CPU that is twice as fast. All I have to do is run App_Sweet on a new computer (with the 2X-speed processor) and my application's performance increases - a free lunch. But now the situation has changed. If I run App_Sweet on a new multi-core computer, I may not observe much of a performance increase, because I need to design, architect, and code my application for a multi-core processor - the free lunch is over. Now, if I want to satisfy my users by increasing my app's performance, I need to change my application's design to unleash the power of the underlying multi-core architecture. Interestingly, the performance gain can be close to n times, where n is the number of processors. This additional computing power can translate into better performance, additional features, productivity improvements, and more efficient use of available resources.

In short, if you run a business and hire 10 employees, will you delegate the work to all 10 employees or to just one? If your answer is all 10, then you will like the TPL.

Using the code

Prerequisites

Step 1: Just to run: you will need to install .NET Framework 4 Beta 2 from here: http://www.microsoft.com/downloads/details.aspx?familyid=DED875C8-FE5E-4CC9-B973-2171B61FE982&displaylang=en.

Step 2: To compile and run: you will need to install Visual Studio 2010 Beta 2 from here: http://msdn.microsoft.com/en-us/vstudio/dd582936.aspx.

Step 3: To see the demonstration of performance difference, in addition to the above steps (steps 1 and 2), you will need a multi-core machine.

To see a quick demo

  1. Download the QuickDemo.zip file.
  2. Uncompress the files into a directory (folder).
  3. Run the executable TestFastCompress.exe.
  4. Select a file to compress (pick a larger file, more than a few MB in size) by clicking the Browse button.
  5. Select the 'Do not use TPL' option. Then, click the Compress button. Note down the performance (in milliseconds) at the Action Result section.
  6. Select the 'Use Multi Core for performance' option. Then, click the Compress button. Note down the performance (in milliseconds) at the Action Result section.

Based on the number of processors in your machine, you should see the speed increase. Typically, the speed increase is close to n times, where n is the number of processors in your machine.

Troubleshooting tip: In case the test application (quick demo) downloaded from this article does not run or throws an exception, you may have .NET Framework 4 Beta 1 installed. Please install .NET Framework 4 Beta 2 from this link: http://www.microsoft.com/downloads/details.aspx?familyid=DED875C8-FE5E-4CC9-B973-2171B61FE982&displaylang=en.

To use the FastCompress library in your application: this is easy. All you need to do is add the library (assembly) to your project as a reference, and then start using it.

How to do this:

  1. In Visual Studio 2010, select Solution Explorer.
  2. Right-click References and select the 'Add Reference...' menu option. Browse to choose FastCompress.dll, then click OK to add the reference to your project.
  3. Use the sample code below:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using BK.Util;

    namespace TestFastCompress
    {
        class Program
        {
            static void Main(string[] args)
            {
                FastCompress.doNotUseTPL = true;
                FastCompress.compressStrictSeqential = false;
                System.Console.WriteLine("Time taken for Seq compression = {0}",
                    FastCompress.CompressFast(@"Z:\File1.test.seq",
                    @"Z:\File1.test", true));
                System.Console.WriteLine("Time taken for Seq Un compression = {0}",
                    FastCompress.UncompressFast(@"Z:\File1.test.orgSeq",
                    @"Z:\File1.test.seq", true));

                FastCompress.doNotUseTPL = false;
                FastCompress.compressStrictSeqential = false;
                System.Console.WriteLine("Time taken for Parallel compression = {0}",
                    FastCompress.CompressFast(@"Z:\File1.test.pll",
                    @"Z:\File1.test", true));
                System.Console.WriteLine("Time taken for Parallel Un compression = {0}",
                    FastCompress.UncompressFast(@"Z:\File1.test.orgpll",
                    @"Z:\File1.test.pll", true));
            }
        }
    }

Performance results

The basic configuration of the machine used for this performance test is:

BasicSysCfg.jpg

Shown below are the performance results of the FastCompress library. The X axis is the size of the compressed files in megabytes (MB), and the Y axis is the time taken for compression in milliseconds (ms). You can observe that compression using the TPL runs close to 4 times faster. The test was performed on a 4-processor machine.

Perf.jpg

Explanation of the FastCompress library (uses TPL)

Multi-processor machines are becoming standard, as clock-speed increases for single processors have almost ended. To improve performance, software components need to be designed and architected to leverage the multi-core architecture. For example, consider a compression utility application. Running it on a 4-processor machine instead of a single-processor machine will not make much difference as-is. To increase the utility's performance on a 4-processor machine, it needs to be re-designed, and the re-design should split the work between multiple processors. This may seem an easy task, but efficient, dynamic work distribution to all the available processors is tricky. Parallelizing everything may, in some cases, even decrease performance. Also, if it is not done with future business dynamics in mind, extending the parallel design can create complicated problems and result in random crashes, which are very tough to debug, especially in a parallel programming environment.

The FastCompress library is based on a simple design to parallelize the compression. As shown in the figure below, FastCompress slices the file (to be compressed) into batches that the processors can operate on (compress) independently. These slices are tasked to each processor. This is achieved using the .NET Framework 4 Task Parallel Library (TPL). Note that by using Task.Factory.StartNew, we express to the TPL that we would like to run the tasks in parallel; the TPL then scales automatically to the number of processors in the host computer. If the host machine has 8 processors, the first 8 slices will most likely be taken for parallel execution on the 8 processors, improving performance by approximately 8 times.

LibDesign.jpg

The code below is used for performing parallel compression:

while(0 != (read = sourceStream.Read(bufferRead, 0, sliceBytes)))
{
   tasks[taskCounter] = Task.Factory.StartNew(() => 
     CompressStreamP(bufferRead, read, taskCounter, 
     ref listOfMemStream, eventSignal)); // Line 1
   eventSignal.WaitOne(-1);           // Line 2
   taskCounter++;                     // Line 3
   bufferRead = new byte[sliceBytes]; // Line 4
}

Task.WaitAll(tasks);                  // Line 6

In the above code, the while loop reads sliceBytes (1 MB) at a time. Line 1 calls the TPL to trigger a parallel activity, i.e., it calls the method CompressStreamP asynchronously. The code on the next line will execute before CompressStreamP completes; there is a little over-parallelism here that we need to moderate. For example, assume we did not have Line 2. In the first iteration, the taskCounter value is 0, but before CompressStreamP is actually invoked, taskCounter may get incremented to 1 (Line 3), so the first call to CompressStreamP may end up with taskCounter = 1. In detail: CompressStreamP's taskCounter parameter is passed by value, but before it gets copied to the stack (pushed onto the stack for the method call), its value may change due to the Line 3 increment. To avoid this, Line 2 waits on an event until the stack copy (the stack push, to be precise) has completed. Line 3 increments the counter to pass the next 1 MB slice of the file, and Line 4 creates a new buffer for that slice. After tasking all the slices to the available processors, Line 6 waits for all the (compression) tasks to complete.
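The same capture hazard can also be avoided without an event, by copying the shared counter into a per-iteration local before starting the task: the lambda then captures the local, whose value never changes after the copy. A minimal sketch of that alternative - the names here are illustrative, not part of FastCompress:

```csharp
using System;
using System.Threading.Tasks;

class CaptureSketch
{
    // Start one task per slot; each task must see its own index value.
    public static int[] RunTasks(int n)
    {
        var results = new int[n];
        var tasks = new Task[n];
        int counter = 0;
        while (counter < n)
        {
            int index = counter;   // per-iteration copy: safe to capture in the lambda
            tasks[counter] = Task.Factory.StartNew(() => results[index] = index * 10);
            counter++;
        }
        Task.WaitAll(tasks);       // join before reading the results
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", RunTasks(4))); // 0,10,20,30
    }
}
```

Without the `int index = counter;` copy, every lambda would share the single `counter` variable and could observe it after the increment - the exact race the AutoResetEvent in FastCompress guards against.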

The next step, after all the tasks complete, is to merge the compressed slices into a single compressed file. As many of you may assume, a simple concatenation will not work, because the compressed slices may be of different sizes - each 1 MB slice will not compress to the same 10 KB; the sizes vary. During decompression, we need to uncompress exactly the same slices that were compressed; merging those uncompressed slices then gives us the original file. As shown in the figure below, we merge the compressed slices as slice size, compressed stream, slice size, compressed stream, and so on. This also lets us uncompress the file later in a parallel fashion.

LibMergeDesign.jpg

The code that performs this task is as below:

for (taskCounter = 0; taskCounter < tasks.Length; taskCounter++)
{
   byte[] lengthToStore = 
     GetBytesToStore((int)listOfMemStream[taskCounter].Length);

   targetStream.Write(lengthToStore, 0, lengthToStore.Length);
   byte[] compressedBytes = listOfMemStream[taskCounter].ToArray();
   listOfMemStream[taskCounter].Close();
   listOfMemStream[taskCounter] = null;
   targetStream.Write(compressedBytes, 0, compressedBytes.Length);
}

After waiting for all the tasks to complete, the compressed streams are merged. In a for loop, each stream is converted into a byte array and written to the target file stream. The memory stream is then closed and disposed.
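The decompression side can then walk the merged file by reading each stored length followed by exactly that many compressed bytes. A minimal sketch of this [length][data] framing, assuming a 4-byte little-endian length prefix (FastCompress's actual GetBytesToStore encoding may differ):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FramingSketch
{
    // Write each slice as [4-byte length][payload], back to back.
    public static byte[] Merge(List<byte[]> slices)
    {
        using (var ms = new MemoryStream())
        {
            foreach (var slice in slices)
            {
                ms.Write(BitConverter.GetBytes(slice.Length), 0, 4);
                ms.Write(slice, 0, slice.Length);
            }
            return ms.ToArray();
        }
    }

    // Split the merged stream back into the original slices.
    public static List<byte[]> Split(byte[] merged)
    {
        var slices = new List<byte[]>();
        int pos = 0;
        while (pos < merged.Length)
        {
            int len = BitConverter.ToInt32(merged, pos); // read the stored length
            pos += 4;
            var slice = new byte[len];
            Array.Copy(merged, pos, slice, 0, len);      // read exactly len bytes
            pos += len;
            slices.Add(slice);
        }
        return slices;
    }

    static void Main()
    {
        var slices = new List<byte[]> { new byte[] { 1, 2, 3 }, new byte[] { 9 } };
        var roundTrip = Split(Merge(slices));
        Console.WriteLine(roundTrip.Count);     // 2
        Console.WriteLine(roundTrip[0].Length); // 3
    }
}
```

Because each slice's boundary is recoverable from its prefix, every slice can be handed to a separate task for parallel decompression.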

Points of interest

  1. Fast compression is achieved by dividing the work among all the available processors in a machine. This improves the speed of compression significantly and uses the machine's resources efficiently for the benefit of the users.
  2. The AutoResetEvent wait after starting a new task is necessary to ensure the proper copy of parameters onto the function's stack, so that the push and pop operations of the method call see the right values.
  3. The design is kept simple and avoids synchronization (except for the signal mentioned in point 2) to improve performance. An ArrayList is passed to the parallel compress method, resisting the temptation to use a parallel data structure.
  4. FastCompress is faster than GZipStream's (.NET Framework 4) compression and decompression on a multi-core machine.
  5. The performance of Fast Compress is directly proportional to the number of processors in the host machine.
  6. The performance metrics (described in the Performance results section above) show that the performance increase achieved using the TPL is significant.
  7. FastCompress's decompression is also approximately n times faster. This can be verified by running the FastCompression sample application provided.
  8. The FastCompress library is multi-thread safe.
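For reference, the per-slice operation itself can be the standard GZipStream over a MemoryStream (Reference 3). A minimal in-memory round-trip sketch, independent of FastCompress:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class GzipSliceSketch
{
    // Compress one slice entirely in memory.
    public static byte[] CompressSlice(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
                gz.Write(data, 0, data.Length);   // gz must be closed before reading ms
            return ms.ToArray();
        }
    }

    // Decompress one slice back to its original bytes.
    public static byte[] DecompressSlice(byte[] compressed)
    {
        using (var src = new MemoryStream(compressed))
        using (var gz = new GZipStream(src, CompressionMode.Decompress))
        using (var dst = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = gz.Read(buffer, 0, buffer.Length)) > 0)
                dst.Write(buffer, 0, read);
            return dst.ToArray();
        }
    }

    static void Main()
    {
        var original = Encoding.UTF8.GetBytes("hello hello hello hello");
        var roundTrip = DecompressSlice(CompressSlice(original));
        Console.WriteLine(Encoding.UTF8.GetString(roundTrip)); // hello hello hello hello
    }
}
```

Running one such call per slice on its own task is what yields the near-linear speedup described above.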

References:

  1. Optimize Managed Code For Multi-Core Machines - http://msdn.microsoft.com/en-us/magazine/cc163340.aspx
  2. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software - http://www.gotw.ca/publications/concurrency-ddj.htm
  3. GZipStream MSDN reference - http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx
  4. Parallel programming in the .NET Framework - http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx

History

  • Version 1 - Dec. 28th 2009.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Bharath K A
Software Developer (Senior)
United States
