
Parallel Processing

By James Cann, 21 Sep 2009
 

Introduction

As you can probably tell from my previous articles, I have a thing for optimization that makes full use of hardware resources, primarily CPU cores. This article is no exception, and deals with the new Task Parallel Library (TPL). The example uses the CTP of the Parallel Extensions for .NET 3.5, and is geared towards an intermediate audience.

Background

The goal of this project was to use the TPL to speed up the calculations behind a simple report. The example groups records with a LINQ group by, then performs the calculations in parallel using TPL. The code first runs a fairly typical sequential calculation and reports the time taken; it then runs the same calculations in parallel and reports that time as well.

Since this project is geared towards intermediate developers, I will not go into detail on some aspects of this code. The reader should have a decent understanding of LINQ, predicates, and, of course, TPL.

Using the code

The project uses an abstract class. I didn't want to write the timer code twice, or perhaps thrice if I were adventurous enough to attempt a third approach.

The method that needs to be overridden is StartCalculations; the rest of the class handles the timing, so the various approaches can be compared if I choose to create more.

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ParallelClass
{
    // Base class for each calculation strategy. It times StartCalculations
    // so that the derived implementations can be compared.
    public abstract class ReportCalculations
    {
        private long _elapsedTime;

        // Wall-clock time of the last Begin call, in milliseconds.
        public long Elapsed_ms
        {
            get { return _elapsedTime; }
        }

        // Runs the derived class's calculation and records how long it took.
        public void Begin(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            var sw = Stopwatch.StartNew();

            StartCalculations(companyGroups);

            sw.Stop();
            _elapsedTime = sw.ElapsedMilliseconds;
        }

        public virtual string Name
        {
            get { return "Generic Report Class"; }
        }

        // Each derived class supplies its own calculation strategy here.
        public abstract void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups);
    }
}

The rest of the code is pretty straightforward. The Process class accepts an int that defines the number of transaction lines to generate for the database. The database in this example is just a collection of CompanyInfo objects.

In the abstract class, you'll notice that Begin is passed an interesting IEnumerable object: an IEnumerable<IGrouping<int, IGrouping<int, CompanyInfo>>>. This is the result of the nested LINQ query in the GetGrouping method in the code below. GetGrouping groups the objects by CompanyId, then by TransactionCode, so it's easier to hand each group of calculations to TPL.
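The same two-level grouping can also be written with method syntax. The following is only a sketch under stated assumptions, not the article's code: GroupingSketch and GroupByCompanyThenCode are hypothetical names, CompanyInfo is repeated so the block compiles on its own, and the final ToList() is an added choice so that the grouping runs once instead of being re-evaluated every time the deferred query is enumerated.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the article's CompanyInfo, repeated so the sketch stands alone.
public class CompanyInfo
{
    public int CompanyId { get; set; }
    public int TransactionCode { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical helper, for illustration only.
public static class GroupingSketch
{
    public static List<IGrouping<int, IGrouping<int, CompanyInfo>>> GroupByCompanyThenCode(
        IEnumerable<CompanyInfo> records)
    {
        return records
            // First level: one group per company.
            .GroupBy(r => r.CompanyId)
            // Second level: split each company group by transaction code,
            // remembering which company each inner group belongs to.
            .SelectMany(
                companyGroup => companyGroup.GroupBy(r => r.TransactionCode),
                (companyGroup, transactionGroup) => new { companyGroup.Key, transactionGroup })
            // Regroup the inner groups under their company key.
            .GroupBy(x => x.Key, x => x.transactionGroup)
            // Materialize once so later enumerations do not redo the grouping.
            .ToList();
    }
}
```

Materializing the result matters for a benchmark like this one: a deferred query handed to two timed methods would otherwise charge the cost of grouping to each of them.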

The PopulateCompanyTransactions method randomly generates all the transactions that will be grouped and totaled.

In this example, I have two classes that are derived from our abstract class ReportCalculations.

They are MySeq and MyTPL. MySeq runs a typical sequential loop that totals each transaction group on a single thread. The latter, MyTPL, calculates the sums using all available CPU cores when possible.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks; // needed for Parallel on .NET 4 and later

namespace ParallelClass
{
    public class CompanyInfo
    {
        public int CompanyId { get; set; }
        public int TransactionCode { get; set; }
        public decimal Amount { get; set; }
    }

    public class Process
    {
        
        public Process(int recordsToProcess)
        {
            var rec = PopulateCompanyTransactions(recordsToProcess);
            var grouping = GetGrouping(rec);

            var calcClasses = new List<ReportCalculations> { new MySeq(), new MyTPL() };
            
            foreach(var calc in calcClasses)
            {
                calc.Begin(grouping);
                Console.WriteLine("{0} : {1}", calc.Name, calc.Elapsed_ms);
            }
            
            Console.ReadLine();
        }
       
        //Group records by Company then by Transaction
        private static IEnumerable<IGrouping<int, IGrouping<int, 
                CompanyInfo>>> GetGrouping(IEnumerable<CompanyInfo> companyInfos)
        {
            var query = from company in companyInfos
                        group company by company.CompanyId
                        into companyGroup
                            from transactionGroup in
                            (
                                from company in companyGroup
                                group company by company.TransactionCode
                            )
                            group transactionGroup by companyGroup.Key;

            return query;
        }


        //Populate record values with random data
        private static List<CompanyInfo> PopulateCompanyTransactions(int totalRecords)
        {
            var rnd = new Random();
            var companyInfo = new List<CompanyInfo>();

            for (int count = 0; count < totalRecords; count++)
                companyInfo.Add(new CompanyInfo
                            {
                                Amount = (decimal) (rnd.Next(-50, 1000)*rnd.NextDouble()),
                                CompanyId = rnd.Next(0, 100),
                                TransactionCode = rnd.Next(100, 120)
                            });
            return companyInfo;
        }
    }

    public class MySeq : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();
        public override string Name { get { return "Sequential"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            foreach (var firstGroup in companyGroups)
            {
                foreach (var secondGroup in firstGroup)
                {
                    decimal total = 0;
                    foreach (var details in secondGroup)
                        total += details.Amount;

                    _totals.Add(new CompanyInfo { Amount = total, 
                       CompanyId = firstGroup.Key, TransactionCode = secondGroup.Key });
                }
            }

        }
    }

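    public class MyTPL : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();
        // Guards _totals, which is written to from several worker threads.
        private readonly object _totalsLock = new object();

        public override string Name { get { return "TPL"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            // The transaction groups of each company are processed in parallel.
            foreach (var firstGroup in companyGroups)
                Parallel.ForEach(firstGroup, group => Calculate(group, firstGroup.Key));
        }

        //TPL Parallel method: called on a worker thread, once per transaction group
        private void Calculate(IGrouping<int, CompanyInfo> grouping, int companyID)
        {
            // Sum on the current thread. A nested Parallel.ForEach adding to a
            // shared decimal would be a data race and could lose updates.
            decimal total = 0;
            foreach (var g in grouping)
                total += g.Amount;

            // List<T> is not thread-safe, so the Add must be serialized.
            lock (_totalsLock)
            {
                _totals.Add(new CompanyInfo { Amount = total, 
                   CompanyId = companyID, TransactionCode = grouping.Key });
            }
        }
    }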
}
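If the amounts within a single group are themselves summed in parallel, the running total must not be shared unprotected between threads. A minimal sketch of one option follows, assuming .NET 4 or later, where Parallel lives in System.Threading.Tasks and offers a ForEach overload with thread-local state; the CTP's overloads may differ. ParallelSumSketch and SumAmounts are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ParallelSumSketch
{
    public static decimal SumAmounts(IEnumerable<decimal> amounts)
    {
        decimal total = 0m;
        object gate = new object();

        Parallel.ForEach<decimal, decimal>(
            amounts,
            // localInit: each worker starts with its own subtotal of zero.
            () => 0m,
            // body: add to the worker's private subtotal, so no locking is needed here.
            (amount, loopState, subtotal) => subtotal + amount,
            // localFinally: merge each worker's subtotal into the shared total
            // under a lock, once per worker rather than once per item.
            subtotal => { lock (gate) { total += subtotal; } });

        return total;
    }
}
```

The lock is taken only when a worker finishes, so contention stays low even for large groups. Whether this beats a plain loop for a given group size is something to measure rather than assume.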

Conclusion

Initially, I didn't expect a big difference between the two approaches. My hypothesis was that by the time the underlying thread handler had analyzed the work, the last thread would already have completed, and this does seem to be the case with small numbers of transactions.

My experience shows a clear performance boost when transaction groups contain enough lines that the thread is still alive by the time the handler checks. In this example, the boost starts at about 1 million transactions. I was consistently getting a 23%-25% boost in performance with 10 million transactions, but a much slower result with anything below 1 million.
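One way to act on this observation is to fall back to sequential code for small inputs. The sketch below is illustrative only: ThresholdSketch, SumAmounts and the parallelThreshold parameter are hypothetical names, the use of PLINQ's AsParallel() assumes .NET 4 or later, and the right cutoff has to be measured on the target hardware rather than copied from the figures above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ThresholdSketch
{
    public static decimal SumAmounts(IList<decimal> amounts, int parallelThreshold)
    {
        // Below the cutoff, the overhead of partitioning and scheduling
        // is assumed to outweigh the gain, so stay on one thread.
        if (amounts.Count < parallelThreshold)
            return amounts.Sum();

        // At or above the cutoff, let PLINQ spread the sum across cores.
        return amounts.AsParallel().Sum();
    }
}
```

A caller might pass the cutoff from configuration, for example ThresholdSketch.SumAmounts(amounts, 1000000), and adjust it after benchmarking.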

The takeaway from this project should be clear: although parallel models are theoretically more efficient, there are cases that need more investigation and benchmarking prior to implementation.

This example should not be treated as a definitive benchmarking guide for your own work, since each case of parallel design is going to be unique.

For fun, try creating a new calculation class, and outperform both the MySeq and MyTPL classes at any number of transactions.
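As a possible starting point for that exercise, here is a sketch of a PLINQ-based calculation, assuming .NET 4 or later. It is not a tuned or benchmarked contender: PlinqTotalsSketch and Compute are hypothetical names, CompanyInfo is repeated so the block compiles on its own, and wrapping the method in a class derived from ReportCalculations is left to the reader.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the article's CompanyInfo, repeated so the sketch stands alone.
public class CompanyInfo
{
    public int CompanyId { get; set; }
    public int TransactionCode { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical helper, for illustration only.
public static class PlinqTotalsSketch
{
    public static List<CompanyInfo> Compute(
        IEnumerable<IGrouping<int, IGrouping<int, CompanyInfo>>> companyGroups)
    {
        return companyGroups
            // Flatten to one work item per (company, transaction code) group.
            .SelectMany(
                company => company,
                (company, txn) => new { CompanyId = company.Key, Group = txn })
            // Let PLINQ distribute the work items across cores.
            .AsParallel()
            // Each group is summed on a single thread, so nothing is shared.
            .Select(x => new CompanyInfo
            {
                CompanyId = x.CompanyId,
                TransactionCode = x.Group.Key,
                Amount = x.Group.Sum(d => d.Amount)
            })
            // PLINQ gathers the results; their order is not guaranteed.
            .ToList();
    }
}
```

Because every group is summed privately and PLINQ collects the results, no lock or shared list is needed.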

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

James Cann
Technical Lead Ernst & Young LLP (Canada)

Comments and Discussions

 
Like it. (Abhishek Sur, 21 Sep '09 - 21:55)
5 from me. Good point :thumbsup: ;)
 

Great point (maxxnostra, 21 Sep '09 - 8:36)
This article touches on an excellent topic that is rarely discussed amongst developers; however, it may have a strong impact on performance if it is not addressed at all.

I had most of my headaches configuring the degree of parallelism for SQL Server, which basically uses the same hypothesis as you; however, the results are more than inconsistent across different types of CPU. For example, a hyperthreaded P4 Xeon with 256 internal cache performed much worse with hyperthreading enabled (based on a number of test transactions: a multi-million record import and a few hundred updates at the same time, while a large multi-million record report was generating), meaning that the CPU was strong enough to do the job faster on a single thread than on more than one thread with the amount of data tested. However, an AMD Turion Quad CPU with a bigger 512 cache was slower when 1 CPU was utilized than when all 4 cores were engaged from the SQL Server side (this may have to do with CPU architecture too; I could not do low-level analysis to tell exactly why this was the case).

Not to complicate things any further: it really depends on the hyperthreading and multi-core design of your CPU(s). If everything were black and white (one core per CPU, multi-CPU design), it would be simple to test and know how to achieve better results. Developers like myself may run into code that performs poorly on a dual-core CPU but performs much better on a slower hyperthreaded CPU. It requires lots of trial and error with the particular data and CPU to fine-tune.
Re: Great point (James Cann, 21 Sep '09 - 9:25)
Hi Max, I appreciate your insight into this subject, which could almost be considered a paradigm shift in how we, as developers, need to assess code implementation. Your comment on the Xeon is a great example, and one a lot of us need to understand at both a hardware and a software level, simply meaning "What do we have?" and "What are we doing?".

This is something that will grow into a beast of its own. I do believe the folks at MS are keenly aware of the implications, which were probably the seed for TPL. My hope is that someday the runtimes will be smart enough to tackle this for the developer, but in the meantime, while all of us are "fine tuning", we need to share various ways of getting optimal performance out of our tools.

Thanks for sharing!
Re: Great point (maxxnostra, 24 Sep '09 - 9:29)
Microsoft has given up too many times on things they did not have control over, could not have control over, or that were too complex to achieve. The point stands: it is really very important to understand what we have from a hardware perspective, to understand the OS (it is also very important to know whether it is native 64-bit performance, 32-bit WoW, or simply 32-bit; add the flavor of CPU you have on top of this and imagine the different results), and then on top of this to understand how to optimise the code from a developer's perspective. I agree that MS should put more effort into this from the developer's perspective and ensure that we as developers have an option to do something about the optimisations if we need to.

Once again, great article and topic. Thanks for the time you invested in both writing the article and the reply.

Cheers.

Last Updated 21 Sep 2009
Article Copyright 2009 by James Cann