Efficient Map Operations for Arrays in C#

3 Sep 2012, CPOL
An informal survey of implementation techniques of the map higher-order function for arrays in C#.

Introduction  

This article compares the performance of various techniques for implementing a map function for transforming arrays and introduces a novel high-performance implementation of map that uses expression tree rewriting.

A map operation constructs a sequence or list from another list, where each element in the new list is the result of applying a function to the corresponding element in the original list. The most well-known example of a map operation for C# programmers is the <a href="http://msdn.microsoft.com/en-us/library/bb548891.aspx">IEnumerable.Select()</a> function in the system library.
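For example, the following snippet shows the map semantics that every implementation in this article must reproduce, expressed with Select:

```csharp
using System;
using System.Linq;

class MapExample
{
    static void Main()
    {
        var xs = new[] { 1, 2, 3, 4 };
        // Select is the standard library's map: it applies the lambda to each
        // element, yielding a new sequence of the same length.
        int[] doubled = xs.Select(x => x * 2).ToArray();
        Console.WriteLine(string.Join(",", doubled)); // prints 2,4,6,8
    }
}
```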

The map operation is interesting to study for optimization because it is very common and also embarrassingly parallel. A high-performance map operation is the cornerstone of an efficient collection library.

Observations 

The primary observations of the experiment are:  

  • The simplest non-parallelized implementations far outperformed the parallelized implementations when arrays were small (< 10,000 elements).
  • Using Parallel.ForEach() with a range partitioner consistently outperforms AsParallel().AsOrdered().Select().ToArray().
  • The performance of Parallel.ForEach() with a range partitioner can be further improved by passing the lambda as an expression tree and rewriting it.

Methodology   

These tests were run on an Intel i7 quad-core machine running at 2.3 GHz.

In order to compare the different implementations of map, I created a test suite comparing six different implementations. I tried each implementation with nine different function arguments and input array lengths ranging from 100 to 10,000,000 integers.

Test functions are run on each array size repeatedly, alternating between the different implementation techniques, until the total time elapsed reaches one second.
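The timing harness itself is not included in the article; a minimal sketch of such an alternating loop (the class name, method name, and signature here are my own invention) might look like this:

```csharp
using System;
using System.Diagnostics;

static class MapBenchmark
{
    // Runs the candidate implementations in turn, round-robin, accumulating
    // per-implementation elapsed milliseconds until the overall time budget
    // (about one second in the article's tests) is spent.
    public static double[] TimeAlternating(Action[] implementations, double totalSeconds)
    {
        var totals = new double[implementations.Length];
        var overall = Stopwatch.StartNew();
        var sw = new Stopwatch();
        while (overall.Elapsed.TotalSeconds < totalSeconds)
        {
            for (int i = 0; i < implementations.Length; ++i)
            {
                sw.Restart();
                implementations[i]();
                sw.Stop();
                totals[i] += sw.Elapsed.TotalMilliseconds;
            }
        }
        return totals;
    }
}
```

Alternating between implementations inside a single run helps spread transient effects (JIT warm-up, GC pauses) evenly across all candidates.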

Techniques for Implementing a Map Operation on Arrays

In my experiments I studied the following implementations for a map operation on arrays.

Simple Sequential Map

Here is a simple implementation of map using a for loop.

U[] Map<T, U>(T[] xs, Func<T, U> f)  {   
  var r = new U[xs.Length];
  for (int i=0; i < xs.Length; ++i) r[i] = f(xs[i]);
  return r;
}

For small arrays (<= 1,000 elements) this was consistently the most efficient implementation.

Implementing Map using IEnumerable.Select   

The simplest implementation of map uses the <a href="http://msdn.microsoft.com/en-us/library/bb548891.aspx">IEnumerable.Select()</a> operation, followed by a call to ToArray():

U[] Map<T, U>(T[] xs, Func<T, U> f)  {   
  return xs.Select(f).ToArray();
}

In my measurements this was most often the slowest approach to implementing map, especially for arrays of 10,000 elements and larger. It makes sense that it is less efficient than the hand-written version: because it goes through an enumerator, the compiler cannot leverage knowledge about the underlying data types for optimization.

Parallel Map operation using Parallel.For  

Transforming the hand-written map operation into a parallel operation can be done easily using Parallel.For() as follows:   

static U[] SimpleParallelMap<T, U>(T[] xs, Func<T, U> f) {
  var r = new U[xs.Length];
  Parallel.For(0, xs.Length, i => r[i] = f(xs[i]));
  return r;
}

Somewhat surprisingly, this implementation never significantly outperformed the non-parallel version, even for large arrays, and in some cases it even underperformed it.

Parallel Map using IEnumerable.AsParallel()   

An alternative parallel implementation of map uses the <a href="http://msdn.microsoft.com/en-us/library/dd413602.aspx">IEnumerable.AsParallel()</a> extension method:

U[] Map<T, U>(T[] xs, Func<T, U> f)  {   
  return xs.AsParallel().AsOrdered().Select(f).ToArray();
}     

The performance of this technique is comparable to the Parallel.For() approach, sometimes better and sometimes worse. For some reason the performance was consistently and significantly worse than the Parallel.For() technique for arrays of precisely 1,000,000 items (sometimes taking twice as long or more).

Parallel Map using Range Partitioner 

Using <code>Parallel.ForEach()</code> with a partitioner can decrease the burden on the task scheduler by creating a smaller number of tasks, each of which operates on a sub-range of the array.

static U[] PartitionedParallelMap<T, U>(T[] xs, Func<T, U> f)
{
  var r = new U[xs.Length];
  Parallel.ForEach(Partitioner.Create(0, xs.Length), 
    range => { for (int i = range.Item1; i < range.Item2; ++i) 
      r[i] = f(xs[i]); });
  return r;
}

The performance of this approach is the best so far for large arrays, but like most parallel approaches it only starts to outperform the simple hand-written sequential map implementation once the array size reaches 10,000 elements.

Parallel Map using Range Partitioner and Expression Tree Rewriting

The performance of the PartitionedParallelMap() function can be improved if we observe that the function passed to ForEach() is dominated by the cost of invoking the lambda function argument. Generally speaking, invoking a lambda function has a significant performance cost.

If the lambda function is passed as an expression tree it can instead be inlined in a dynamically created function and passed to the Parallel.ForEach() function.    

Here is the general approach in pseudo-code:  

static U[] OptimizedPartitionedParallelMap<T, U>(T[] xs, Expression<Func<T, U>> fexpr)
{
  var r = new U[xs.Length];
  Expression<Action<T[], U[], int, int>> fexpr2 = RewriteExpressionTree(fexpr);
  Action<T[], U[], int, int> f = fexpr2.Compile();
  Parallel.ForEach(Partitioner.Create(0, xs.Length), range => 
   { f(xs, r, range.Item1, range.Item2); });
  return r;
}

The rewriting code creates a new expression tree and uses the ExpressionVisitor class to inline the body of the function argument (fexpr) within it thus eliminating the need for an expensive lambda invocation call within the for-loop.  

public class FunctionInliner<T, U> : ExpressionVisitor
{
  Expression argument;
  ParameterExpression parameter;

  public FunctionInliner(Expression<Func<T, U>> function, Expression argument) {
     parameter = function.Parameters[0];
     this.argument = argument;
  }

  public Expression Modify(Expression expression) {
    return Visit(expression);
  }

  protected override Expression VisitParameter(ParameterExpression node) {
    return node == parameter ? argument : node;
  }
}

public static class FunctionInliner
{
  public static Expression InlineFunctionCall<T, U>(Expression<Func<T, U>> function, Expression arg) {
     var fi = new FunctionInliner<T, U>(function, arg);
     return fi.Modify(function.Body);
  }
}

This map implementation only has reasonable performance if the result of Expression.Compile() is memoized (i.e., cached), which is done as follows:

private static Dictionary<Expression, object> memoizedFastMaps = new Dictionary<Expression, object>();

public static U[] FastMap<T, U>(this T[] self, Expression<Func<T, U>> fexpr) {
  Action<T[], U[], int, int> f;
  lock (memoizedFastMaps)
  {
    if (!memoizedFastMaps.ContainsKey(fexpr))
       memoizedFastMaps[fexpr] = ComputeFastMap(fexpr);
    f = (Action<T[], U[], int, int>)memoizedFastMaps[fexpr];
  }
  var r = new U[self.Length];
  Parallel.ForEach(Partitioner.Create(0, self.Length), 
  range => f(self, r, range.Item1, range.Item2));
  return r;
}  
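The ComputeFastMap() helper referenced above is not shown in the article. The following is my own reconstruction of what it could look like: it builds the ranged Action<T[], U[], int, int> as an expression tree, inlines the body of fexpr into the loop (here via a small self-contained parameter-replacement visitor, rather than the FunctionInliner class above, so the snippet compiles on its own), and compiles it once so the resulting delegate can be memoized:

```csharp
using System;
using System.Linq.Expressions;

static class FastMapCompiler
{
    // Substitutes 'replacement' for 'parameter' wherever it occurs in a tree.
    class ParameterReplacer : ExpressionVisitor
    {
        readonly ParameterExpression parameter;
        readonly Expression replacement;
        public ParameterReplacer(ParameterExpression p, Expression r) { parameter = p; replacement = r; }
        protected override Expression VisitParameter(ParameterExpression node)
            => node == parameter ? replacement : node;
    }

    // Hypothetical reconstruction of ComputeFastMap(): builds
    //   (xs, r, from, to) => { for (i = from; i < to; ++i) r[i] = <body of fexpr with xs[i]>; }
    public static Action<T[], U[], int, int> ComputeFastMap<T, U>(Expression<Func<T, U>> fexpr)
    {
        var xs = Expression.Parameter(typeof(T[]), "xs");
        var r = Expression.Parameter(typeof(U[]), "r");
        var from = Expression.Parameter(typeof(int), "from");
        var to = Expression.Parameter(typeof(int), "to");
        var i = Expression.Variable(typeof(int), "i");

        // Inline the lambda body with xs[i] substituted for its parameter,
        // eliminating the delegate invocation from the loop body.
        var body = new ParameterReplacer(fexpr.Parameters[0], Expression.ArrayIndex(xs, i))
            .Visit(fexpr.Body);

        var done = Expression.Label("done");
        var loop = Expression.Block(new[] { i },
            Expression.Assign(i, from),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.LessThan(i, to),
                    Expression.Block(
                        Expression.Assign(Expression.ArrayAccess(r, i), body),
                        Expression.PostIncrementAssign(i)),
                    Expression.Break(done)),
                done));

        return Expression.Lambda<Action<T[], U[], int, int>>(loop, xs, r, from, to).Compile();
    }
}
```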

The performance of this technique was generally a bit better than simply using the range partitioner. 

Results    

This table contains the results for the different map tests (also available as a color-coded Excel sheet in the zip package). Each value is the total number of milliseconds spent in each test; each column represents a different array size.

Array size:          100        1,000      10,000     100,000    1,000,000  10,000,000

x * 2
IEnumerable.Select   47.0013    177.8057   328.9458   360.1104   354.0814   645.1824
SimpleMap            7.5816     34.7406    69.7591    94.4403    92.0408    189.9622
Parallel.For         356.2646   250.5682   145.9923   234.1428   103.7589   214.9254
ParallelQuery        154.392    230.0595   288.8722   237.1865   385.0914   131.25
Partitioned ForEach  209.446    154.7732   107.0929   44.0619    25.6843    52.4506
Expression-tree map  202.83     138.4812   55.6059    29.6556    70.3177    47.7849

x % 3
IEnumerable.Select   50.9754    187.2522   326.9465   351.2355   404.2332   546.7806
SimpleMap            11.0956    53.0866    96.3511    120.8589   136.8649   205.0038
Parallel.For         335.2865   221.5865   138.5964   213.4733   121.2471   159.3329
ParallelQuery        158.7274   228.8488   277.0174   219.9876   260.9124   125.8866
Partitioned ForEach  217.3006   152.4565   89.5992    56.3834    41.3112    57.5527
Expression-tree map  203.2819   144.2859   68.2051    40.7922    97.9225    56.9034

x % 10 == 0
IEnumerable.Select   44.749     154.2371   269.2269   299.4427   375.8133   486.5356
SimpleMap            11.0698    54.2887    97.424     135.1354   181.7922   213.2659
Parallel.For         327.3782   233.9054   144.3949   134.1175   147.8055   192.6815
ParallelQuery        159.6129   233.3072   275.8584   331.0025   185.0597   143.3002
Partitioned ForEach  227.4341   157.5438   128.4379   55.6181    53.5953    59.0363
Expression-tree map  207.7087   153.7541   81.5905    47.0574    62.8087    61.7823

x * y
IEnumerable.Select   47.5342    176.2845   330.5266   417.78     389.5859   641.2155
SimpleMap            6.6616     29.933     60.012     82.1904    86.6731    164.2725
Parallel.For         335.883    239.2442   151.766    151.0844   109.0079   208.9425
ParallelQuery        156.0036   229.0513   307.9049   260.6253   319.6288   114.221
Partitioned ForEach  219.3628   156.26     81.5716    41.7017    34.5045    54.1712
Expression-tree map  211.8914   155.145    65.0756    45.8949    80.9438    57.4718

Math.Sqrt(x)
IEnumerable.Select   52.052     178.1152   325.6915   337.2131   314.3897   594.8534
SimpleMap            11.5972    60.8885    108.1417   150.2933   142.9398   226.6415
Parallel.For         332.5255   214.7269   155.6848   115.8106   107.1056   206.2448
ParallelQuery        151.1703   236.2765   262.9026   221.0244   333.1709   108.9785
Partitioned ForEach  218.4954   153.5197   70.7591    33.0735    53.0308    116.2612
Expression-tree map  212.4254   146.074    74.6249    46.6229    51.6872    84.8528

1.0 / x
IEnumerable.Select   57.7009    192.1225   351.9372   342.5118   342.8452   422.6756
SimpleMap            13.6616    61.0858    113.1279   215.0919   149.2644   159.6373
Parallel.For         327.4285   211.7903   168.9644   123.7927   123.0176   130.4033
ParallelQuery        156.755    233.4242   211.9213   156.8925   305.5861   379.4584
Partitioned ForEach  222.793    151.4817   73.8911    51.0748    56.9122    86.312
Expression-tree map  200.7036   139.8924   77.7088    115.2319   54.257     59.8788

F(x)
IEnumerable.Select   48.3564    172.7315   328.2192   339.3331   401.4378   685.65
SimpleMap            9.3622     53.5944    119.9011   134.9806   153.9984   271.0598
Parallel.For         317.6485   199.1581   138.0239   188.8737   127.2151   202.1605
ParallelQuery        148.9416   201.4768   241.8003   197.0903   154.3222   141.0791
Partitioned ForEach  215.59     172.11     64.8258    47.5668    45.3556    65.6707
Expression-tree map  239.0486   190.5578   103.7636   92.9645    160.6785   143.9428

xs[x % xs.Length]
IEnumerable.Select   53.4851    194.6166   326.2611   371.1026   431.2638   568.9713
SimpleMap            11.1547    52.3415    92.1421    123.3329   142.5564   203.5745
Parallel.For         321.7491   215.2261   135.3974   225.5697   129.4563   188.0756
ParallelQuery        161.9284   219.4217   290.0379   169.8483   139.5091   118.6672
Partitioned ForEach  219.876    157.8085   73.0295    58.3842    42.7669    59.3939
Expression-tree map  210.0777   148.9512   80.0346    53.4993    121.5434   70.5305

x.ToString()
IEnumerable.Select   174.3339   318.9216   322.7798   263.6981   263.9824   2986.9454
SimpleMap            123.489    255.0868   285.1731   247.5723   297.4845   2781.8611
Parallel.For         198.1301   109.1844   115.8489   147.2986   174.429    1952.4264
ParallelQuery        145.9428   112.4906   96.8155    125.436    183.3579   2024.5259
Partitioned ForEach  182.2398   96.6949    89.1915    129.1918   226.7504   1928.2476
Expression-tree map  165.3074   104.9725   89.9368    121.4181   178.6953   1835.5693

Final Words 

This survey was informal and not very rigorous. Ideally more tests would be added, and more input array types would be explored (not just arrays of ints). Finally, more CPU configurations need to be tested.

That said, the results did provide some interesting insights: the size of the array has a significant impact on which technique is most efficient, and range partitioning can be very effective in parallel map operations on arrays.

While the expression tree rewriting technique is interesting, the performance gains were less than I expected and inconsistent. For now I wouldn't recommend using it due to the complexity, but I hope others may find ways to improve its performance.  

Of course all comments are appreciated, but I'd especially like to hear about results on other machines and suggested alternative implementations.

History        

September 2, 2012 - First Version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Christopher Diggins is a computer science nerd and software developer from Canada who currently works at Autodesk as an SDK specialist.
