|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionNowadays, multi-core processing is a growing industry trend as single core processors rapidly reach the physical limits of possible complexity and speed. It's obvious that if such wonderful hardware exists, it should be supported in .NET, shouldn't it? To check this I looked for a well-known easy-parallelizable algorithm. The benchmark should have handled different .NET parallelizing techniques with memory and mathematical operations. In my opinion, the best algorithms that fit these requirements are QuickSort and Matrix multiplication. Please take into account, that this article doesn't describe any algorithms' optimization techniques. Instead it shows, how to parallelize your C# code and compares the performance of different approaches. The Idea
Quicksort can be easily parallelized due to its divide-and-conquer nature.
If we have p processors, we can divide a list of n elements into p sublists in Θ(n) average time,
then sort each of these in What about matrices? Well, it's quite the same — easy to paralellize (lots of ways to do it) and no synchronization required.
Also, it's much easier to divide a multiplication process into two equal sub-processes (what is not done in Quicksort).
In general, an algorithms complexity is BackgroundIf you are not familiar with Quicksort or Matrix multiplication, please check the reference section. You should also know a little about multithreading. Some of the tests use Parallel FX library. It is free to download and is included in source files. Inside the code::QuicksortThe private void QSort(int left, int right)
{
if (right > left)
{
int pivotIndex = GetPivotIndex(left, right); // median-of-three approach
int pivotNewIndex = Partition(left, right, pivotIndex);
QSort(left, pivotNewIndex - 1);
QSort(pivotNewIndex + 1, right);
}
}
The first test uses managed threads to parallelize quicksort. It's advantages are:
public void SortParallelThreads(T[] array)
{
this.array = array;
int pivotIndex = GetPivotIndexParallel(array); // get pivot index after partitioning
Thread sort1 = new Thread(new ThreadStart(delegate()
{
QSort(0, pivotIndex); // sort the first part of the array
((AutoResetEvent)(waitHandles[0])).Set(); // signal, that sorting is completed
}));
Thread sort2 = new Thread(new ThreadStart(delegate()
{
QSort(pivotIndex + 1, array.Length - 1); // sort the second part of the array
((AutoResetEvent)(waitHandles[1])).Set();
}));
sort1.Start(); // start threads one by one
sort2.Start();
WaitHandle.WaitAll(waitHandles); // wait until both sorting threads finish their work
}
This test uses Microsoft Parallel Extensions to .NET Framework 3.5. The advantages of using Parallel FX are:
public void SortParallelFX(T[] array)
{
this.array = array;
int pivotIndex = GetPivotIndexParallel(array);
Parallel.Do(delegate()
{
QSort(0, pivotIndex);
},
delegate()
{
QSort(pivotIndex + 1, array.Length - 1);
});
}
The last test uses the public void SortParallelThreadPool(T[] array)
{
this.array = array;
int pivotIndex = GetPivotIndexParallel(array);
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object x)
{
QSort(0, pivotIndex);
((AutoResetEvent)(waitHandles[0])).Set();
}));
ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object x)
{
QSort(pivotIndex + 1, array.Length - 1);
((AutoResetEvent)(waitHandles[1])).Set();
}));
WaitHandle.WaitAll(waitHandles);
}
Inside the code::Matrix multiplication
The
The
The
For example: linear sort calls private static void Multiply(Matrix m1, Matrix m2, Matrix resMatrix, int beg, int end)
{
for (int i = beg; i < end; i++)
for (int j = 0; j < m2.columns; j++)
{
double tValue = 0.0;
for (int k = 0; k <= m1.columns - 1; k++)
{
tValue += m1[i, k] * m2[k, j];
}
tMatrix[i, j] = tValue;
}
}
Parallel.For(0, m1.rows,delegate (int i) // Parallel.For is used
{
for (int j = 0; j < m2.columns; j++)
{
double tValue = 0.0;
for (int k = 0; k <= m1.columns - 1; k++)
{
tValue += m1[i, k] * m2[k, j];
}
tMatrix[i, j] = tValue;
}
});
The Results
Each quicksort method passed 10 tests with random integer arrays of 10^7 elements.
Quicksort terrifically depends on an array structure and on pivot elements. This explains results diversity and answers
the question "why parallel tests are not twice as fast as the linear one?"
As I expected, the "pure .NET" sort should beat all tests. Very nice results for
Each matrix multiplication method passed 5 tests with random 1000x1000 matrices. Because each parallel operation works with the same amount of data, the results do not diverse much, and the speedup is close to 50%.
ConclusionThe Task Parallel Library (TPL) is designed to make it much easier to write managed code that can automatically use multiple processors. Using the library, you can conveniently express potential parallelism in existing sequential code, where the exposed parallel tasks will be run concurrently on all available processors.
The same could be done with "traditional" approaches like Managed threads or ThreadPool. But ParallelFX library has one significant advantage:
it uses all available processor to parallelize the operation. Programmers should not worry about the end-users CPU type,
it could be single-core, multi-core, or event multiprocessor system. Although ReferenceWhat is multi-core processor?Quicksort - Wikipedia, the free encyclopedia Matrix multiplication - Wikipedia, the free encyclopedia Optimize Managed Code For Multi-Core Machines Download Microsoft Parallel Extensions to .NET Framework 3.5 History
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||