
Comments and Discussions



I like the introduction to this article.
IMO optimizations should be done late, if at all [following Michael Jacksons rules of program optimization, Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet].
But, in some cases performance is an issue, I am working on a piece of software that is very computing intense (it runs simulations), and from time to time I use Ants profiler to detect and fix bottlenecks. Often, very small changes can give a significant performance improvement. Each time I have done profiling I have found that a particular sorting in my application takes significant time, but I have (falsely) assumed that the Array.Sort method in .Net is good enough.
However, inspired by this article I compared Array.Sort with Iterative QuickSort (found here: CodeProject: Iterative QuickSort[^]) and Shell Sort, in my own code.
Performing a particular simulation, sorting is done 56.000 times. With Array.Sort, this takes 690 seconds (in the profiler), with Iterative QuickSort 130 seconds, and with Shell Sort it takes 21 seconds. In my case, I typically sort less than 100 elements that are almost sorted, so I guess that is why Shell Sort is the best sorting algorithm for me.
So although the article really does not answer the question "Why should I care about sorting algorithms", it still led me in the right direction.





Glad you liked the introduction, Fredrik, and I'm glad you found your way via the article to something useful to you. ANTS is a great tool, I've found. I've used it quite a bit in profiling web applications in the recent past.
Any sorting algorithm is only as good as the number of items being sorted. Definitely a case of using the "right tool for the job". I think someone mentioned the heap sort in one of the comments as well. You might want to give that a shot as well if the shell sort is working well for you. The heap sort can often times be much faster than a shell sort for small #'s of elements.
Good luck!
Quae narravi nullo modo negabo.





Maybe I will give it a shot. I managed to knock off a few seconds more from the Shell sort by using an optimal gap sequence, that gave a 20% performance improvement.





If you are interested in sorting algorithms with guaranteed low memory usage, I would suggest heapsort as a candidate. It sorts an array inplace in O(nlog(n)) worstcase time, which is the best achievable asymptotic time complexity for sorting arbitrary comparable objects (bucketsort is faster at O(n), but can only operate on objects that can be represented as arbitrarylength integers). By contrast, the three algorithms you mentioned each have O(n^2) worstcase behaviour. (Actually I'm not certain that that is true of Shell sort, which to the best of my knowledge has never had a tight worstcase time bound proved.) Heapsort is not recursive, and can be implemented using only enough memory to hold about three or four additional pointers (or array indices). I believe it is used in many embedded devices, where it is important to conserve memory and upper bounds on the execution time must be strictly obeyed.
WTJW





Yep, you're exactly right. I probably should have included the heap sort in my initial article, instead of the shell sort. I'll see about adding it in here shortly.
Quae narravi nullo modo negabo.





Also, is there any reason for not testing Quicksort?
And, as mentioned before, you should run the sorts sequential, not in parallel.





As stated in the article:
"In this segment of the ELMO principle I’m going to cover nonrecursive sorting algorithms".
The quicksort is recursive.
Quae narravi nullo modo negabo.





Not necessarily: there are nonrecursive implementations of Quicksort.





I'd be very interested in seeing an example. It was my understanding that the very idea of the quicksort was to split up the information to be sorted using recursion, and I've not found any article or example that states otherwise. So, if you have such an example, please be sure to post it.
Quae narravi nullo modo negabo.





Still waiting on that nonrecursive Quicksort algorithm...
Quae narravi nullo modo negabo.





Hi,
Give us a day or two  I've knocked together an example app to upload and thought I'd better post it properly: but this does mean that I have to actually write the article itself !
I'll reply here again when I've posted the article.
Pete





Hm, ok, unrolling a recursive function by using an explicit stack wasn't exactly what I thought you had in mind, but you are technically correct, if not practically. Technically you can do that with any recursive function to make it iterative. Practically, modern Quicksorts use recursion (every definition I could find of a Quicksort refers to it as "massively recursive", so it's not just me).
I have to commend the article, though. It's well written, I learned something, and I've never had anyone go to such a length to prove me wrong, so, kudos.
Quae narravi nullo modo negabo.





I found the articles and comments from both of you interesting. The debate on iteration vs. recursion has been ongoing for literally decades. In my view both sides of the issue have traditionally had sufficient rationale to justify their positions. I accept that both UsualDosage and Pete have produced their excellent pieces as works of ‘principle’; I shall make some comments on both aspects.
Comment 1: Quicksort by design is recursive. Fabricating the ‘hardware’ stack is often counterproductive as it does not use L1/L2 cache or Registers (somewhat faster than RAM). My own (casual) tests have shown that my huge effort in optimising code with a manufactured stack leads to little or negative speed benefit.
Comment 2: Manufactured stacks for obviously recursive algorithms are quite common. Examples of where I have used them are enabling reentry to an intermediate stage of a backtracking algorithm; and, enabling staged searching of a Trie.
Comment 3: Having had to write commercial software limited to 32 KiB of addressable memory I can appreciate the ELMO principle, which has enforced the use of both hardware and software stacking for different task.
Thank you both for your contributions!
Troft not lest ye be sponned on the nurg! (Milligan)





I don't want to be too confrontational about this but the whole point of my article is to illustrate that the iterative quick sort is absolutely not just an "unrolled" equivalent of the recursive solution.
There is potentially a fundamental difference between the paradigm used in the design of an algorithm and that used in its efficient implementation. UsualDosage is absolutely right to suggest that, as a process subject to the analysis of a human designer, Quicksort is most definitely inherently recursive.
However, the implementation given in my article is an evolution of that process that has no direct recursive equivalent: which is, to answer GurliGebis's query, why you should have included Quicksort in your "ELMO Principle Pt 2" article.
The net is, in fact, littered with such solutions.





Something I intended to tack onto the bottom of this:
All practical commercial Quicksort implementations are definitely not recursive: if you were to buy a library or to use a sort in the .Net library for example (assuming it is a quicksort, which is by no means a given), then it will not have been implemented using recursion. Many of the recursive implentations you see all over the net and in many text books are naive implementations of the algorithm. Any routine that is ever actually used "in anger" is heavily optimised, definitely iterative and uses, for example, "Median of Three" or "Random" partitioning to guard against worst case behaviour which seem to be rarely shown in examples.





Hm, well, perhaps "all" practical commercial implementations do use iterative sorts. After all, more lines of code usually means more billable hours.
If anyone would like to see recommended qsort implementations in everything from ALGOL 68 to Perl 6, they are freely available here...http://www.codecodex.com/wiki/index.php?title=Quicksort[^]
Anyway, this thread should probably end here to avoid a "massively recursive" discussion.
Quae narravi nullo modo negabo.





Hi,
first of all I want to say that I agree with you that every developer should have an eye on performance and resources. But I'm sorry to say that I don't think this article is a good example on how to do this.
1. Measuring performance of several algorithms should be done linear one after the other and not in paralell. Othwerise results may randomly change especially in multi core environments. It is also a good idea to run every test several times and to calculate the average of all times.
2. Your sorting algorithms are not only running paralell but even on the same instance of intArray. Every algorithm must get its own copy of it. Otherwise the test is pointless.
3. Always test against the performance of builtin features of .Net. Array.Sort(...) by far outperforms your algorithms. You should have picked an example where you could beat .Net to make your point clear.
Robert





Robert, thanks for the information and the observations. Perhaps I should clarify. The point wasn't to outperform .NET. The point was to show the speed difference between three different types of algorithms. You are correct in your observation that each use the same copy of the int array, however, the benchmarks clearly demonstrate the most efficient sorting mechanisms for the array size in the example.
I in no way intended this to be a scientific test. Rather, I was attempting to demonstrate to beginners and intermediates (hence the target of the article) the differences between sorting routines, and how they worked behind the scenes. If I had used Array.Sort, it would have been a rather short article, and that would have consequently defeated the purpose of demonstrating exactly how each algorithm worked "behind the scenes" sotospeak.
I do invite anyone who cares to read this article to show their results running serially.
Thanks,
UD
Quae narravi nullo modo negabo.





I agree with Robert, the parallel running is deceiving. If you had a break condition in your bubblesort that terminated the sort if the inner loop did not find any elements to swap, then the bubblesort would terminate just after the fastest algorithm was finished, since the array got sorted behind it's back. There are other similar risks with algorithms interfering with the timing and the result of each other. There is not even a guarantee that the list is sorted after the algorithms complete since they may swap away elements that was temporarily put there by another algorithm (although I could not provoke such behavior).
I did some testing with Array.Sort, results below.
I also figured that with the very limited choice of items (0..8) the bucket sort must be mentioned, since it performs at O(n) for limited sets of sorting keys.
I did a naive implementation that slightly outperforms Array.Sort i speed, but doubles the memory consumption (could be reduced some by changing List to int[]). Then I implemented an in place bucket sort that has (almost) no memory overhead. (Compared to an ordinary bucket sort, this version is not stable). With really large datasets this algorithm outperforms Array.Sort by a factor 10. But I must note that Array.Sort scales very well since it is so close to an algorithm specially crafted for the problem domain. It must be very special circumstances to raise the need to implement your own sorting, except for fun and/or education.
I ran some tests with some larger datasets:
I ran the tests several times, with results varying ~10%,
Sorting Benchmark
====================================
CPU speed 2 010 000 000 Hz
Preparing to sort 1 000 000 integers:
ArraySort sort started.
ArraySort sort completed in 0 min 0 sec 125 ms.
Total cycles = 251 250 000, cycles/items = 251,25
Array_int_Sort sort started.
Array_int_Sort sort completed in 0 min 0 sec 109 ms.
Total cycles = 219 843 750, cycles/items = 219,84375
BucketSort sort started.
BucketSort sort completed in 0 min 0 sec 93 ms.
Total cycles = 188 437 500, cycles/items = 188,4375
BucketSortInplace sort started.
BucketSortInplace sort completed in 0 min 0 sec 15 ms.
Total cycles = 31 406 250, cycles/items = 31,40625
ShellSort sort started.
ShellSort sort completed in 10 min 19 sec 78 ms.
Total cycles = 1 244 347 031 250, cycles/items = 1244347,03125
InsertionSort sort started.
InsertionSort sort completed in 16 min 41 sec 203 ms.
Total cycles = 2 012 418 281 250, cycles/items = 2012418,28125
BubbleSort sort started.
BubbleSort sort completed in 59 min 48 sec 218 ms.
Total cycles = 7 212 319 687 500, cycles/items = 7212319,6875
 All tests done 
Verify sort started.
Verify sort completed in 0 min 0 sec 0 ms.
Total cycles = 0, cycles/items = 0
 Everything done 
And even larger dataset (here I drop the original sorts due to infeasible execution times...)
Sorting Benchmark
====================================
CPU speed 2 010 000 000 Hz
Preparing to sort 50 000 000 integers:
ArraySort sort started.
ArraySort sort completed in 0 min 7 sec 734 ms.
Total cycles = 15 546 093 750, cycles/items = 310,921875
Array_int_Sort sort started.
Array_int_Sort sort completed in 0 min 7 sec 828 ms.
Total cycles = 15 734 531 250, cycles/items = 314,690625
BucketSort sort started.
BucketSort sort completed in 0 min 6 sec 93 ms.
Total cycles = 12 248 437 500, cycles/items = 244,96875
BucketSortInplace sort started.
BucketSortInplace sort completed in 0 min 0 sec 875 ms.
Total cycles = 1 758 750 000, cycles/items = 35,175
 All tests done 
Verify sort started.
Verify sort completed in 0 min 0 sec 250 ms.
Total cycles = 502 500 000, cycles/items = 10,05
 Everything done 
I include my code below:
Any hints of further optimizing of BucketSortInplace will be greatly appreciated.
<code>
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace Sorting_ELMO
{
class Program
{
public static int ArrLength = (int)1E6; public const int MaxSlowSort = (int)1E6;
public const int MaxBigBucketSort = (int)60E6;
public const int BiggestNum = 9;
public static Random rand = new Random();
public static double CPU_speed;
public static int[] intArray;
public static int[] refArray;
public delegate void SortFunctionDelegate();
[STAThread]
static void Main(string[] args)
{
CPU_speed = GetCPU_speed();
Console.WriteLine("Sorting Benchmark");
Console.WriteLine("====================================\n\r");
Console.WriteLine("CPU speed {0:n} Hz\n\r", CPU_speed);
if (args.Length > 0)
{
try
{
ArrLength = int.Parse(args[0]);
}
catch { }
}
intArray = new int[ArrLength];
refArray = new int[ArrLength];
for (int x = 0; x < ArrLength; x++)
{
refArray[x] = rand.Next(0, BiggestNum);
}
Console.WriteLine("Preparing to sort {0:n} integers:\n\r", intArray.Length);
const int iterations = 1;
for (int i = 0; i < iterations; i++)
{
SortRunner(ArraySort);
}
for (int i = 0; i < iterations; i++)
{
SortRunner(Array_int_Sort);
}
if (ArrLength <= MaxBigBucketSort)
{
for (int i = 0; i < iterations; i++)
{
SortRunner(BucketSort);
}
}
for (int i = 0; i < iterations; i++)
{
SortRunner(BucketSortInplace);
}
if (ArrLength <= MaxSlowSort)
{
for (int i = 0; i < iterations; i++)
{
SortRunner(ShellSort);
}
for (int i = 0; i < iterations; i++)
{
SortRunner(InsertionSort);
}
for (int i = 0; i < iterations; i++)
{
SortRunner(BubbleSort);
}
}
Console.WriteLine(" All tests done ");
for (int i = 0; i < iterations; i++)
{
intArray.CopyTo(refArray, 0);
SortRunner(Verify);
}
Console.WriteLine(" Everything done ");
Console.ReadKey();
}
static void SortRunner(SortFunctionDelegate sortFunction)
{
refArray.CopyTo(intArray, 0);
string sortName = sortFunction.Method.Name;
Console.WriteLine("{0} sort started…", sortName);
TimeSpan dtStart = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
sortFunction();
TimeSpan dtEnd = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;
Verify();
TimeSpan ts = dtEnd.Subtract(dtStart);
Console.WriteLine("{3} sort completed in \t{0} min {1} sec {2} ms.", (int)ts.TotalMinutes, ts.Seconds, ts.Milliseconds, sortName);
Console.WriteLine("Total cycles = {0:n}, cycles/items = {1}", CPU_speed * ts.TotalSeconds, CPU_speed * ts.TotalSeconds / (double)ArrLength);
}
static void ArraySort()
{
Array.Sort(intArray);
}
static void Array_int_Sort()
{
Array.Sort<int>(intArray);
}
static void BucketSort()
{
Dictionary<int, list<int="">> buckets = new Dictionary<int, list<int="">>();
for (int i = 0; i < BiggestNum; i++)
{
buckets[i] = new List<int>();
}
foreach (int item in intArray)
{
buckets[item].Add(item);
}
int currArrPos = 0;
for (int i = 0; i < BiggestNum; i++)
{
buckets[i].CopyTo(intArray, currArrPos);
currArrPos += buckets[i].Count;
}
}
static void BucketSortInplace()
{
int[] bucketLength = new int[BiggestNum];
int[] bucketStart = new int[BiggestNum];
int[] bucketFillCount = new int[BiggestNum];
foreach (int item in intArray)
{
bucketLength[item]++;
}
for (int i = 1; i < BiggestNum; i++)
{
bucketStart[i] += bucketStart[i  1] + bucketLength[i  1];
}
const int invalidMarker = 1;
int x = intArray.Count();
int lastElement = intArray[0];
intArray[0] = invalidMarker;
for (int i = 0; i < x; i++)
{
int currElement = lastElement;
int currPos = bucketStart[currElement] + bucketFillCount[currElement];
lastElement = intArray[currPos];
intArray[currPos] = currElement;
bucketFillCount[currElement]++;
System.Diagnostics.Debug.Assert(bucketFillCount[currElement] <= bucketLength[currElement]);
if (lastElement == invalidMarker)
{
for (int idx = 0; idx < BiggestNum; idx++)
{
for (int j = bucketFillCount[idx]; j < bucketLength[idx]; j++)
{
int tmpElement = intArray[bucketStart[idx] + j];
if (tmpElement != invalidMarker)
{
lastElement = tmpElement;
intArray[bucketStart[idx] + j] = invalidMarker;
goto BreakAllFor;
}
}
}
BreakAllFor: ; #if DEBUG
if (lastElement == invalidMarker)
{
for (int idx = 0; idx < BiggestNum; idx++)
{
System.Diagnostics.Debug.Assert(bucketFillCount[idx] == bucketLength[idx]);
}
}
#endif
}
}
}
static void BubbleSort()
{
int i, j, x, iTemp;
x = intArray.Length;
for (i = (x  1); i >= 0; i)
{
for (j = 1; j <= i; j++)
{
if (intArray[j  1] > intArray[j])
{
iTemp = intArray[j  1];
intArray[j  1] = intArray[j];
intArray[j] = iTemp;
}
}
}
}
static void InsertionSort()
{
int i, j, x, index;
x = intArray.Length;
for (i = 1; i < x; i++)
{
index = intArray[i];
j = i;
while ((j > 0) && (intArray[j  1] > index))
{
intArray[j] = intArray[j  1];
j = j  1;
}
intArray[j] = index;
}
}
static void ShellSort()
{
int i, j, inc, iTemp, x;
x = intArray.Length;
inc = 3;
while (inc > 0)
{
for (i = 0; i < x; i++)
{
j = i;
iTemp = intArray[i];
while ((j >= inc) && (intArray[j  inc] > iTemp))
{
intArray[j] = intArray[j  inc];
j = j  inc;
}
intArray[j] = iTemp;
}
if (inc / 2 != 0)
{
inc = inc / 2;
}
else if (inc == 1)
{
inc = 0;
}
else
{
inc = 1;
}
}
}
static void Verify()
{
int last = 1;
int i, x;
x = intArray.Length;
for (i = 0; i < x; i++)
{
if (intArray[i] < last)
Console.WriteLine("Item at index {0} was {1}, last was {2}", i, intArray[i], last);
last = intArray[i];
}
}
static double GetCPU_speed()
{
double ghz = 0;
System.Management.SelectQuery q = new System.Management.SelectQuery("Win32_Processor");
System.Management.ManagementObjectSearcher searcher = new System.Management.ManagementObjectSearcher(q);
foreach (System.Management.ManagementObject cpu in searcher.Get())
{
ghz += ((UInt32)(cpu["MaxClockSpeed"]) / 1000.0);
return ghz * 1000000000; }
return ghz * 1000000000;
}
}
}
</int>
// Albin





Thank you for posting your findings! I think there's enough data in your post to write a full article! I definitely would like to see more people take the idea of ELMO and expand on it. I by no means am the resident expert on lowmemory operations; rather, I'm learning as I go, so I'm keenly interested in finding what else people have to contribute. I'd also be grateful if you would post your follow up on my blog (http://www.usualdosage.com) if you have time.
Thanks again!
Quae narravi nullo modo negabo.







General News Suggestion Question Bug Answer Joke Rant Admin Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

