Click here to Skip to main content
15,892,199 members
Articles / Programming Languages / C#
Article

The ELMO Principle - Part 2 - Sorting Algorithms

Rate me:
Please Sign up or sign in to vote.
3.44/5 (3 votes)
20 Dec 2007CPOL3 min read 42.6K   12   20
How to use the best algorithm for the job.

Introduction

In this segment of the ELMO principle, I’m going to cover non-recursive sorting algorithms. Some people might say, “The .NET Framework already provides us with array sorting mechanisms and the SortedList generic! Why should I care about sorting algorithms?” Simply put, that is the scope of the ELMO (Extremely Low Memory Optimization) principle! If you can build your own sorting algorithms for your objects, you can take advantage of extremely fast sorting without any additional overhead, boxing or un-boxing. But, in order to do that, you have to know what the most efficient and least efficient methods are for sorting. With sorting algorithms, there is no “one-size-fits-all” approach. Sorting algorithms are more or less effective depending on “what” and “how many” they are sorting.

Background

In this article, I’m going to cover three non-recursive sorting algorithms. Simply put, these are single functions that sort without having to call other functions recursively. The sorting algorithms I’m going to address are Bubble, Shell, and Insertion. The example code I’m going to post illustrates benchmarks of each sorting algorithm, given an integer array of 100,000 pseudorandom integers between 0 and 9 to sort in ascending order. The console application is multi-threaded, so I’m going to post the Main() function first, then detail each static method called by the threads. This is a pretty solid CPU exercise for your PC, so if you build something like this to benchmark, be sure you have a robust PC, and that you have closed all other applications before running this.

Building a Benchmark

C#
public static Random rand = new Random();

public static int[] intArray;

[STAThread]
static void Main(string[] args)
{
  Console.WriteLine("Sorting Benchmark");
  Console.WriteLine("====================================\n\r");

  // Fill array with random ints between 0 and 9
  intArray = new int[100000];

  for (int x = 0; x < 100000; x++)
  {
    intArray[x] = rand.Next(0, 9);
  }

  Console.WriteLine("Preparing to sort {0} integers:\n\r",intArray.Length);

  // Create separate threads for each sort
  ThreadStart tsBubble = new ThreadStart(BubbleSort);
  Thread tBubble = new Thread(tsBubble);

  ThreadStart tsShell = new ThreadStart(ShellSort);
  Thread tShell = new Thread(tsShell);

  ThreadStart tsInsertion = new ThreadStart(InsertionSort);
  Thread tInsertion = new Thread(tsInsertion);

  // Start the race
  tBubble.Start();
  tShell.Start();
  tInsertion.Start();

  Console.ReadLine();

}

First Sort: The Bubble Sort

The Bubble sort is the least efficient of the three sorting algorithms listed here. The Bubble sort proceeds through the array from top to bottom, comparing an item to the one next to it. If it’s larger, it will swap it, and start over. In this way, smaller values “bubble” to the top of the array. A full pass of the array isn’t completed until the Bubble sort gets to the very end of the array without swapping a value. The code for the algorithm looks like the following:

C#
static void BubbleSort()
{
  Console.WriteLine("Bubble sort started…");
  // Stamp start time

  DateTime dtStart = DateTime.Now;

  int i, j, x, iTemp;

  x = intArray.Length;

  for (i = (x - 1); i >= 0; i–)
  {
    for (j = 1; j <= i; j++)
    {
      if (intArray[j - 1] > intArray[j])
      {
        iTemp = intArray[j - 1];

        intArray[j - 1] = intArray[j];

        intArray[j] = iTemp;
      }
    }
  }

  // Stamp end time
  DateTime dtEnd = DateTime.Now;

  TimeSpan ts = dtEnd.Subtract(dtStart);

  Console.WriteLine("Bubble sort completed in {0} sec {1} ms.",
                    ts.Seconds, ts.Milliseconds);
}

Second Sort: The Insertion Sort

The Insertion sort is generally at least twice as efficient as the Bubble sort, and usually a better sorting algorithm to use when sorting larger amount of data than is recommended by the Shell sort. This algorithm works by inserting each item into its final place in the array. The simplest implementation (as detailed below) requires two array structures; the source and the sorted array. For large amounts of data, the Insertion sort obliterated the Bubble sort in benchmark tests.

C#
static void InsertionSort()
{
  Console.WriteLine("Insertion sort started…");
  // Stamp start time
  DateTime dtStart = DateTime.Now;
  int i, j, x, index;
  x = intArray.Length;
  for (i = 1; i < x; i++)
  {
    index = intArray[i];
    j = i;

    while ((j > 0) && (intArray[j - 1] > index))
    {
      intArray[j] = intArray[j - 1];
      j = j - 1;
    }
    intArray[j] = index;
  }

  // Stamp end time
  DateTime dtEnd = DateTime.Now;
  TimeSpan ts = dtEnd.Subtract(dtStart);
  Console.WriteLine("Insertion sort completed in {0} sec {1} ms.",
                    ts.Seconds, ts.Milliseconds);
}

Third Sort: The Shell Sort

Also known as a Comb sort, the Shell sort uses diminishing increments. It performs multiple passes through the array, and each time sorts equally sized sets using the Insertion sort. As the size of the set increases, the number of sets to be sorted decreases. The sets in a Shell sort are not contiguous. For example, if there are three sets, then a set is composed of every third element in the initial array. The Shell sort is supposed to be at least five times faster than a Bubble sort. In benchmarks, it performed even faster.

C#
static void ShellSort()
{
  Console.WriteLine("Shell sort started…");

  // Stamp start time
  DateTime dtStart = DateTime.Now;
  int i, j, inc, iTemp, x;
  x = intArray.Length;
  inc = 3;
  while (inc > 0)
  {
    for (i = 0; i < x; i++)
    {
      j = i;
      iTemp = intArray[i];

      while ((j >= inc) && (intArray[j - inc] > iTemp))
      {
        intArray[j] = intArray[j - inc];
        j = j - inc;
      }
      intArray[j] = iTemp;
    }
    if (inc / 2 != 0)
    {
      inc = inc / 2;
    }
    else if (inc == 1)
    {
      inc = 0;
    }
    else
    {
      inc = 1;
    }
  }

  // Stamp end time
  DateTime dtEnd = DateTime.Now;
  TimeSpan ts = dtEnd.Subtract(dtStart);
  Console.WriteLine("Shell Sort completed in {0} sec {1} ms.",
                    ts.Seconds, ts.Milliseconds);
}

Conclusion

I’ve detailed three extremely useful sorting algorithms for integers. These can be applied rather easily to a string array as well, but I’ll leave you guys to that implementation. When we run the above code as a console application, we get the following benchmark data (for 100,000 records) output to the Console (note, benchmarks will vary based on the system – this particular system is a dual processor 3.0 GHz with 3.5 GB of RAM, running Windows Server 2003):

Sorting Benchmark
====================================

Preparing to sort 100000 integers:

Bubble sort started…
Shell sort started…
Insertion sort started…
Shell Sort completed in 2 sec 453 ms.
Insertion sort completed in 3 sec 234 ms.
Bubble sort completed in 41 sec 764 ms.

Thanks and Credit

I'd like to extend thanks to Public Joe's information base and examples regarding sorting algorithms.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) CentralReach
United States United States
I have worked professionally in IT since 2004, and as a software architect since 2008, specializing in user interface design and experience, something I am still extremely passionate about. In 2013 I moved into management, and since then I've held positions as Director of Product Development, Director of Engineering, and Practice Director.

Comments and Discussions

 
GeneralI like the introduction Pin
Fredrik K29-Jan-08 2:00
Fredrik K29-Jan-08 2:00 
GeneralRe: I like the introduction Pin
DreamInHex29-Jan-08 10:29
DreamInHex29-Jan-08 10:29 
GeneralRe: I like the introduction Pin
Fredrik K29-Jan-08 20:45
Fredrik K29-Jan-08 20:45 
GeneralHeapsort Pin
wtwhite2-Jan-08 17:12
wtwhite2-Jan-08 17:12 
GeneralRe: Heapsort Pin
DreamInHex29-Jan-08 10:30
DreamInHex29-Jan-08 10:30 
QuestionQuicksort Pin
GurliGebis20-Dec-07 23:29
GurliGebis20-Dec-07 23:29 
GeneralRe: Quicksort Pin
DreamInHex21-Dec-07 15:48
DreamInHex21-Dec-07 15:48 
GeneralRe: Quicksort Pin
Pete Goodsall27-Dec-07 1:10
Pete Goodsall27-Dec-07 1:10 
GeneralRe: Quicksort Pin
DreamInHex27-Dec-07 10:49
DreamInHex27-Dec-07 10:49 
GeneralRe: Quicksort Pin
DreamInHex7-Jan-08 16:42
DreamInHex7-Jan-08 16:42 
GeneralRe: Quicksort Pin
Pete Goodsall9-Jan-08 15:07
Pete Goodsall9-Jan-08 15:07 
GeneralRe: Quicksort Pin
DreamInHex28-Jan-08 7:50
DreamInHex28-Jan-08 7:50 
GeneralRe: Quicksort Pin
Warrick Procter28-Jan-08 8:53
Warrick Procter28-Jan-08 8:53 
GeneralRe: Quicksort Pin
Pete Goodsall28-Jan-08 11:43
Pete Goodsall28-Jan-08 11:43 
GeneralRe: Quicksort Pin
Pete Goodsall28-Jan-08 11:59
Pete Goodsall28-Jan-08 11:59 
GeneralRe: Quicksort Pin
DreamInHex28-Jan-08 14:21
DreamInHex28-Jan-08 14:21 
GeneralSeveral mistakes Pin
Robert Rohde20-Dec-07 20:07
Robert Rohde20-Dec-07 20:07 
GeneralRe: Several mistakes Pin
DreamInHex21-Dec-07 15:46
DreamInHex21-Dec-07 15:46 
GeneralRe: Several mistakes Pin
Sunnanbo26-Dec-07 10:38
Sunnanbo26-Dec-07 10:38 
I agree with Robert, the parallel running is deceiving. If you had a break condition in your bubble-sort that terminated the sort if the inner loop did not find any elements to swap, then the bubble-sort would terminate just after the fastest algorithm was finished, since the array got sorted behind it's back. There are other similar risks with algorithms interfering with the timing and the result of each other. There is not even a guarantee that the list is sorted after the algorithms complete since they may swap away elements that was temporarily put there by another algorithm (although I could not provoke such behavior).

I did some testing with Array.Sort, results below.

I also figured that with the very limited choice of items (0..8) the bucket sort must be mentioned, since it performs at O(n) for limited sets of sorting keys.

I did a naive implementation that slightly outperforms Array.Sort i speed, but doubles the memory consumption (could be reduced some by changing List<int> to int[]). Then I implemented an in place bucket sort that has (almost) no memory overhead. (Compared to an ordinary bucket sort, this version is not stable). With really large datasets this algorithm outperforms Array.Sort by a factor 10. But I must note that Array.Sort scales very well since it is so close to an algorithm specially crafted for the problem domain. It must be very special circumstances to raise the need to implement your own sorting, except for fun and/or education.

I ran some tests with some larger datasets:
I ran the tests several times, with results varying ~10%,
Sorting Benchmark
====================================


CPU speed 2 010 000 000 Hz


Preparing to sort 1 000 000 integers:


ArraySort sort started.
ArraySort sort completed in 	0 min 0 sec 125 ms.
Total cycles = 251 250 000, cycles/items = 251,25
Array_int_Sort sort started.
Array_int_Sort sort completed in 	0 min 0 sec 109 ms.
Total cycles = 219 843 750, cycles/items = 219,84375
BucketSort sort started.
BucketSort sort completed in 	0 min 0 sec 93 ms.
Total cycles = 188 437 500, cycles/items = 188,4375
BucketSortInplace sort started.
BucketSortInplace sort completed in 	0 min 0 sec 15 ms.
Total cycles = 31 406 250, cycles/items = 31,40625
ShellSort sort started.
ShellSort sort completed in 	10 min 19 sec 78 ms.
Total cycles = 1 244 347 031 250, cycles/items = 1244347,03125
InsertionSort sort started.
InsertionSort sort completed in 	16 min 41 sec 203 ms.
Total cycles = 2 012 418 281 250, cycles/items = 2012418,28125
BubbleSort sort started.
BubbleSort sort completed in 	59 min 48 sec 218 ms.
Total cycles = 7 212 319 687 500, cycles/items = 7212319,6875
-- All tests done --
Verify sort started.
Verify sort completed in 	0 min 0 sec 0 ms.
Total cycles = 0, cycles/items = 0
-- Everything done --


And even larger dataset (here I drop the original sorts due to infeasible execution times...)
Sorting Benchmark
====================================


CPU speed 2 010 000 000 Hz


Preparing to sort 50 000 000 integers:


ArraySort sort started.
ArraySort sort completed in 	0 min 7 sec 734 ms.
Total cycles = 15 546 093 750, cycles/items = 310,921875
Array_int_Sort sort started.
Array_int_Sort sort completed in 	0 min 7 sec 828 ms.
Total cycles = 15 734 531 250, cycles/items = 314,690625
BucketSort sort started.
BucketSort sort completed in 	0 min 6 sec 93 ms.
Total cycles = 12 248 437 500, cycles/items = 244,96875
BucketSortInplace sort started.
BucketSortInplace sort completed in 	0 min 0 sec 875 ms.
Total cycles = 1 758 750 000, cycles/items = 35,175
-- All tests done --
Verify sort started.
Verify sort completed in 	0 min 0 sec 250 ms.
Total cycles = 502 500 000, cycles/items = 10,05
-- Everything done --


I include my code below:
Any hints of further optimizing of BucketSortInplace will be greatly appreciated.
<code>
// Add System.Management.dll as reference
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace Sorting_ELMO
{
    class Program
    {
        public static int ArrLength = (int)1E6; // 50E6;
        public const int MaxSlowSort = (int)1E6;
        public const int MaxBigBucketSort = (int)60E6;
        public const int BiggestNum = 9;
        public static Random rand = new Random();
        public static double CPU_speed;

        public static int[] intArray;
        public static int[] refArray;

        public delegate void SortFunctionDelegate();

        [STAThread]
        static void Main(string[] args)
        {
            CPU_speed = GetCPU_speed();
            Console.WriteLine("Sorting Benchmark");
            Console.WriteLine("====================================\n\r");
            Console.WriteLine("CPU speed {0:n} Hz\n\r", CPU_speed);

            if (args.Length > 0)
            {
                try
                {
                    ArrLength = int.Parse(args[0]);
                }
                catch { }
            }
            // Fill array with random ints between 0 and 9
            intArray = new int[ArrLength];
            refArray = new int[ArrLength];

            for (int x = 0; x < ArrLength; x++)
            {
                refArray[x] = rand.Next(0, BiggestNum);
            }

            Console.WriteLine("Preparing to sort {0:n} integers:\n\r", intArray.Length);

            const int iterations = 1;
            for (int i = 0; i < iterations; i++)
            {
                SortRunner(ArraySort);
            }

            for (int i = 0; i < iterations; i++)
            {
                SortRunner(Array_int_Sort);
            }

            if (ArrLength <= MaxBigBucketSort)
            {
                for (int i = 0; i < iterations; i++)
                {
                    SortRunner(BucketSort);
                }
            }

            for (int i = 0; i < iterations; i++)
            {
                SortRunner(BucketSortInplace);
            }

            if (ArrLength <= MaxSlowSort)
            {
                for (int i = 0; i < iterations; i++)
                {
                    SortRunner(ShellSort);
                }

                for (int i = 0; i < iterations; i++)
                {
                    SortRunner(InsertionSort);
                }

                for (int i = 0; i < iterations; i++)
                {
                    SortRunner(BubbleSort);
                }
            }

            Console.WriteLine("-- All tests done --");

            // Reference-measure the Verify-part
            for (int i = 0; i < iterations; i++)
            {
                intArray.CopyTo(refArray, 0);
                SortRunner(Verify);
            }
            Console.WriteLine("-- Everything done --");
            Console.ReadKey();
        }

        static void SortRunner(SortFunctionDelegate sortFunction)
        {
            refArray.CopyTo(intArray, 0);
            string sortName = sortFunction.Method.Name;

            Console.WriteLine("{0} sort started…", sortName);
            // Stamp start time

            TimeSpan dtStart = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;

            sortFunction();

            // Stamp end time
            TimeSpan dtEnd = System.Diagnostics.Process.GetCurrentProcess().TotalProcessorTime;

            Verify();

            TimeSpan ts = dtEnd.Subtract(dtStart);

            Console.WriteLine("{3} sort completed in \t{0} min {1} sec {2} ms.", (int)ts.TotalMinutes, ts.Seconds, ts.Milliseconds, sortName);
            Console.WriteLine("Total cycles = {0:n}, cycles/items = {1}", CPU_speed * ts.TotalSeconds, CPU_speed * ts.TotalSeconds / (double)ArrLength);
        }

        static void ArraySort()
        {
            Array.Sort(intArray);
        }

        static void Array_int_Sort()
        {
            Array.Sort<int>(intArray);
        }

        static void BucketSort()
        {
            Dictionary<int, list<int="">> buckets = new Dictionary<int, list<int="">>();
            for (int i = 0; i < BiggestNum; i++)
            {
                buckets[i] = new List<int>();
            }

            foreach (int item in intArray)
            {
                buckets[item].Add(item);
            }

            int currArrPos = 0;
            for (int i = 0; i < BiggestNum; i++)
            {
                buckets[i].CopyTo(intArray, currArrPos);
                currArrPos += buckets[i].Count;
            }
        }

        static void BucketSortInplace()
        {
            int[] bucketLength = new int[BiggestNum];
            int[] bucketStart = new int[BiggestNum];
            int[] bucketFillCount = new int[BiggestNum];

            // First pass:
            // Count each bucket-size
            foreach (int item in intArray)
            {
                bucketLength[item]++;
            }

            // Include calculate offset including prevous buckets
            for (int i = 1; i < BiggestNum; i++)
            {
                bucketStart[i] += bucketStart[i - 1] + bucketLength[i - 1];
            }

            const int invalidMarker = -1;
            int x = intArray.Count();
            int lastElement = intArray[0];
            intArray[0] = invalidMarker;
            // Move one item at a time to the next empty place in the correct bucket.
            // Pick up the item att that place to continue with
            for (int i = 0; i < x; i++)
            {
                int currElement = lastElement;
                int currPos = bucketStart[currElement] + bucketFillCount[currElement];
                lastElement = intArray[currPos];
                intArray[currPos] = currElement;
                bucketFillCount[currElement]++;
                System.Diagnostics.Debug.Assert(bucketFillCount[currElement] <= bucketLength[currElement]);
                if (lastElement == invalidMarker)
                {
                    // If the element is put back in its original place, then find a new unsorted element to continue with
                    for (int idx = 0; idx < BiggestNum; idx++)
                    {
                        for (int j = bucketFillCount[idx]; j < bucketLength[idx]; j++)
                        {
                            int tmpElement = intArray[bucketStart[idx] + j];
                            if (tmpElement != invalidMarker)
                            {
                                lastElement = tmpElement;
                                intArray[bucketStart[idx] + j] = invalidMarker;
                                goto BreakAllFor;
                            }
                        }
                    }
                BreakAllFor: ;// Do nothing more
                    // Just a security-check, this code can be removed
#if DEBUG
                    if (lastElement == invalidMarker)
                    {
                        for (int idx = 0; idx < BiggestNum; idx++)
                        {
                            System.Diagnostics.Debug.Assert(bucketFillCount[idx] == bucketLength[idx]);
                        }
                    }
#endif
                }
            }
        }

        static void BubbleSort()
        {
            int i, j, x, iTemp;

            x = intArray.Length;

            for (i = (x - 1); i >= 0; i--)
            {
                for (j = 1; j <= i; j++)
                {
                    if (intArray[j - 1] > intArray[j])
                    {
                        iTemp = intArray[j - 1];

                        intArray[j - 1] = intArray[j];

                        intArray[j] = iTemp;
                    }
                }
            }

        }

        static void InsertionSort()
        {
            int i, j, x, index;
            x = intArray.Length;
            for (i = 1; i < x; i++)
            {
                index = intArray[i];
                j = i;

                while ((j > 0) && (intArray[j - 1] > index))
                {
                    intArray[j] = intArray[j - 1];
                    j = j - 1;
                }
                intArray[j] = index;
            }

        }

        static void ShellSort()
        {
            int i, j, inc, iTemp, x;
            x = intArray.Length;
            inc = 3;
            while (inc > 0)
            {
                for (i = 0; i < x; i++)
                {
                    j = i;
                    iTemp = intArray[i];

                    while ((j >= inc) && (intArray[j - inc] > iTemp))
                    {
                        intArray[j] = intArray[j - inc];
                        j = j - inc;
                    }
                    intArray[j] = iTemp;
                }
                if (inc / 2 != 0)
                {
                    inc = inc / 2;
                }
                else if (inc == 1)
                {
                    inc = 0;
                }
                else
                {
                    inc = 1;
                }
            }
        }

        static void Verify()
        {
            int last = -1;
            int i, x;
            x = intArray.Length;
            for (i = 0; i < x; i++)
            {
                if (intArray[i] < last)
                    Console.WriteLine("Item at index {0} was {1}, last was {2}", i, intArray[i], last);
                last = intArray[i];
            }
        }

        static double GetCPU_speed()
        {
            double ghz = 0;
            System.Management.SelectQuery q = new System.Management.SelectQuery("Win32_Processor");
            System.Management.ManagementObjectSearcher searcher = new System.Management.ManagementObjectSearcher(q);
            foreach (System.Management.ManagementObject cpu in searcher.Get())
            {
                ghz += ((UInt32)(cpu["MaxClockSpeed"]) / 1000.0);
                return ghz * 1000000000; // return speed of first CPU, not the sum as original
            }
            return ghz * 1000000000;
        }
    }
}
</int></int,></int,></int></code>


// Albin
GeneralRe: Several mistakes Pin
DreamInHex27-Dec-07 10:53
DreamInHex27-Dec-07 10:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.