Click here to Skip to main content
15,112,198 members
Articles / General Programming / Sorting
Article
Posted 16 Dec 2013

Tagged as

Stats

24.4K views
234 downloads
20 bookmarked

QuickSortByRange, QuickSelectByRange, PartitionByRange

Rate me:
Please Sign up or sign in to vote.
4.82/5 (13 votes)
26 Dec 2013CPOL4 min read
A performance improvement for the classic sort / select algorithms.

Introduction

The classic QuickSort and QuickSelect algorithms have poor worst-case performance. If bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n2). Duplicate values in the array are problematic for QuickSort.

This article introduces the QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms which provide a remedy to the problem with duplicates by returning a pivot range from the PartitionByRange algorithm. This approach works well when the array contains several duplicate values.

For example, if every element of the array contains the same value, then QuickSort delivers O(n2) whereas QuickSortByRange delivers O(n). Considering that the average case performance of QuickSort is O(n log n), the all-duplicates scenario becomes a better than average case scenario for QuickSortByRange as compared to the worse than average case scenario we see with QuickSort.

Background

QuickSort and QuickSelect both rely on a partition algorithm. The partition algorithm looks at an initial (pivot) value within a range of the array and then reorganizes the values within that range so that all the values less than or equal to the pivot value are placed to the left of the pivot and the values greater than the pivot value are moved to the right of the pivot. The algorithm will move the pivot element, if necessary, to satisfy this quasi-sort and then it returns the final position of the pivot element.

For example, let's consider the following range within a larger array:

[ < , > , == , x , < , > , < , == ]

In this example, 'x' marks the pivot value, '<' indicates values less than the pivot value, '>' indicates values greater than the pivot value, and '==' indicates values equal to the pivot value.

After a single pass through the quickSort partition algorithm, these elements will have been reorganized to:

[ < , == , < , < , == , x , > , > ]

PartitionByRange takes the quasi-sort a step further. It further reorganizes the array so that elements equal to the pivot element are placed to the immediate left of the pivot element. Values less than the pivot value are placed to the left of this range of equal-value elements.

After a single pass through the PartitionByRange algorithm, these elements will have been reorganized to:

[ < , < , < , == , == , x , > , > ]

Instead of returning a single value indicating the final position of the pivot value, the PartitionByRange algorithm returns two values indicating the range of positions which contain the same value as the pivot value. This enables the QuickSortByRange and QuickSelectByRange algorithms to move the boundaries of the next pass through the PartitionByRange further to the right or left than QuickSort and QuickSelect, respectively.

Using the Code

The code is written in C# for arrays of integers. It's fairly straightforward to port the code to another language or value type.

The attached file is an HttpHandler which is designed to run on the localhost of a machine with IIS. It includes benchmarking code to replicate the results posted at the end of this article.

One important part of the PartitionByRange code is the Stack used to store the locations of values equal to the pivot value. When porting the code to another language, ensure that a LIFO structure is used to store these indexes. An array may be used in place of a stack as long as LIFO order is used.

PartitionByRange

C#
public Tuple<int, int> partitionByRange(int[] a,int left,int right,int pivot) {

  int pivotValue = a[pivot];
  Stack<int> equalValuePositions = null;
  swap(a,pivot,right);
  for(int i=left;i<right;i++)
    if(a[i] <= pivotValue) {
      swap(a,left++,i);
      if(a[i] == pivotValue) {
        if(equalValuePositions == null) equalValuePositions = new Stack<int>();
        equalValuePositions.Push(i);
      }
    }
  swap(a,left,right);
  var rightPivotIndex = left;
  if(equalValuePositions != null) foreach(int i in equalValuePositions) swap(a,--left,i);
  return new Tuple<int, int>(left,rightPivotIndex);

}

public void swap(int[] a,int i,int j){ if(i!=j){ int k=a[i]; a[i]=a[j]; a[j]=k; } }

QuickSortByRange

C#
public void quickSortByRange(int[] a) {

  qSortByRange(a,0,a.Length - 1,new Random());

}

public void qSortByRange(int[] a,int left,int right,Random r) {

  if(left < right) {
    int pivot = left + r.Next(right - left + 1);
    Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
    qSortByRange(a,left,pivotRange.Item1 - 1,r);
    qSortByRange(a,pivotRange.Item2 + 1,right,r);
  }

}

QuickSelectByRange

C#
public int quickSelectByRange(int[] a,int n) {

  return qSelectByRange(a,0,a.Length - 1,n,new Random());

}

public int qSelectByRange(int[] a,int left,int right,int n,Random r) {

  if(left == right) return a[left];

  int pivot = n;

  while(true) {

    Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
    if(n >= pivotRange.Item1 && n <= pivotRange.Item2) return a[n];
    if(n < pivotRange.Item1)
      right = pivotRange.Item1 - 1;
    else
      left = pivotRange.Item2 + 1;
    pivot = left + r.Next(right - left + 1);

  }

}

Overhead

The PartitionByRange algorithm requires a Stack (or similar LIFO structure), extra comparisons, and it returns a tuple of two integers instead of a single integer.

These overhead requirements result in a performance hit which is not overcome until the array contains at least several duplicates.

Performance

  • In arrays with no duplicate entries, the QuickSort algorithm is about 8% faster than the QuickSortByRange algorithm.
  • In an array with just a few duplicate entries, the QuickSort algorithm is about 9% faster than the QuickSortByRange algorithm.
  • In an array with several duplicates, the QuickSortByRange algorithm is about 14% faster than the QuickSort algorithm.
  • In an array with quite a few duplicates, the QuickSortByRange algorithm is about 76% faster than the QuickSort algorithm.
  • In an array with lots of duplicates, the QuickSortByRange algorithm is about 98% faster than the QuickSort algorithm.
  • In an array with all duplicates, the QuickSortByRange algorithm is about 99.9% faster than the QuickSort algorithm.

History

In the original version of this article, I incorrectly stated that QuickSortByRange delivers O(1) performance on an array where every element contains the same value.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Brien Givens
Engineer Comprehend Systems
United States United States
I've been fiddling around with computers since my parents bought me a Commodore VIC-20 for Christmas in 1981.

I received a B.Sc. in Mechanical Engineering from the University of Waterloo, but have spent my career in software development.

I focused on FoxPro during my early career before switching over to .NET and JavaScript.

I created Jooshe and wrote a Code Project article on it.

I wrote the front-end (Jooshe & JavaScript) and middleware (.NET) of dbiScript.


Comments and Discussions

 
GeneralMy vote of 2 Pin
Qwertie24-Dec-13 7:34
MemberQwertie24-Dec-13 7:34 
QuestionUm.... Pin
Qwertie23-Dec-13 14:15
MemberQwertie23-Dec-13 14:15 
AnswerMessage Closed Pin
23-Dec-13 15:19
MemberBrien Givens23-Dec-13 15:19 
GeneralRe: Um.... Pin
Qwertie24-Dec-13 6:04
MemberQwertie24-Dec-13 6:04 
GeneralMessage Closed Pin
24-Dec-13 6:35
MemberBrien Givens24-Dec-13 6:35 
GeneralRe: Um.... Pin
Qwertie24-Dec-13 7:32
MemberQwertie24-Dec-13 7:32 
GeneralRe: Um.... Pin
Brien Givens26-Dec-13 8:15
MemberBrien Givens26-Dec-13 8:15 
GeneralRe: Um.... Pin
KP Lee4-Jan-14 21:44
MemberKP Lee4-Jan-14 21:44 
Brien Givens wrote:
You're right - I've corrected the article.
I make mistakes all the time too. Sometimes I refuse to listen when I know I'm right and what I am being told is wrong. I'm really sheepish when I realize that they were right all along. Either I am really pig-headed, or thankfully, I really am usually right when I know I'm right. Lately I seem to be misunderstanding what was said, so I fairly quickly find out what I said was wrong.

One thing, even when I am blatently wrong, I leave my incorrect thoughts out there for others to read.

About 20 years ago I read about the quicksort algorithm. It was called a binary sort in the article I read. There were several things I didn't like about it. The main thing was that sorting an already sorted routine was really inefficient. (At the time I didn't know it, but it was an O(n^2) process.) Turns out my work-around is the currently best recommended method of defining the pivot point.

I wrote about this on another article written about bubble sorting. It's ancient, well documented, and an O(n^2) process. I was going to leave it when I read a comment about how efficient it was. That I couldn't let lie. My comments giving my binary sort routine can be found at a-simple-bubble-sort[^]

In the old article that I didn't like, one more thing that I wanted was to sort fixed length strings with multiple fields in the string, so I built the sorting routine to pick multiple fields to sort in ascending and descending order. In order to do that I built pointers at the start and the end of the same values and sub sorted that on the next field. I just naturally included that in my sort routine.

One thing I really DON'T like about your solution is putting duplicates in a separate memory location. On my laptop, I don't have the memory to store 200 million int values in an array. I can store 180 million items and sort it. My process would sort the same value for all fields in an O(n) manor as well, while NEVER using more memory to do it. (Other than the memory the recursive routine generates. And if all the values are the same, the routine would never make a recursive call. My tests of 180M would make about 27 concurrent recursive calls. I've blown up memory while trying to run over 65K concurrent calls.)

(I never thought to test the same value in all the rows, but I'm sure my code would easily handle it faster than your version and it would never make a recursive call in my version.)
GeneralRe: Um.... Pin
KP Lee4-Jan-14 21:50
MemberKP Lee4-Jan-14 21:50 
GeneralRe: Um.... Pin
KP Lee4-Jan-14 22:09
MemberKP Lee4-Jan-14 22:09 
GeneralRe: Um.... Pin
KP Lee5-Jan-14 1:21
MemberKP Lee5-Jan-14 1:21 
SuggestionSuggestion Pin
lafffer_11923-Dec-13 11:31
Memberlafffer_11923-Dec-13 11:31 
GeneralMy vote of 5 Pin
John Brett18-Dec-13 0:43
MemberJohn Brett18-Dec-13 0:43 
GeneralRe: My vote of 5 Pin
Brien Givens18-Dec-13 0:48
MemberBrien Givens18-Dec-13 0:48 
QuestionVery nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Pin
Volynsky Alex17-Dec-13 11:24
professionalVolynsky Alex17-Dec-13 11:24 
AnswerRe: Very nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Pin
Brien Givens17-Dec-13 11:29
MemberBrien Givens17-Dec-13 11:29 
GeneralRe: Very nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Pin
Volynsky Alex17-Dec-13 11:37
professionalVolynsky Alex17-Dec-13 11:37 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.