In this article, I am presenting a sorting schema, hugely inspired by the Bucket and Proxmap sorting algorithms. I call it "schema", rather than "algorithm", simply because it introduces a (or another, depending on the sorting algorithm) "devide and conquer" element to the existing sorting algorithms, aiming (imho) to make them faster.

Also, let's recall that most of the sorting algorithms come with the time complexity

${\displaystyle O(N \log{N})} \tag{1}$

With more details ...

1. The description of the schema

Let's suppose we have a list L with N (list size, thus a number) objects to sort. Now, let's imagine that we have M (a number) buckets which are pre-sorted (buckets, not the content) according to the same criteria applied to sort the objects from the list L. Let's use strings as an example, if

$L=\{\text{BBBB}, \text{BBAA}, \text{AAAA},\text{...}\}$

then we can "bucket" by the 1st letter of the string, giving us a total of 26 letters in the English alphabet (and yes let's ignore the case sensitivity and numbers and ... everything to keep things easy). We keep these buckets sorted as

$B_1=\{A\}, B_2=\{B\},...,B_{26}=\{Z\}$

Then we partition/distribute the content of L accross the buckets, sort the buckets individually and (finally) merge the content of the buckets to the final sorted list (simple traversal of the buckets, since they are already sorted). Something like this

What we have (in terms of time complexity) as a result

List L traversal:
${\displaystyle O(N)}$
It's important that the distribution of objects to buckets is
${\displaystyle O(1)}$
per object, e.g. using hash maps
Sorting per bucket (from (1)) is
${\displaystyle O\left(\frac{N}{M} \cdot\log{\frac{N}{M}}\right)}$
Assumptions is that the content of the list L is "uniformly" distributed (!!!) across the buckets.
Merging into the final list is
${\displaystyle O(M)}$
Again, buckets are pre-sorted.

2. The mathematical argument

Time complexity in the schema described above is

${\displaystyle O\left(N + M \cdot \frac{N}{M}\cdot \log{\left(\frac{N}{M}\right)}+M\right)}$

which is

${\displaystyle O\left(N + N\cdot\log{\left(\frac{N}{M}\right)}+M\right)} \tag{2}$

If we compare (1) and (2)

$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}= 1-\frac{\log{M} - 1}{\log{N}}+\frac{M}{N\cdot\log{N}} \tag{3}$

Assuming large enough M < N we have

$\frac{N + N\cdot\log{\left(\frac{N}{M}\right)}+M}{N \cdot\log{N}}< 1-\frac{\log{M} - 2}{\log{N}}< 1 \tag{4}$

In other words, for large N and M we should expect this schema to make most of the sorting algorithms "a little bit" faster. By how much? Let's see ...

3. Specific example

Let's consider lists of strings as an example. Here is the source code of a little PoC project which generates lists of randomly generated strings and sorts them using

the Java's Collections::sort and
an implementation of the schema above, which internally uses Collections::sort as well to sort the content of the buckets

and compares the results. No parallelisation is used, although sorting buckets in parallel could significantly improve the speed in the second (B) case, but no cheating!

Most of the aspects in the code are easy to tune, but we will consider only

lists of size N=1000000 and N=3000000 and
bucketing by the first 2 letters, giving the total number of buckets M=26²

Plugging these numbers into the formula (3) we should expect

*M=26²* and *N=1000000*	*M=26²* and *N=3000000*
0.60	0.63

An improvement of nearly 40%!? No way ...

4. Actual results

One more test parameter I should mention (and which is not part of the formula (3)) is X the length of each string added to the list L, this is to count for Java string comparison where string size plays a role. We will look for X=10, X=50 and X=100. Here are some results from running the PoC code on my fairly old computer with an Intel i5-3330 3.00GHz 4 Cores CPU on board, using Java 17

*X=10, M=26²* and *N=1000000*	*X=10, M=26²* and *N=3000000*
Reported result: Fast Sort avg: 422.33ms Std Sort avg: 677.96ms	Reported result: Fast Sort avg: 1484.24ms Std Sort avg: 2395.39ms
$\frac{422.33}{677.96}\approx 0.6229$	$\frac{1484.24}{2395.39}\approx 0.6196$

A few more results

*X=50, M=26²* and *N=3000000*	*X=100, M=26²* and *N=3000000*
Reported result: Fast Sort avg: 1545.82ms Std Sort avg: 2773.65ms	Reported result: Fast Sort avg: 1561.84ms Std Sort avg: 2883.08ms
$\frac{1545.82}{2773.65}\approx 0.5573$	$\frac{1561.84}{2883.08}\approx 0.5417$

5. Conclusions

Well, the test results are not too far off from the calculated results. So, the schema does improve the sorting ... However, I should mention the following:

It works well when the list L is "uniformly distributed" across the buckets (already mentioned in section 1). Imagine that all the objects fall into one bucket, then we are back to
${\displaystyle O(N \log{N})}$
or slightly worse.
The test times don't include the time required to "build" the buckets. Buckets are meant to be "built" once and re-used.

And finally, the schema allows for

parallelisation, content of the buckets can be sorted in parallel and
imagine a data streaming scenario, if one bucket is updated, we don't have to re-sort the entire list L