## Introduction

This article presents a code library for generating combinations. A "combination" is a subset of the input sequence with a certain length. For example, given an input of `{1, 2, 3, 4}`

, requesting all the length-2 combinations gives `{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, and {3, 4}`

. The number of combinations of size *r* drawn from an input of size *n* is referred to as *n*C*r*, which is equal to *n*! ÷ [*r*!(*n*-*r*)!]. However, this formula is only correct if the input is actually a set (all the elements are distinct). I needed an algorithm that would properly handle duplicates in the input, and the code in this article does just that. By coincidence more than by design, each combination's elements are returned in sorted order, and the combinations themselves are also returned in order.

## Using the Code

To generate all the length-3 combinations of integers from the set `{1, 2, 3, 4, 5}`

:

int[] input = new int[] { 1, 2, 3, 4, 5 };
Combinations<int> combinations = new Combinations<int>(input, 3);
foreach (int[] combination in combinations)
{
}

This will result in the body of the `foreach`

being run with `combination`

set to each of the following values: `{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, and {3, 4, 5}`

.

Note that the input need not be an array; it can be any implementation of `IEnumerable`

, including most of the standard collections classes. Also, the input need not be `int`

s; simply by changing the generic type parameter on the `Combinations`

object, any data type can be used.

If the inputs contain duplicates, they will be handled properly. For example, the following code:

int[] input = new int[] { 1, 2, 2, 3 };
Combinations<int> combinations = new Combinations<int>(input, 3);
foreach (int[] combination in combinations)
{
}

will yield the following values of `combination`

: `{1, 2, 2}, {1, 2, 3}, {2, 2, 3}`

.

It is possible to use an alternative `IComparer`

for equality and sorting. For example:

string[] input = new string[] { "alpha", "ALPHA", "beta", "gamma" };
Combinations<string> combinations = new Combinations<string>(input,
StringComparer.CurrentCultureIgnoreCase, 2);
foreach (string[] combination in combinations)
{
}

results in `combination`

taking on the following values: `{"alpha", "alpha"}, {"alpha", "beta"}, {"alpha", "gamma"}, {"beta", "gamma"}`

. "`alpha`

" and "`ALPHA`

" are considered equivalent, and whichever version appears first in the input list (in this case "alpha") is considered canonical; only that form will appear in the output.

Certain procedures can be carried out when the algorithm only knows the *n* value (the set of inputs), but before the *r* value (the output size) is known. These procedures are actually carried out in the `ElementSet`

class. An `ElementSet`

is constructed internally by the two `Combinations`

constructors which accept `IEnumerable`

s, but it is also possible to construct an `ElementSet`

separately and pass it into a `Combinations`

constructor. This is useful if the same input will be processed for more than one output length. For example:

int[] input = new int[] { 1, 2, 3, 4, 5 };
ElementSet<int> elements = new ElementSet<int>(input);
for (int i = 0; i <= 5; i++) {
Combinations<int> combos = new Combinations<int>(elements, i);
foreach (int[] combo in combos)
{
}
}

This code will efficiently generate all the subsets of `{1, 2, 3, 4, 5}`

. Any processing which does not depend on the length of the subsets is done outside the loop. Also, this example demonstrates that the corner cases of *r*=0 and *r*=*n* are handled properly. In the *r*=0 case, a single array of length zero is returned; in the *r*=*n* case, a single array containing the entire input is returned.

It is possible to use both a custom `IComparer`

and a preconstructed `ElementSet`

. The `IComparer`

must be passed into the `ElementSet`

constructor along with the input data. The `IComparer`

is not passed to the `Combinations`

constructor in this case.

## Complexity Analysis

For this discussion, *n* is the size of the input, *r* is the size of the output, and *m* is the number of "distinct" elements in the input.

### Cycles

`ElementSet`

constructor

Assuming the `IEnumerable`

passed to the constructor can enumerate its contents in insignificant time, the constructor operates in *O*(*n*log_{2}*n*).

`Combinations`

constructor

*O*(1), given an already-constructed `ElementSet`

; otherwise, *O*(*n*log_{2}*n*).

`Combinations.GetEnumerator()`

*O*(*r*)

`IEnumerator.Reset()`

*O*(*r*)

`IEnumerator.Current`

`(get)`

*O*(*r*)

`IEnumerator.MoveNext()`

*O*(*r*×*m*)

### Storage

Only large quantities of storage are mentioned here.

`ElementSet<T>`

- SortedList<T, int> of size
*m*
- int[
*r*]
- SortedDictionary<T, int> of size
*m* only during construction

`Combinations<T>`

This class has no significant storage requirements.

`IEnumerator<T[]>`

- int[
*r*]
- int[
*m*]
- T[
*r*] only during a call to `Current`

property-getter
- Stack space for recursion to depth
*O*(*r*) only during a call to `MoveNext()`

## Diagnostics

The source archive contains a solution with two projects and three configurations. In the "Test" configuration, only the "`Combinations`

" project will be built, and 13 NUnit tests will be compiled into the assembly. In the "Debug" and "Release" configurations, the "`Combinations`

" project's test cases will not be compiled, while the "`Benchmark`

" project will be compiled, which provides a simple console-mode benchmarking program that exercises the library. My benchmarks on a 1.7GHz processor in release mode range from over four million combinations per second for sets of 3 integers from a set of 10 distinct down to 66,571 combinations per second for sets of 700 integers from a set of 1000 with 10% duplicate entries. The input and output sizes impact the benchmarks much more than the percentage duplicates. Benchmarks for constructing `ElementSet`

s are also provided, passing in arrays of integers in both ascending and random order. My figures range from about 1.5 million input elements per second for a dataset of 10 integers to about 500,000 elements per second for an input of 1000 integers. Sorted-vs-random ordering is insignificant.

## How It Works

The `ElementSet`

constructor first gathers all the input elements. A `SortedList<T, int>`

is populated such that each key maps to the number of times it occurs. Actually, a `SortedDictionary`

is populated first, then copied to a `SortedList`

, as this method yields better complexity. Also, the `ElementSet`

calculates a cumulative sum backwards over the elements, storing the values into an `int[]`

, such that one can, in *O*(1), determine how many elements (not distinct elements) exist at and after a given position in the list. This is needed later.

The `Combinations`

class does very little except to contain an `ElementSet`

and the *r* value. All the work happens in the `IEnumerator`

.

The `IEnumerator`

uses "indices" to generate combinations. Imagining all the input elements lined up from left to right, with duplicates stacked above each other, the indices are initially "pushed" as far to the left as possible, such that each index points to its own element (not distinct element). Moving to the next value consists of sliding an index to the right. If it falls off the edge or hits another index, the next index is advanced one element to the right, and the current index is then pushed back to the left until it hits another index. Once all the indices have moved as far to the right as possible, there are no more combinations to generate. Other code is dedicated to making sure that an index never moves so far to the right that it doesn't leave enough room for other indices which reside to the right of it; this is what the backwards cumulative sum is used for. Once the indices have settled on their positions, a combination is generated in the `Current`

property-getter by simply indexing the `SortedList<T, int>.Keys`

collection with each index in turn, storing the values into an array.

## Other Similar Code

There is already a Combinatorial Algorithms in C# article which contains a combinations algorithm (as well as others), but that algorithm apparently fails when presented with duplicates in its input. For my purposes that was a problem. Also, that article's code implements `IEnumerator`

, while mine implements `IEnumerable`

; hence my code can be used in a `foreach`

but the other article's cannot. gybrush's explanation is that "If you have requested enumerator over the same sequence previously it must not become invalid, because it still may be in use. This is impossible with the current implementation of the Combinatorial classes and I don't see a way to easily modify this." My algorithm, on the other hand, presents a very clear division of code which makes supporting multiple `IEnumerator`

s easy. Finally, my code is generic, so it generates combinations in a typesafe way.