12,554,623 members (56,603 online)
alternative version

137.5K views
52 bookmarked
Posted

# Combination Generator

, 11 Jun 2006 CPOL
 Rate this:
A combination generator that works properly with duplicate symbols in its input.

## Introduction

This article presents a code library for generating combinations. A "combination" is a subset of the input sequence with a certain length. For example, given an input of `{1, 2, 3, 4}`, requesting all the length-2 combinations gives `{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, and {3, 4}`. The number of combinations of size r drawn from an input of size n is referred to as nCr, which is equal to n! ÷ [r!(n-r)!]. However, this formula is only correct if the input is actually a set (all the elements are distinct). I needed an algorithm that would properly handle duplicates in the input, and the code in this article does just that. By coincidence more than by design, each combination's elements are returned in sorted order, and the combinations themselves are also returned in order.

## Using the Code

To generate all the length-3 combinations of integers from the set `{1, 2, 3, 4, 5}`:

```int[] input = new int[] { 1, 2, 3, 4, 5 };
Combinations<int> combinations = new Combinations<int>(input, 3);
foreach (int[] combination in combinations)
{
// Do something with "combination".
}```

This will result in the body of the `foreach` being run with `combination` set to each of the following values: `{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, and {3, 4, 5}`.

Note that the input need not be an array; it can be any implementation of `IEnumerable`, including most of the standard collections classes. Also, the input need not be `int`s; simply by changing the generic type parameter on the `Combinations` object, any data type can be used.

If the inputs contain duplicates, they will be handled properly. For example, the following code:

```int[] input = new int[] { 1, 2, 2, 3 };
Combinations<int> combinations = new Combinations<int>(input, 3);
foreach (int[] combination in combinations)
{
// Do something with "combination".
}```

will yield the following values of `combination`: `{1, 2, 2}, {1, 2, 3}, {2, 2, 3}`.

It is possible to use an alternative `IComparer` for equality and sorting. For example:

```string[] input = new string[] { "alpha", "ALPHA", "beta", "gamma" };
Combinations<string> combinations = new Combinations<string>(input,
StringComparer.CurrentCultureIgnoreCase, 2);
foreach (string[] combination in combinations)
{
// Do something with "combination".
}```

results in `combination` taking on the following values: `{"alpha", "alpha"}, {"alpha", "beta"}, {"alpha", "gamma"}, {"beta", "gamma"}`. "`alpha`" and "`ALPHA`" are considered equivalent, and whichever version appears first in the input list (in this case "alpha") is considered canonical; only that form will appear in the output.

Certain procedures can be carried out when the algorithm only knows the n value (the set of inputs), but before the r value (the output size) is known. These procedures are actually carried out in the `ElementSet` class. An `ElementSet` is constructed internally by the two `Combinations` constructors which accept `IEnumerable`s, but it is also possible to construct an `ElementSet` separately and pass it into a `Combinations` constructor. This is useful if the same input will be processed for more than one output length. For example:

```int[] input = new int[] { 1, 2, 3, 4, 5 };
ElementSet<int> elements = new ElementSet<int>(input);
for (int i = 0; i <= 5; i++) {
Combinations<int> combos = new Combinations<int>(elements, i);
foreach (int[] combo in combos)
{
// Do something with "combo".
}
}```

This code will efficiently generate all the subsets of `{1, 2, 3, 4, 5}`. Any processing which does not depend on the length of the subsets is done outside the loop. Also, this example demonstrates that the corner cases of r=0 and r=n are handled properly. In the r=0 case, a single array of length zero is returned; in the r=n case, a single array containing the entire input is returned.

It is possible to use both a custom `IComparer` and a preconstructed `ElementSet`. The `IComparer` must be passed into the `ElementSet` constructor along with the input data. The `IComparer` is not passed to the `Combinations` constructor in this case.

## Complexity Analysis

For this discussion, n is the size of the input, r is the size of the output, and m is the number of "distinct" elements in the input.

### Cycles

`ElementSet` constructor

Assuming the `IEnumerable` passed to the constructor can enumerate its contents in insignificant time, the constructor operates in O(nlog2n).

`Combinations` constructor

O(1), given an already-constructed `ElementSet`; otherwise, O(nlog2n).

`Combinations.GetEnumerator()`

O(r)

`IEnumerator.Reset()`

O(r)

`IEnumerator.Current` `(get)`

O(r)

`IEnumerator.MoveNext()`

O(r×m)

### Storage

Only large quantities of storage are mentioned here.

`ElementSet<T>`

• SortedList<T, int> of size m
• int[r]
• SortedDictionary<T, int> of size m only during construction

`Combinations<T>`

``` ```This class has no significant storage requirements.

`IEnumerator<T[]>`

• int[r]
• int[m]
• T[r] only during a call to `Current` property-getter
• Stack space for recursion to depth O(r) only during a call to `MoveNext()`

## Diagnostics

The source archive contains a solution with two projects and three configurations. In the "Test" configuration, only the "`Combinations`" project will be built, and 13 NUnit tests will be compiled into the assembly. In the "Debug" and "Release" configurations, the "`Combinations`" project's test cases will not be compiled, while the "`Benchmark`" project will be compiled, which provides a simple console-mode benchmarking program that exercises the library. My benchmarks on a 1.7GHz processor in release mode range from over four million combinations per second for sets of 3 integers from a set of 10 distinct down to 66,571 combinations per second for sets of 700 integers from a set of 1000 with 10% duplicate entries. The input and output sizes impact the benchmarks much more than the percentage duplicates. Benchmarks for constructing `ElementSet`s are also provided, passing in arrays of integers in both ascending and random order. My figures range from about 1.5 million input elements per second for a dataset of 10 integers to about 500,000 elements per second for an input of 1000 integers. Sorted-vs-random ordering is insignificant.

## How It Works

The `ElementSet` constructor first gathers all the input elements. A `SortedList<T, int>` is populated such that each key maps to the number of times it occurs. Actually, a `SortedDictionary` is populated first, then copied to a `SortedList`, as this method yields better complexity. Also, the `ElementSet` calculates a cumulative sum backwards over the elements, storing the values into an `int[]`, such that one can, in O(1), determine how many elements (not distinct elements) exist at and after a given position in the list. This is needed later.

The `Combinations` class does very little except to contain an `ElementSet` and the r value. All the work happens in the `IEnumerator`.

The `IEnumerator` uses "indices" to generate combinations. Imagining all the input elements lined up from left to right, with duplicates stacked above each other, the indices are initially "pushed" as far to the left as possible, such that each index points to its own element (not distinct element). Moving to the next value consists of sliding an index to the right. If it falls off the edge or hits another index, the next index is advanced one element to the right, and the current index is then pushed back to the left until it hits another index. Once all the indices have moved as far to the right as possible, there are no more combinations to generate. Other code is dedicated to making sure that an index never moves so far to the right that it doesn't leave enough room for other indices which reside to the right of it; this is what the backwards cumulative sum is used for. Once the indices have settled on their positions, a combination is generated in the `Current` property-getter by simply indexing the `SortedList<T, int>.Keys` collection with each index in turn, storing the values into an array.

## Other Similar Code

There is already a Combinatorial Algorithms in C# article which contains a combinations algorithm (as well as others), but that algorithm apparently fails when presented with duplicates in its input. For my purposes that was a problem. Also, that article's code implements `IEnumerator`, while mine implements `IEnumerable`; hence my code can be used in a `foreach` but the other article's cannot. gybrush's explanation is that "If you have requested enumerator over the same sequence previously it must not become invalid, because it still may be in use. This is impossible with the current implementation of the Combinatorial classes and I don't see a way to easily modify this." My algorithm, on the other hand, presents a very clear division of code which makes supporting multiple `IEnumerator`s easy. Finally, my code is generic, so it generates combinations in a typesafe way.

## Share

No Biography provided

## You may also be interested in...

 Pro Pro

 First PrevNext
 Language Member 1145688716-Feb-15 14:44 Member 11456887 16-Feb-15 14:44
 Re: Language Garth J Lancaster16-Feb-15 16:17 Garth J Lancaster 16-Feb-15 16:17
 How to deal with very large sets? Miodrag Nejkovic21-Jul-13 4:24 Miodrag Nejkovic 21-Jul-13 4:24
 Re: How to deal with very large sets? Hawk77721-Jul-13 9:54 Hawk777 21-Jul-13 9:54
 Re: How to deal with very large sets? Miodrag Nejkovic21-Jul-13 10:10 Miodrag Nejkovic 21-Jul-13 10:10
 Hi, thank You for fast answer. I've build Your code, get Combinations.dll and use it like this : ```int[] Elements = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }; Combinations C = new Combinations(Elements, 20);   Random RNDGen = new Random(DateTime.Now.Millisecond); long RNDCombination = (long)Math.Floor(RNDGen.NextDouble() * 3535316142212174320); ``` And now I want to get that combination like : ```int[] RNDComb = C.ElementAt(RNDCombination); ``` But .ElementAt() takes int as input index. Am I making some mistake ??? How could I get some combination from Your class, lets say 233423435th combination. Thank You
 Re: How to deal with very large sets? Hawk77728-Jul-13 8:33 Hawk777 28-Jul-13 8:33
 Great Code, but need some help ptxav14-Jan-13 17:09 ptxav 14-Jan-13 17:09
 Re: Great Code, but need some help PIEBALDconsult14-Jan-13 17:27 PIEBALDconsult 14-Jan-13 17:27
 Re: Great Code, but need some help ptxav14-Jan-13 17:53 ptxav 14-Jan-13 17:53
 Re: Great Code, but need some help PIEBALDconsult14-Jan-13 17:55 PIEBALDconsult 14-Jan-13 17:55
 Re: Great Code, but need some help Hawk77718-Jan-13 21:59 Hawk777 18-Jan-13 21:59
 Re: Great Code, but need some help PIEBALDconsult19-Jan-13 4:21 PIEBALDconsult 19-Jan-13 4:21
 Re: Great Code, but need some help Hawk77718-Jan-13 22:04 Hawk777 18-Jan-13 22:04
 A recursive program in C to generate nCk Manohar Bhat20-Jan-12 8:19 Manohar Bhat 20-Jan-12 8:19
 My vote of 5 eg_Anubhava21-Jul-11 3:50 eg_Anubhava 21-Jul-11 3:50
 Very Real and Good Example eg_Anubhava21-Jul-11 3:49 eg_Anubhava 21-Jul-11 3:49
 Got this working in vb.net 2010 Megalithic128-Jun-11 12:08 Megalithic1 28-Jun-11 12:08
 Re: demo Hawk77725-Dec-10 22:36 Hawk777 25-Dec-10 22:36
 Re: demo Hawk7774-Jan-11 19:33 Hawk777 4-Jan-11 19:33
 What if ordrer matters? antonis745-Jul-10 23:53 antonis74 5-Jul-10 23:53
 Re: What if ordrer matters? Hawk7776-Jul-10 18:09 Hawk777 6-Jul-10 18:09
 Thanks for the code sansiro18-Apr-10 4:20 sansiro 18-Apr-10 4:20
 Re: Thanks for the code Hawk77718-Apr-10 9:56 Hawk777 18-Apr-10 9:56
 How to count collection sizes? dxbamboo15-Apr-10 19:54 dxbamboo 15-Apr-10 19:54
 Re: How to count collection sizes? Hawk77715-Apr-10 20:43 Hawk777 15-Apr-10 20:43
 NEED HELP prako_jl18-Dec-09 7:36 prako_jl 18-Dec-09 7:36
 Re: NEED HELP Hawk77719-Dec-09 9:50 Hawk777 19-Dec-09 9:50
 Re: NEED HELP prako_jl2-Jan-10 15:12 prako_jl 2-Jan-10 15:12
 Re: NEED HELP Hawk7772-Jan-10 20:20 Hawk777 2-Jan-10 20:20
 Re: NEED HELP prako_jl3-Jan-10 6:36 prako_jl 3-Jan-10 6:36
 Need help Vikas Misra(TCS)14-Oct-09 4:14 Vikas Misra(TCS) 14-Oct-09 4:14
 Re: Need help Hawk77715-Oct-09 0:05 Hawk777 15-Oct-09 0:05
 Re: Need help Vikas Misra(TCS)15-Oct-09 1:13 Vikas Misra(TCS) 15-Oct-09 1:13
 Re: Need help [modified] Hawk77715-Oct-09 19:16 Hawk777 15-Oct-09 19:16
 Running out of memory... Andew S Johnson27-Apr-09 12:21 Andew S Johnson 27-Apr-09 12:21
 Re: Running out of memory... Hawk77728-Apr-09 8:22 Hawk777 28-Apr-09 8:22
 Thanks for the code Veloz21-Apr-09 8:13 Veloz 21-Apr-09 8:13
 Re: Thanks for the code Hawk77723-Apr-09 12:35 Hawk777 23-Apr-09 12:35
 Scalability.. Every combination with every possible m value mwaller1-Nov-07 15:54 mwaller 1-Nov-07 15:54
 Re: Scalability.. Every combination with every possible m value Hawk7771-Nov-07 17:51 Hawk777 1-Nov-07 17:51
 With/Without repetitions [modified] Stein R31-Jul-07 4:18 Stein R 31-Jul-07 4:18
 Re: With/Without repetitions Hawk77731-Jul-07 22:04 Hawk777 31-Jul-07 22:04
 Re: With/Without repetitions [modified] Stein R1-Aug-07 4:05 Stein R 1-Aug-07 4:05
 possible bug? Peter Leng6-Mar-07 6:13 Peter Leng 6-Mar-07 6:13
 Re: possible bug? Hawk7776-Mar-07 20:51 Hawk777 6-Mar-07 20:51
 allow duplicates elements in the combination? Peter Leng5-Mar-07 10:43 Peter Leng 5-Mar-07 10:43
 Re: allow duplicates elements in the combination? Hawk7775-Mar-07 15:44 Hawk777 5-Mar-07 15:44
 Re: allow duplicates elements in the combination? Peter Leng6-Mar-07 6:08 Peter Leng 6-Mar-07 6:08
 Last Visit: 31-Dec-99 18:00     Last Update: 25-Oct-16 11:44 Refresh 12 Next »