(untagged)

Permutations in C++, Part 2

Shao Voon Wong

0.00/5 (No votes)

8 Apr 2009

Speed up the work of finding permutations on multi-core processors

Introduction

In the coming years, multi-core processors will become mainstream. It would be nice, if we have dual core CPUs, to use each core to compute 50% of the total permutations. However, next_permutation in the STL's algorithms does not let us do so. Why? The reason is that next_permutation relies on the current permutation to generate the next one.

Let us consider a situation where we have a total of 200,000,000 permutations to find. If we can find the 100,000,000^th permutation, we can use two threads calling next_permutation() 100,000,000 times. One thread will start from the 0^th permutation and the other will start from the 100,000,000^th permutation. However, to find the 100,000,000^th permutation, you have to call next_permutation() 100,000,000 times! Hence, I have created an algorithm which lets us compute the 100,000,000^th permutation.

This method is called "finding the permutation by index" or "positional permutations." I later found out that this method had been invented quite some time ago and its actual name is "factoradic." This method is to be used in conjunction with next_permutation() to spread the workload among different cores or SMP processors. Of course, factoradic can be used to find all 200,000,000 permutations, but since next_permutation() is much faster than the factoradic implementation, we are only using factoradic to find the n^th permutation. From the n^th permutation onwards, we are going to use next_permutation() all the way.

The Technique of Finding Permutations

Let us have a quick run-through of how to find permutations before I show you how to find a permutation given its index. I will show you an example of finding the permutation of 0, 1, 2, 3. In the first column, write down these numbers:

0,
1,
2,
3,

For the first column, each number is going to have 3! rows because there are 3 other unwritten columns.

0,
0,
0,
0,
0,
0,

1,
1,
1,
1,
1,
1,

2,
2,
2,
2,
2,
2,

3,
3,
3,
3,
3,
3,

For the first column, let us find the permutations of the number 0. We will ignore 1, 2 and 3 for the time-being. For the second column, each number is going to have 2! rows because there are two other unwritten columns.

0,1,
0,1,
0,2,
0,2,
0,3,
0,3,

For the last two columns, write down the numbers that are not taken by the 1^st and 2^nd column in ascending order, and the next row in descending order.

0,1,2,3
0,1,3,2
0,2,1,3
0,2,3,1
0,3,1,2
0,3,2,1

Using the same method, you can find permutations for the other three numbers. That is: 1, 2 and 3 in the first column. All of the permutations are shown below:

0,1,2,3
0,1,3,2
0,2,1,3
0,2,3,1
0,3,1,2
0,3,2,1

1,0,2,3
1,0,3,2
1,2,0,3
1,2,3,0
1,3,0,2
1,3,2,0

2,0,1,3
2,0,3,1
2,1,0,3
2,1,3,0
2,3,0,1
2,3,1,0

3,0,1,2
3,0,2,1
3,1,0,2
3,1,2,0
3,2,0,1
3,2,1,0

The Technique of Finding a Permutation Given the Index

Note that the total number of permutations is a result of taking the factorial of the total number of elements. To explain the algorithm, let me take you through the steps of an example. Given the index 79, we need to find the permutation from a set of five elements: 0, 1, 2, 3 and 4. The total number of permutations is 120. Since this technique is zero-based, the last index would be the 119^th. Let us begin by finding the element in column 1, where columns are one-based.

0,1,2,3,4 = factorial(4) * 0 = 0th Permutation
0,x,x,x,x
.
.
0,x,x,x,x
1,0,2,3,4 = factorial(4) * 1 = 24th Permutation
1,x,x,x,x
.
.
1,x,x,x,x
2,0,1,3,4 = factorial(4) * 2 = 48th Permutation
2,x,x,x,x
.
.
2,x,x,x,x
3,0,1,2,4 = factorial(4) * 3 = 72th Permutation
3,x,x,x,x
.
.
3,x,x,x,x
4,0,1,2,3 = factorial(4) * 4 = 96th Permutation
4,x,x,x,x
.
.
4,3,2,1,0 = Last permutation(119th)
x,x,x,x,x = factorial(4) * 5 = 120th Permutation(out of bounds)

To find the 1^st element, we need to find in the positional permutations which one 79 is greater than, smaller than or equal to for the next positional permutation. As we can see, 79 is sandwiched between the 72^nd and 96^th; the 5^th element is 3. As for why factorial(4), it is because we are finding one element out of five elements now. Subtract 1 from 5 elements and you get 4. Let me give an example to clarify.

To get from 0, 1, 2, 3, 4 (0^th permutation) to 1, 0, 2, 3, 4 (24^th permutation), columns 2 to 5 would have to permute 4!-1 times because there are 4 columns to permute. Before we proceed to find the 2^nd element, we need to subtract 72 from 79, which is 7, before proceeding to the next iteration. Let us now find the 2^nd element from the remaining elements (0, 1, 2, 4):

0,1,2,4 = factorial(3) * 0 = 0th Permutation
0,x,x,x
.
.
0,x,x,x
1,0,2,4 = factorial(3) * 1 = 6th Permutation
1,x,x,x
.
.
1,x,x,x
2,0,1,4 = factorial(3) * 2 = 12th Permutation
2,x,x,x
.
.
2,x,x,x
4,0,1,2 = factorial(3) * 3 = 18th Permutation
4,x,x,x
.
.
4,2,1,0 = Last permutation(23th)
x,x,x,x = factorial(3) * 4 = 24th Permutation(out of bounds)

As we can see, 7 is sandwiched between the 6^th and 12^th permutations, so the 2^nd element is 1. Subtracting 6 from 7 leaves 1. Let the next iteration begin:

0,2,4 = factorial(2) * 0 = 0th Permutation
0,x,x
2,0,4 = factorial(2) * 1 = 2nd Permutation
2,x,x
4,0,2 = factorial(2) * 2 = 4th Permutation
4,2,0 = Last permutation(5th)
x,x,x = factorial(2) * 3 = 6th Permutation(out of bounds)

As we can see, 1 is sandwiched between the 0^th and 2^nd permutations, so the 3^rd element is 0. There is nothing to subtract in this iteration.

2,4 = factorial(1) * 0 = 0th Permutation
4,2 = factorial(1) * 1 = 1st Permutation

As we can see, 1 is equal to the 1^st, so the 4^th element is 4. There is no need for the next iteration because only 2 is left, so the 5^th element is 2. The generated permutation is 3, 1, 0, 4, 2 for index 79. You can verify this by calling next_permutation() 78 times. It is 78 times because the first permutation 0, 1, 2, 3, 4 is not generated by next_permutation(), but rather is fed into next_permutation(). As you may have noticed, the last permutation is always in descending order. Can we reverse-engineer 3, 1, 0, 4, 2 and get the index 79? Sure, just follow the steps below:

Factorial(4) * IndexOf(3) in {0,1,2,3,4} = 4! * 3 = 72

Remove 3 from 0, 1, 2, 3, 4. It turns into 0, 1, 2, 4.

Factorial(3) * IndexOf(1) in {0,1,2,4} = 3! * 1 = 6

Remove 1 from 0, 1, 2, 4. It turns into 0, 2, 4.

Factorial(2) * IndexOf(0) in {0,2,4} = 2! * 0 = 0

Remove 0 from 0, 2, 4. It turns into 2, 4.

Factorial(1) * IndexOf(4) in {2,4} = 1! * 1 = 1

Remove 4 from 2, 4. It turns into 2.

Factorial(0) * IndexOf(2) in {2} = 1! * 0 = 0

Add up all the calculations and you will get 72 + 6 + 0 + 1 + 0 = 79.

Code Examples for Finding Permutations by Index (Factoradic)

#include <iostream>
#include "Factorial.h"
#include "FindPermByIdx.h"

void main()
{
   const int SIZE = 3;
   // Initialize the factorial class of 100 factorial: it is optional
   // It would be faster if you need to call FindPerm() many times
   // If you don't initialize this class, 
   // each factorial have to be calculated
   // each time it is needed.   CFactorial<BigInteger>::Init();
   CFindPermByIdx<BigInteger> findperm;

   BigInteger biResults;
   // Initialize the first permutation
   vector<unsigned int> vec(SIZE);
   vec[0]=0;
   vec[1]=1;
   vec[2]=2;

   int cnt = 0;
   bool bContinue=true;
   // Display the first permutation
   {
      cout<<cnt<<")";
      ++cnt;
      for( int j=0; j<SIZE; ++j )
      {
         cout<< vec[j] << ",";
      }
      
      cout<<endl;
   }
   do
   {
      // Find the permutation, given its index(cnt)
      bContinue = findperm.FindPerm( SIZE, cnt, vec );

      if( bContinue )
      {
         // Display the results
         cout<<cnt<<")";
         for( int j=0; j<SIZE; ++j )
         {
            cout<< vec[j] << ",";
         }
      }
      
      cout<<endl;

      ++cnt;
   }
   while( bContinue );

   // If you have called CFactorial::Init();, you must call its Destroy()
   CFactorial<BigInteger>::Destroy();   
}

You may have noticed that CFindPermByIdx operates on numbers only. The contents of the first permutation vector always start from zero and are in running numbers. Unlike next_permutation(), which operates on objects by comparing, CFindPermByIdx operates on zero-based running numbers. What you will get from its results are indices. For example, if you get a result of {0, 2, 1}, it means that the 1^st element is index 0 of the array (of objects) that you originally wanted to permute. The 2^nd element refers to index 2 of the array and the 3^rd element is index 1 of the array.

Permuting numbers is much easier and faster than permuting objects. CFindPermByIdx uses a big integer class because this algorithm uses factorials to find the permutation and factorials are usually very large numbers. Since it is a templated class and you know the factorials used are not going to exceed 64 bits, you can use the __int64 type instead. The is the output of the above example code:

0)0,1,2,
1)0,2,1,
2)1,0,2,
3)1,2,0,
4)2,0,1,
5)2,1,0,

Benchmark Between next_permutation() and CFindPermByIdx

Below are the timing results of next_permutation() and CFindPermByIdx finding the permutations of 8 elements, which is 40320 permutations in total.

time taken for next_permutation: 1 milliseconds
time taken for CFindPermByIdx: 3915 milliseconds

The source code for the benchmark is included for download.

Code Examples for Finding Index from Permutation

#include <iostream>
#include "Factorial.h"
#include "FindPermByIdx.h"

void main()
{
   const int SIZE = 4;
   // Initialize the factorial class of 100 factorial: it is optional
   // It would be faster if you need to call FindIndex() many times
   // If you don't initialize this class, 
   // each factorial have to be calculated
   // each time it is needed.
   CFactorial<BigInteger>::Init();

   CFindPermByIdx<BigInteger> findperm;

   BigInteger biResults;
   // Initialize the first permutation

   vector<unsigned int> vec(SIZE);
   vec[0]=0;
   vec[1]=1;
   vec[2]=2;
   vec[3]=3;

   do
   {
      // find the index of this permutation
      findperm.FindIndex( 4, biResults, vec );

      // Display the results
      cout<<biResults<<")";
      for( int j=0; j<SIZE; ++j )
      {
         cout<< vec[j] << ",";
      }

      cout<<endl;
   }
   while( next_permutation( vec.begin(), vec.end() ) );

   // If you have called CFactorial::Init();, you must call its Destroy()
   CFactorial<BigInteger>::Destroy();
}

This is the output of the above example code:

0.)0,1,2,3,
1.)0,1,3,2,
2.)0,2,1,3,
3.)0,2,3,1,
4.)0,3,1,2,
5.)0,3,2,1,
6.)1,0,2,3,
7.)1,0,3,2,
8.)1,2,0,3,
9.)1,2,3,0,
10.)1,3,0,2,
11.)1,3,2,0,
12.)2,0,1,3,
13.)2,0,3,1,
14.)2,1,0,3,
15.)2,1,3,0,
16.)2,3,0,1,
17.)2,3,1,0,
18.)3,0,1,2,
19.)3,0,2,1,
20.)3,1,0,2,
21.)3,1,2,0,
22.)3,2,0,1,
23.)3,2,1,0,

A Benchmark Program for Your PC

I have included a demo program called TimePermutations for you to benchmark your PC. It is nice to know how much time has been reduced if you have a dual-core or quad-core PC. The program does not store the permutations anywhere and discards them after every computation. If you have a uniprocessor PC, then you do not have much use for this benchmark program. Below are the screenshots of the program.

Conclusion

In this article, I introduced an algorithm called "finding permutations by index" or "factoradic" to let us find the n^th permutation. It continues from there with next_permutation() to split and speed up the work on many processors. I look forward to your feedback and hope you like my article!

References

History

7 April, 2009 -- Updated benchmark source and application
12 March 2009 - Changed the source code to use C++ Big Integer Library written by Matt McCutchen, so as to remove the need to download and build Crypto++, and also to reduce download sizes
19 November, 2007 -- Original version posted

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here