Factoring RSA Numbers Using Goldbach's Conjecture

Roger Doss

4.80/5 (8 votes)

4 Jan 2012CPOL3 min read

22K

366

Exploring the dream of designing a new algorithm for factoring semi-primes

Download source - 2.98 KB

Introduction

In our effort to design a new algorithm for factoring semi-primes, we turn our attention to the Goldbach conjecture. The conjecture states that any even number greater or equal to 4 can be represented as the sum of two primes. We may refine the conjecture by stating that any even number greater or equal to 8 can be represented as the sum of two primes distinct. We can then define the maximum Goldbach partition (maxGB) as the product of the largest two distinct primes that sum to the even number e=p+q. We note that e can factor n=p*q by way of Quadratic equation. There is also the minimum Goldbach partition (minGB) which is the product of the least two distinct primes that sum to the even number e=p+q.

Background

Initial attempts at developing an algorithm utilizing both partitions resulted in failure. By graphing the output of minGB and maxGB, we see that the data for minGB is almost random, while maxGB is almost a smooth curve. Our attention then immediately turned to using a binary search on the maxGB partitions in order to approximate e=p+q for a given n=p*q. Below is the graph of the data:

With this, our approximation of e for a given n became:

bool Q(unsigned long long n, unsigned long long e, unsigned long long *p, 
    unsigned long long *q)
{
    unsigned long long p_q = (unsigned long long)sqrt((e * e) - (4 * n));
    *p = (e - p_q) / 2;
    *q = (e + p_q) / 2;
    if((*p * *q) == n) {
        return true;
    }
    return false;
}
 
unsigned long long f(unsigned long long n, unsigned long long min, unsigned long long max)
{
    unsigned long long e = 0, p = 0, q = 0, p_e = 0, tmp = 0;
    if((min%2)) { min++; }
    if((max%2)) { max++; }
    while(min < max) {
        e = (min + max) / 2;
        if((e % 2) == 1) e++;
        if(p_e == e) {
            // printf("%d\n",e);
            return e;
        }
	tmp = maxGB(e);
        if(tmp > n) {
            max = e - 1;
        }
        if(tmp < n) {
            min = e + 1;
        }
        if(tmp == n) {
            Q(n,e,&p,&q);
            // printf("e=%d\tp=%d\tq=%d\n",e,p,q);
            return e;
        }
        if(Q(n,e,&p,&q)) {
            // printf("e=%d\tp=%d\tq=%d\n",e,p,q);
            return e;
        }
        p_e = e;
    }
    // printf("%d\n",e);
    return e;
}

Where f is our approximation logic and Q is our Quadratic equation logic for factoring n. A caller of f would do the following:

unsigned long long e = f(n,2,n+1);

The logic would then try to find an approximation for what e=p+q may be. We note that the algorithm is a binary search algorithm and therefore runs in O(logN) time. maxGB is defined as:

unsigned long long maxGB(unsigned long long i)
{
    unsigned long long t = i / 2, s = t;
    while(t) {
        t--;
        s++;
        if(is_prime(t) && is_prime(s) && ((s + t) == i)) {
            return t * s;
        }
    }
    return 0;
}

Using the Code

The code is designed to run as a standalone command line tool. Calling function f on a given semi-prime n produces an approximation of e=p+q which can be used as a starting point to find the real value and therefore factor n.

Testing of the algorithm using a set of pseudo-random generated semi-primes produces results that look promising. In particular, comparing the algorithm to the output of Pollard-Rho on those semi-primes shows that Pollard-Rho required much more iterations in order to factor. In some cases, Pollard-Rho failed to factor the semi-primes, while in all cases, our algorithm worked successfully and rather quickly.

There are limitations here when handling large semi-primes and the algorithm was only tested on 64bit semi-primes. Large semi-primes require that methods such as ‘is_prime’ use a probabilistic approach rather than trial division. Multi-threading may be used to help in finding the actual number e for larger semi-primes. An example run of the algorithm:

$ ./maxGB 55594832159
factored: e=471744 p=229487        q=242257        trials=86

Shows that it took 86 trials to factor the number while factoring using Pollard-Rho took:

$ ./prho.exe 55594832159
factored: n=55594832159 p=242257        q=229487        trials=5783

5783 and that these results were consistently seen while factoring 32bit and 64bit semi-primes. In comparing the two algorithms, we factored 1024 32bit+ semi-primes. For Pollard-Rho, we were successful for 788 cases and failed for 236 cases. We were successful in factoring all cases using our algorithm. Additionally, the average number of trials for Pollard-Rho was 26766 while for our algorithm it was 916, a factor of about 29 faster. Further research is needed to test the algorithm on larger semi-primes.

Points of Interest

It would be interesting to develop the algorithm for large scale semi-primes and test to see how well it factors larger numbers. At the time of this writing, no test has been done for large scale numbers. The algorithm would need to have a probabilistic algorithm for determining if a number is prime and usage of the built-in integer classes would have to be replaced with an arbitrary precise type. In testing with 32 bit and 64 bit semi-primes, the code appeared to perform better than Pollard-Rho. The latter algorithm was used as a bench mark.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)