Click here to Skip to main content
Click here to Skip to main content

Factoring RSA Numbers Using Goldbach's Conjecture

By , 4 Jan 2012
 

Introduction

In our effort to design a new algorithm for factoring semi-primes, we turn our attention to the Goldbach conjecture. The conjecture states that any even number greater or equal to 4 can be represented as the sum of two primes. We may refine the conjecture by stating that any even number greater or equal to 8 can be represented as the sum of two primes distinct. We can then define the maximum Goldbach partition (maxGB) as the product of the largest two distinct primes that sum to the even number e=p+q. We note that e can factor n=p*q by way of Quadratic equation. There is also the minimum Goldbach partition (minGB) which is the product of the least two distinct primes that sum to the even number e=p+q.

Background

Initial attempts at developing an algorithm utilizing both partitions resulted in failure. By graphing the output of minGB and maxGB, we see that the data for minGB is almost random, while maxGB is almost a smooth curve. Our attention then immediately turned to using a binary search on the maxGB partitions in order to approximate e=p+q for a given n=p*q. Below is the graph of the data:

image001.jpg

With this, our approximation of e for a given n became:

bool Q(unsigned long long n, unsigned long long e, unsigned long long *p, 
    unsigned long long *q)
{
    unsigned long long p_q = (unsigned long long)sqrt((e * e) - (4 * n));
    *p = (e - p_q) / 2;
    *q = (e + p_q) / 2;
    if((*p * *q) == n) {
        return true;
    }
    return false;
}
 
unsigned long long f(unsigned long long n, unsigned long long min, unsigned long long max)
{
    unsigned long long e = 0, p = 0, q = 0, p_e = 0, tmp = 0;
    if((min%2)) { min++; }
    if((max%2)) { max++; }
    while(min < max) {
        e = (min + max) / 2;
        if((e % 2) == 1) e++;
        if(p_e == e) {
            // printf("%d\n",e);
            return e;
        }
	tmp = maxGB(e);
        if(tmp > n) {
            max = e - 1;
        }
        if(tmp < n) {
            min = e + 1;
        }
        if(tmp == n) {
            Q(n,e,&p,&q);
            // printf("e=%d\tp=%d\tq=%d\n",e,p,q);
            return e;
        }
        if(Q(n,e,&p,&q)) {
            // printf("e=%d\tp=%d\tq=%d\n",e,p,q);
            return e;
        }
        p_e = e;
    }
    // printf("%d\n",e);
    return e;
}

Where f is our approximation logic and Q is our Quadratic equation logic for factoring n. A caller of f would do the following:

unsigned long long e = f(n,2,n+1);

The logic would then try to find an approximation for what e=p+q may be. We note that the algorithm is a binary search algorithm and therefore runs in O(logN) time. maxGB is defined as:

unsigned long long maxGB(unsigned long long i)
{
    unsigned long long t = i / 2, s = t;
    while(t) {
        t--;
        s++;
        if(is_prime(t) && is_prime(s) && ((s + t) == i)) {
            return t * s;
        }
    }
    return 0;
}

Using the Code

The code is designed to run as a standalone command line tool. Calling function f on a given semi-prime n produces an approximation of e=p+q which can be used as a starting point to find the real value and therefore factor n.

Testing of the algorithm using a set of pseudo-random generated semi-primes produces results that look promising. In particular, comparing the algorithm to the output of Pollard-Rho on those semi-primes shows that Pollard-Rho required much more iterations in order to factor. In some cases, Pollard-Rho failed to factor the semi-primes, while in all cases, our algorithm worked successfully and rather quickly.

There are limitations here when handling large semi-primes and the algorithm was only tested on 64bit semi-primes. Large semi-primes require that methods such as ‘is_prime’ use a probabilistic approach rather than trial division. Multi-threading may be used to help in finding the actual number e for larger semi-primes. An example run of the algorithm:

$ ./maxGB 55594832159
factored: e=471744 p=229487        q=242257        trials=86

Shows that it took 86 trials to factor the number while factoring using Pollard-Rho took:

$ ./prho.exe 55594832159
factored: n=55594832159 p=242257        q=229487        trials=5783

5783 and that these results were consistently seen while factoring 32bit and 64bit semi-primes. In comparing the two algorithms, we factored 1024 32bit+ semi-primes. For Pollard-Rho, we were successful for 788 cases and failed for 236 cases. We were successful in factoring all cases using our algorithm. Additionally, the average number of trials for Pollard-Rho was 26766 while for our algorithm it was 916, a factor of about 29 faster. Further research is needed to test the algorithm on larger semi-primes.

Points of Interest

It would be interesting to develop the algorithm for large scale semi-primes and test to see how well it factors larger numbers. At the time of this writing, no test has been done for large scale numbers. The algorithm would need to have a probabilistic algorithm for determining if a number is prime and usage of the built-in integer classes would have to be replaced with an arbitrary precise type. In testing with 32 bit and 64 bit semi-primes, the code appeared to perform better than Pollard-Rho. The latter algorithm was used as a bench mark.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

RogerGDoss
Software Developer (Senior)
United States United States
Dr. Doss holds a PhD in Applied Computer Science. His research interests include Operating Systems, Networking, Theory of NP Completeness,and Software Engineering.
http://www.rdoss.com

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
SuggestionGMPmemberOrkblutt10-Jan-12 9:25 
Hi Roger,
 
I'm not sure this algo is really efficient on large number...
Anyway, here a little port of your code using GMP ( i removed the not used f1 and f2 functions to fit here)
 
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <gmp.h>

typedef struct arg {
    mpz_t e;
    mpz_t end;
    mpz_t n;
	arg(mpz_t _e,
        mpz_t _end,
        mpz_t _n)
    {
		mpz_init_set(e, _e);
		mpz_init_set(end, _end);
		mpz_init_set(n, _n);
    }
} arg_t;
 

 
bool is_prime(mpz_t i)
{
    return (mpz_probab_prime_p(i, 5) > 0);
}
 
void maxGB(mpz_t i, mpz_t* retValue)
{
    mpz_t t, s, tmp;
    mpz_init(t);
    mpz_init(s);
    mpz_init(tmp);
 
    mpz_set(t, i);
    mpz_tdiv_q_ui(t, t, 2);
 
    mpz_set(s, t);
 
    while(mpz_cmp_ui (t, 0) != 0)
    {
		
        mpz_sub_ui(t, t, 1);
        mpz_add_ui(s, s, 1);
        if(is_prime(t) && is_prime(s))
        {
            mpz_add(tmp, t, s);
 
            if(mpz_cmp(tmp, i) == 0)
            {
                mpz_mul(tmp, t, s);
				mpz_set(*retValue, tmp);
				
                return;
            }
        }
    }
 
    mpz_set_ui(*retValue, 0);
 
}
 
bool Q(mpz_t n, mpz_t e, mpz_t *p, mpz_t *q)
{
	bool bReturn = false;
    mpz_t p_q, tmp1, tmp2;
    mpz_init(p_q);
    mpz_init(tmp1);
    mpz_init(tmp2);
 
    mpz_set(tmp1, e);
    mpz_mul(tmp1, e, e);
 
    mpz_mul_ui(tmp2, n, 4);
	
    mpz_sub(tmp1, tmp1, tmp2);
	mpz_abs(tmp1, tmp1);
    mpz_sqrt(p_q, tmp1);
 
    mpz_sub(*p, e, p_q);
    mpz_tdiv_q_ui(*p,*p, 2);
 
    mpz_add(*q, e, p_q);
    mpz_tdiv_q_ui(*q, *q, 2);
 
    mpz_mul(tmp1, *p, *q);
 

 
    if(mpz_cmp(tmp1, n) == 0)
        bReturn = true;
 
exit:
 
	mpz_clear(p_q);
	mpz_clear(tmp1);
	mpz_clear(tmp2);
 
    return bReturn;
}
 
void f(mpz_t n, mpz_t min, mpz_t max, mpz_t* retValue)
{
    mpz_t e, p, q, p_e, tmp, tmp1, tmp2;
 
    mpz_init_set_ui(e, 0);
    mpz_init_set_ui(p, 0);
    mpz_init_set_ui(q, 0);
    mpz_init_set_ui(p_e, 0);
    mpz_init_set_ui(tmp, 0);
    mpz_init(tmp1);
    mpz_init(tmp2);
 
    mpz_mod_ui(tmp1, min, 2);
 
    if(mpz_cmp_ui(tmp1, 0) != 0)
        mpz_add_ui(min, min, 1);
	
 
    mpz_mod_ui(tmp1, max, 2);
 
    if(mpz_cmp_ui(tmp1, 0) != 0)
        mpz_add_ui(max, max, 1);
 

    while(mpz_cmp(max, min) > 0)
    {
        mpz_add(tmp1, min, max);
        mpz_tdiv_q_ui(e, tmp1, 2);
 

        if(mpz_fdiv_ui(e, 2) == 1)
            mpz_add_ui(e, e, 1);
 
        if(mpz_cmp(p_e, e) == 0)			
            goto exit;
		
 
        maxGB(e, &tmp);
 
        if(mpz_cmp(tmp, n) > 0)
		{
            mpz_sub_ui(max, e, 1);
		}
        else if(mpz_cmp(tmp, n) < 0)
		{
            mpz_add_ui(min, e, 1);
		}
        else
        {
			Q(n, e, &p, &q);
			goto exit;
        }
 

		
		if(Q(n, e, &p, &q))			
			goto exit;
		
 
		mpz_set(p_e, e);
 
    }
 
exit:
 
	mpz_set(*retValue, e);
 
	mpz_clear(e);
	mpz_clear(p);
	mpz_clear(q);
	mpz_clear(p_e);
	mpz_clear(tmp);
	mpz_clear(tmp1);
	mpz_clear(tmp2);
}
 

void *worker_thread(void *args)
{
    arg_t *a = (arg_t *)args;
 
    mpz_t trials, p, q, i;
 
	mpz_init_set_ui(trials, 0);
	mpz_init_set_ui(p, 0);
	mpz_init_set_ui(q, 0);
 
	mpz_init_set(i, a->e);
 
	while(mpz_cmp(i, a->end) < 0)
	{
		if(Q(a->n, i, &p, &q))
		{
			gmp_printf("factored: e=%Zd\tp=%Zd\tq=%Zd\ttrials=%Zd\n", i,p,q,trials);
			exit(0);
		}
		mpz_add_ui(i, i, 2);
		mpz_add_ui(trials, trials, 1);
	}   
    return NULL;
}
 
#define NR_THREADS 16
void f3(mpz_t n)
{
	mpz_t e, p, q, i, trials, quantum, tmp1, tmp2;
 
	mpz_init(e);
	mpz_init(quantum);
 
	mpz_init_set_ui(p, 0);
	mpz_init_set_ui(q, 0);
	mpz_init_set_ui(i, 0);
	mpz_init_set_ui(trials, 0);
	mpz_init(tmp1);
	mpz_init(tmp2);
 
	mpz_set_ui(tmp1, 2);
	mpz_add_ui(tmp2, n, 1);
 
	f(n, tmp1, tmp2, &e);
 
	gmp_printf("e: %Zd\n", e);
 

	if(Q(n, e, &p, &q))
	{
		gmp_printf("factored: e=%Zd\tp=%Zd\tq=%Zd\ttrials=%Zd\n", e,p,q,trials);
		return;
    } 
	else
	{
        // unsigned long long quantum = (n - e) / (NR_THREADS - 1);
   
		mpz_sub(tmp1, n, e);
		if(NR_THREADS > 1)
			mpz_tdiv_q_ui(quantum, tmp1, NR_THREADS - 1);
		else
			mpz_set(quantum, tmp1);
 
		gmp_printf("quantum: %Zd\n", quantum);
 
        pthread_t threads[NR_THREADS]={0};
 
        // printf("quantum=%lld e=%lld n=%lld\n",quantum,e,n);
        for(int nr_threads = 0; nr_threads < NR_THREADS; nr_threads++) 
		{
			mpz_mul_ui(tmp1, quantum, nr_threads);
			mpz_add(tmp1, tmp1, e);
			
			mpz_add_ui(tmp2, n, 1);
 
            arg_t *a = new arg_t(tmp1 , tmp2, n);
 
            pthread_create(&threads[nr_threads], NULL, worker_thread, (void *)a);
        }
        for(int nr_threads = 0; nr_threads < NR_THREADS; nr_threads++) {
            pthread_join(threads[nr_threads], NULL);
        }
		
    }
    printf("failed\n");
 
}
 

 

int main(int argc, char **argv)
{
//    unsigned long long n = strtoull(argv[1],NULL,10);
    mpz_t n;
    mpz_init_set_str(n, argv[1], 10);
 
    f3(n);
}
 
 
all the best,
 
orkblutt
GeneralRe: GMPmemberRogerGDoss16-Jan-12 12:44 
Thanks for this. I will look into it Smile | :)
QuestionA test for youmembersinan kul9-Jan-12 21:08 
The following problem requires factoring a big semiprime number:
 
http://mathalon.in/?page=show_problem.php&pid=230
 
I used msieve to factor it and it took one and half minute.
 
Sat Nov 19 16:04:37 2011 factoring 3619676750393866934107694376100695465808683251086711882079193237255791 (70 digits)
...
Sat Nov 19 16:06:13 2011 elapsed time 00:01:36
 
If you are interested.
AnswerRe: A test for youmemberRogerGDoss16-Jan-12 12:43 
Hello, I have the factoring as:
37841503604208515820789215805895747 * 95653618530932453942039677215639653
I believe it was as fast as the method you mention (running on Fedora with
four cores). The algorithm I used though is the one from my previous post
on factoring. Smile | :) So far, I only tested the Goldbach approach on small integers...
Thanks
GeneralRe: A test for youmembersinan kul16-Jan-12 21:40 
Hello,
 
Then it's good news. That's the correct factorization. I think your code is much simpler than msieve.
How did you handle big integers? Would you consider updating the article with it?
 
Maybe you should compare it with msieve running msieve on your machine.
 
msieve can be found here:
http://sourceforge.net/projects/msieve/[^]
Suggestionpossible optimizationmemberSkond5-Jan-12 21:04 
It seems you could increase the performance by calling
maxGB(e)
once in f().
I mean there is no need to call it 3 times for the same value, which doesn't change between calls.
        if(maxGB(e) > n) {
            max = e - 1;
        }
        if(maxGB(e) < n) {
            min = e + 1;
        }
        if(maxGB(e) == n) {
            Q(n,e,&p,&q);
            // printf("e=%d\tp=%d\tq=%d\n",e,p,q);
            return e;
        }
hallelujah

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130617.1 | Last Updated 4 Jan 2012
Article Copyright 2012 by RogerGDoss
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid