Gabriel B S M Gabriel B S M - 4 months ago 14
Java Question

Find how many times each number between N and M can be expressed as a sum of a pair of primes

Consider this method:

public static int[] countPairs(int min, int max) {
int lastIndex = primes.size() - 1;
int i = 0;
int howManyPairs[] = new int[(max-min)+1];

for(int outer : primes) {
for(int inner : primes.subList(i, lastIndex)) {
int sum = outer + inner;

if(sum > max)
break;

if(sum >= min && sum <= max)
howManyPairs[sum - min]++;
}

i++;
}

return howManyPairs;
}


As you can see, I have to count how many times each number between min and max can be expressed as a sum of two primes.

primes
is an ArrayList with all primes between 2 and 2000000. In this case,
min
is 1000000 and
max
is 2000000, that's why primes goes until 2000000.

My method works fine, but the goal here is to do something faster.

My method takes two loops, one inside the other, and it makes my algorithm an O(n²). It sucks like bubblesort.

How can I rewrite my code to accomplish the same result with a better complexity, like O(nlogn)?

One last thing: I'm coding in Java, but your reply can be in also Python, VB.Net, C#, Ruby, C or even just a explanation in English.

Answer

For each number x between min and max, we want to compute the number of ways x can be written as the sum of two primes. This number can also be expressed as

sum(prime(n)*prime(x-n) for n in xrange(x+1))

where prime(x) is 1 if x is prime and 0 otherwise. Instead of counting the number of ways that two primes add up to x, we consider all ways two nonnegative integers add up to x, and add 1 to the sum if the two integers are prime.

This isn't a more efficient way to do the computation. However, putting it in this form helps us recognize that the output we want is the discrete convolution of two sequences. Specifically, if p is the infinite sequence such that p[x] == prime(x), then the convolution of p with itself is the sequence such that

convolve(p, p)[x] == sum(p[n]*p[x-n] for n in xrange(x+1))

or, substituting the definition of p,

convolve(p, p)[x] == sum(prime(n)*prime(x-n) for n in xrange(x+1))

In other words, convolving p with itself produces the sequence of numbers we want to compute.

The straightforward way to compute a convolution is pretty much what you were doing, but there are much faster ways. For n-element sequences, a fast Fourier transform-based algorithm can compute the convolution in O(n*log(n)) time instead of O(n**2) time. Unfortunately, this is where my explanation ends. Fast Fourier transforms are kind of hard to explain even when you have proper mathematical notation available, and as my memory of the Cooley-Tukey algorithm isn't as precise as I'd like it to be, I can't really do it justice.

If you want to read more about convolution and Fourier transforms, particularly the Cooley-Tukey FFT algorithm, the Wikipedia articles I've just linked would be a decent start. If you just want to use a faster algorithm, your best bet would be to get a library that does it. In Python, I know scipy.signal.fftconvolve would do the job; in other languages, you could probably find a library pretty quickly through your search engine of choice.

Comments