Gabriel B S M - 5 months ago 19

Java Question

Consider this method:

`public static int[] countPairs(int min, int max) {`

int lastIndex = primes.size() - 1;

int i = 0;

int howManyPairs[] = new int[(max-min)+1];

for(int outer : primes) {

for(int inner : primes.subList(i, lastIndex)) {

int sum = outer + inner;

if(sum > max)

break;

if(sum >= min && sum <= max)

howManyPairs[sum - min]++;

}

i++;

}

return howManyPairs;

}

As you can see, I have to count how many times each number between min and max can be expressed as a sum of two primes.

`primes`

`min`

`max`

My method works fine, but the goal here is to do something faster.

My method takes two loops, one inside the other, and it makes my algorithm an O(n²). It sucks like bubblesort.

How can I rewrite my code to accomplish the same result with a better complexity, like O(nlogn)?

One last thing: I'm coding in Java, but your reply can be in also Python, VB.Net, C#, Ruby, C or even just a explanation in English.

Answer

For each number `x`

between `min`

and `max`

, we want to compute the number of ways `x`

can be written as the sum of two primes. This number can also be expressed as

```
sum(prime(n)*prime(x-n) for n in xrange(x+1))
```

where `prime(x)`

is 1 if x is prime and 0 otherwise. Instead of counting the number of ways that two primes add up to `x`

, we consider all ways two nonnegative integers add up to `x`

, and add 1 to the sum if the two integers are prime.

This isn't a more efficient way to do the computation. However, putting it in this form helps us recognize that the output we want is the discrete convolution of two sequences. Specifically, if `p`

is the infinite sequence such that `p[x] == prime(x)`

, then the convolution of `p`

with itself is the sequence such that

```
convolve(p, p)[x] == sum(p[n]*p[x-n] for n in xrange(x+1))
```

or, substituting the definition of `p`

,

```
convolve(p, p)[x] == sum(prime(n)*prime(x-n) for n in xrange(x+1))
```

In other words, convolving `p`

with itself produces the sequence of numbers we want to compute.

The straightforward way to compute a convolution is pretty much what you were doing, but there are much faster ways. For `n`

-element sequences, a fast Fourier transform-based algorithm can compute the convolution in `O(n*log(n))`

time instead of `O(n**2)`

time. Unfortunately, this is where my explanation ends. Fast Fourier transforms are kind of hard to explain even when you have proper mathematical notation available, and as my memory of the Cooley-Tukey algorithm isn't as precise as I'd like it to be, I can't really do it justice.

If you want to read more about convolution and Fourier transforms, particularly the Cooley-Tukey FFT algorithm, the Wikipedia articles I've just linked would be a decent start. If you just want to use a faster algorithm, your best bet would be to get a library that does it. In Python, I know `scipy.signal.fftconvolve`

would do the job; in other languages, you could probably find a library pretty quickly through your search engine of choice.