jdw1996 jdw1996 - 21 days ago 6
Python Question

Haskell slower than Python in naïve integer factorization?

I'm taking a math course where we had to do some integer factorizations as an intermediate step to a problem. I decided to write a Python program to do this for me (we weren't being tested on our ability to factor, so this is completely above board). The program is as follows:

#!/usr/bin/env python3

import math
import sys

# Return a list representing the prime factorization of n. The factorization is
# found using trial division (highly inefficient).
def factorize(n):

def factorize_helper(n, min_poss_factor):
if n <= 1:
return []
prime_factors = []
smallest_prime_factor = -1
for i in range(min_poss_factor, math.ceil(math.sqrt(n)) + 1):
if n % i == 0:
smallest_prime_factor = i
break
if smallest_prime_factor != -1:
return [smallest_prime_factor] \
+ factorize_helper(n // smallest_prime_factor,
smallest_prime_factor)
else:
return [n]

if n < 0:
print("Usage: " + sys.argv[0] + " n # where n >= 0")
return []
elif n == 0 or n == 1:
return [n]
else:
return factorize_helper(n, 2)

if __name__ == "__main__":
factorization = factorize(int(sys.argv[1]))
if len(factorization) > 0:
print(factorization)


I've been teaching myself some Haskell as well, so I decided to try rewriting the program in Haskell. That program is as follows:

import System.Environment

-- Return a list containing all factors of n at least x.
factorize' :: (Integral a) => a -> a -> [a]
factorize' n x = smallestFactor
: (if smallestFactor == n
then []
else factorize' (n `quot` smallestFactor) smallestFactor)
where
smallestFactor = getSmallestFactor n x
getSmallestFactor :: (Integral a) => a -> a -> a
getSmallestFactor n x
| n `rem` x == 0 = x
| x > (ceiling . sqrt . fromIntegral $ n) = n
| otherwise = getSmallestFactor n (x+1)

-- Return a list representing the prime factorization of n.
factorize :: (Integral a) => a -> [a]
factorize n = factorize' n 2

main = do
argv <- getArgs
let n = read (argv !! 0) :: Int
let factorization = factorize n
putStrLn $ show (factorization)
return ()


(note: this requires a 64-bit environment. On 32-bit, import
Data.Int
and use
Int64
as the type annotation on
read (argv !! 0)
)

After I'd written this, I decided to compare the performance of the two, recognizing that there are better algorithms, but that the two programs use essentially the same algorithm. I do, for example, the following:

$ ghc --make -O2 factorize.hs
$ /usr/bin/time -f "%Uu %Ss %E" ./factorize 89273487253497
[3,723721,41117819]
0.18u 0.00s 0:00.23


Then, timing the Python program:

$ /usr/bin/time -f "%Uu %Ss %E" ./factorize.py 89273487253497
[3, 723721, 41117819]
0.09u 0.00s 0:00.09


Naturally, the times vary slightly each time I run one of the programs, but they are always in this range, with the Python program several times quicker than the compiled Haskell program. It seems to me that the Haskell version should be able to run quicker, and I'm hoping you can give me an idea of how to improve it so that this is the case.

I've seen some tips on optimizing Haskell programs, as in answers to this question, but can't seem to get my program running any quicker. Are loops this much quicker than recursion? Is Haskell's I/O particularly slow? Have I made a mistake in actually implementing the algorithm? Ideally, I'd like to have an optimized version of the Haskell that is still relatively easy to read

Answer

If you compute limit = ceiling . sqrt . fromIntegral $ n only once, instead of once per iteration, then I see the Haskell version being faster:

limit = ceiling . sqrt . fromIntegral $ n
smallestFactor = getSmallestFactor x

getSmallestFactor x
    | n `rem` x == 0 = x
    | x > limit      = n
    | otherwise      = getSmallestFactor (x+1)

using this version, I see:

$ time ./factorizePy.py 89273487253497
[3, 723721, 41117819]

real    0m0.236s
user    0m0.171s
sys     0m0.062s

$ time ./factorizeHs  89273487253497
[3,723721,41117819]

real    0m0.190s
user    0m0.000s
sys     0m0.031s