nilcit nilcit - 11 months ago 63
Python Question

Why is this Haskell code so slow?

I'm kind of new to Haskell and tried making a scrabble solver. It takes in the letters you currently have, finds all permutations of them and filters out those that are dictionary words. The code's pretty simple:

import Data.List

main = do
dict <- readFile "words"
letters <- getLine
let dictWords = words dict
let perms = permutations letters
print [x | x <- perms, x `elem` dictWords]

However it's incredibly slow, compared to a very similar implementation I have with Python. Is there something fundamental I'm doing wrong?

*edit: Here's my Python code:

from itertools import permutations

letters = raw_input("please enter your letters (without spaces): ")

d = open('words')
dictionary = [line.rstrip('\n') for line in d.readlines()]

perms = ["".join(p) for p in permutations(letters)]

validWords = []

for p in perms:
if p in dictionary: validWords.append(p)

for validWord in validWords:
print validWord

I didn't time them precisely, but roughly it feels like the Python implementation is about 2x as fast as the Haskell one. Perhaps I should't have said the Haskell code was "incredibly slow" in comparison, but since Haskell is statically typed I guess I just thought that it should've been much faster, and not slower than Python at all.

Answer Source

I'm kind of new to Haskell and tried making a scrabble solver.

You can substantially improve things by using a better algorithm.

Instead of testing every permutation of the input letters, if you sort them first you can make only one dictionary lookup and get all of the possible words (anagrams) which may be formed from them (using all of them).

Here is code which creates that dictionary as a Data.Map. There is a start-up cost to creating the Map, but after the first query subsequent lookups are very fast.

import Data.List
import qualified Data.Map.Strict as Map
import Control.Monad
import System.IO

main = do
  contents <- readFile "words"
  let pairs = [ (sort w, [w]) | w <- words contents ]
      dict = foldl' (\m (k,v) -> Map.insertWith (++) k v m) Map.empty pairs
      -- dict = foldr (\(k,v) m -> Map.insertWith (++) k v m) Map.empty pairs
  forever $ do
    putStr "Enter letters: " >> hFlush stdout
    letters <- getLine
    case Map.lookup (sort letters) dict of
      Nothing -> putStrLn "No words."
      Just ws -> putStrLn $ "Words: " ++ show ws

Map creation time for a word file of 236K words (2.5 MB) is about 4-5 seconds. Better performance is likely possible by using ByteStrings or Text instead of Strings.

Some good letter combinations to try:

steer rat tuna lapse groan neat

Note: Using GHC 7.10.2 I found this code performed the best without compiling with -O2.