Lopofsky Lopofsky - 7 months ago 16
Javascript Question

Convert text prediction script [Markov Chain] from javascript to python

i've been trying the last couple days to convert this js script to python code.

My implementation (blindfull cp mostly, some minor fixes here and there) so far:

import random
class markov:
memory = {}
separator = ' '
order = 2

def getInitial(self):
ret = []
for i in range(0, self.order, 1):
ret.append('')
return ret

def breakText(self, txt, cb):
parts = txt.split(self.separator)
prev = self.getInitial()
def step(self):
cb(prev, self.next)
prev.shift()#Javascript function.
prev.append(self.next)
#parts.forEach(step) # - step is the function above.
cb(prev, '')

def learn(self, txt):
mem = self.memory
def learnPart(key, value):
if not mem[key]:
mem[key] = []
mem[key] = value
return mem
self.breakText(txt, learnPart)

def step(self, state, ret):
nextAvailable = self.memory[state] or ['']
self.next = nextAvailable[random.choice(nextAvailable.keys())]
if not self.next:
return ret
ret.append(next)
nextState = state.slice(1)
return self.step(nextState, ret)

def ask(self, seed):
if not seed:
seed = self.genInitial()
seed = seed + self.step(seed, []).join(self.separator)
return seed


Issues:


  1. I have absolutely no knowledge of javascript.

  2. When i try to "learn" some text to a "markov" class object [e.g.: a=markov(); a.learn("sdfg");] i get the following error: "TypeError: unhashable type: 'list'", for the "mem" dictionary at the "learnPart" function, member of the "learn" function.

    So my question so far is why does this exception [TypeError for a list object, falsely referring to a dictionary object (which is hashable)] occur?



thanks in advance for any suggestions, directions, points, help in general :D

Answer

Guy who wrote the article speaking. Glad you found it useful! Now, my first implementation of a Markov chain was actually in Python, so this answer will focus on how to write it in a more Pythonic way. I'll show how to go about making an order-2 Markov chain, since they're easy to talk about, but you can of course make it order-N with some modifications.

Data Structures

In js, the two prominent data structures are the generic object and the array (which is an extension to the generic object). In Python however, you have other options for more finely-grained control. Here're the major differences in the two implementations:

  • A state in our chain is really a tuple - an immutable, ordered structure, with a fixed amount of elements. We always want n elements (in this case, n=2) and their order has meaning.

  • Manipulating the memory will be easier if we use a defaultdict wrapping a list, so we can skip the "checking if a state doesn't exist, and then doing X", and instead just do X.

So, we stick a from collections import defaultdict at the top and change how markov.memory is defined:

memory = defaultdict(list)

Now we change markov.getInitial to return a tuple (remember this explains an order-2 chain):

def getInitial(self):
    return ('', '')

(if you want to expand it further, you can use a really neat Python trick: tuple([''] * 2) will return the same thing. Instead of empty strings, you can use None)

We'll get to changing what uses genInitial in a bit.

Yield and iteration

A strong concept which doesn't exist in js (yet) but does exist in Python is the yield operator (see this question for great explanations).

Another feature of Python is its generic for loop. You can go over nearly anything quite easily, including generators (functions which use yield). Combining the two, and we can redefine breakText:

def breakText(self, txt):
    #our very own (ε,ε)
    prev = self.getInitial()

    for word in txt.split(self.separator):
        yield prev, word
        #will be explained in the next paragraph
        prev = (prev[1], word)

    #end-of-sentence, prev->ε
    yield prev, ''

The magic part above, prev = (prev[1], word) can be explained best by example:

>>> a = (0, 1)
>>> a
(0, 1)
>>> a = (a[1], 2)
>>> a
(1, 2)

That's how we advance through the word list. And now we move up to what uses breakText, to the redefinition of markov.learn:

def learn(self, txt):
    for part in self.breakText(txt):
        key = part[0]
        value = part[1]

        self.memory[key].append(value)

Because our memory is a defaultdict, we don't have to worry about the key not existing.

A pee break on the side of the road

OK, we have half of the chain implemented, time to see it in action! What we have so far:

from collections import defaultdict

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]

            self.memory[key].append(value)

    def breakText(self, txt):
        #our very own (ε,ε)
        prev = self.getInitial()

        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)

        #end-of-sentence, prev->ε
        yield (prev, '')

    def getInitial(self):
        return ('', '')

(I changed the class name from markov to Markov because I cringe every time a class begins with a lowercase letter). I saved it as brain.py and loaded up Python.

>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.memory
defaultdict(<class 'list'>, {('had', 'a'): ['little'], ('Mary', 'had'): ['a'], ('', ''): ['Mary'], ('little', 'lamb'): [''], ('a', 'little'): ['lamb'], ('', 'Mary'): ['had']})

Success! Let's look at the result more carefully, to see that we got it right:

{ ('', ''): ['Mary'],
  ('', 'Mary'): ['had'],
  ('Mary', 'had'): ['a'],
  ('a', 'little'): ['lamb'],
  ('had', 'a'): ['little'],
  ('little', 'lamb'): ['']}

zips up Ready to drive on? We still have to use this chain!

Changing the step function

We've already met what we need to remake step. We have the defaultdict, so we can use random.choice right away, and I can cheat a bit because I know the order of the chain. We can also get rid of the recursion (with some sorrow), if we see it as a function which takes a single step through the chain (my bad in the original article - a badly named function).

def step(self, state):
    choice = random.choice(self.memory[state] or [''])

    if not choice:
        return None

    nextState = (state[1], choice)
    return choice, nextState

I regretfully added the or [''] because random.choice moans about empty lists. Finally, we move a larger portion of the logic to ask (the actual construction of the sentence):

def ask(self, seed=False):
    ret = []

    if not seed:
        seed = self.getInitial()

    while True:
        link = self.step(seed)

        if link is None:
            break

        ret.append(link[0])
        seed = link[1]

    return self.separator.join(ret)

Yes, a bit yucky. We could have given step a better name and made it a generator, but I'm late for a meeting with my pregnant wife who's about to give birth to a baby who left the stove on fire in my car that's being towed! I better hurry!

The grand finale

But first, a talk with bob:

from collections import defaultdict
import random

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]

            self.memory[key].append(value)

    def ask(self, seed):
        ret = []

        if not seed:
            seed = self.getInitial()

        while True:
            link = self.step(seed)

            if link is None:
                break

            ret.append(link[0])
            seed = link[1]

        return self.separator.join(ret)

    def breakText(self, txt):
        #our very own (ε,ε)
        prev = self.getInitial()

        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)

        #end-of-sentence, prev->ε
        yield (prev, '')

    def step(self, state):
        choice = random.choice(self.memory[state] or [''])

        if not choice:
            return None

        nextState = (state[1], choice)
        return choice, nextState

    def getInitial(self):
        return ('', '')

And loading it up:

>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.ask()
'Mary had a little lamb'
>>> bob.learn('Mary had a giant crab')
>>> bob.ask(('Mary', 'had'))
'a giant crab'

There is, of course, room for improvement and expanding on the concept. But it wouldn't be any fun if if I just gave you the answer.

Hopefully this will still help after 4 months.