Grr Grr - 6 months ago 30
Python Question

Creating a dictionary of words in a string whose values are the words following each word

I would like to create a dictionary from a text file, using each unique word as a key and, as its value, a dictionary of the words that follow the key together with the count of each. For example, something that looks like this:

>>>string = 'This is a string'
{'this': {'is': 1}, 'is': {'a': 1}, 'a': {'string': 1}}

Creating a dictionary of the unique words is no issue; it's creating the dictionary of following words for each value that I'm stuck on. I can't use a list.index() operation because it only returns the first match, which breaks when words repeat. Outside of that I am kind of at a loss.


Actually, the collections.Counter class isn't always the best choice for counting. For nested counts like these, you can use a collections.defaultdict whose values are themselves defaultdict(int):

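from collections import defaultdict

def bigrams(text):
    # Normalize case and split on whitespace.
    words = text.strip().lower().split()
    # Outer key: a word; inner key: a word that follows it; value: count.
    counter = defaultdict(lambda: defaultdict(int))
    # Pair each word with its successor.
    for prev, current in zip(words[:-1], words[1:]):
        counter[prev][current] += 1
    return counter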

Note that if your text contains punctuation marks as well, the line words = text.strip().lower().split() should be replaced with words = re.findall(r'\w+', text.lower()), which also requires import re at the top of the file.
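As a rough sketch of that punctuation-aware variant, something like the following should work. The function name bigrams_with_punctuation and the sample sentence are made up for illustration, and converting the result to plain dicts is an extra step (not part of the answer above) so that the output prints in the same shape as the example in the question.

```python
import re
from collections import defaultdict


def bigrams_with_punctuation(text):
    # Hypothetical helper name. \w+ keeps runs of word characters,
    # so periods, commas, and exclamation marks are dropped.
    words = re.findall(r'\w+', text.lower())
    counter = defaultdict(lambda: defaultdict(int))
    # zip stops at the shorter argument, so words[:-1] is optional here.
    for prev, current in zip(words, words[1:]):
        counter[prev][current] += 1
    # Assumption: plain dicts are wanted for display; skip this step
    # if the defaultdict behavior is still needed by the caller.
    return {key: dict(followers) for key, followers in counter.items()}


print(bigrams_with_punctuation('This is a string. This is a test!'))
# {'this': {'is': 2}, 'is': {'a': 2}, 'a': {'string': 1, 'test': 1}, 'string': {'this': 1}}
```

Two things to be aware of with this approach: pairs are counted across sentence boundaries (hence 'string' followed by 'this' above), and \w+ splits contractions such as "don't" into "don" and "t". If either matters for your text, the tokenizing step would need to be adjusted.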