AyZhng - 3 months ago

Python - Display one line for each unique word

I'm trying to write Python code that counts the frequency of each word in a text file. It should display one line per unique word, but the code I wrote prints duplicate words.

import string

text = open('mary.txt', 'r')
textr = text.read()

for punc in string.punctuation:
    textr = textr.replace(punc, "")

wordlist = textr.split()

for word in wordlist:
    count = wordlist.count(word)
    print(word, ':', count)


My current output is...

are : 1
around : 1
as : 1
at : 2
at : 2
away : 1
back : 1
be : 2
be : 2
because : 1
below : 1
between : 1
both : 1
but : 1
by : 2
by : 2


The output should display "at : 2", "be : 2", and "by : 2" only once each. What should I change in my code to make that happen?

Answer

The issue with your code is that you loop over the list of all words, repeats included, so a word that appears twice in the file is printed twice. (Calling wordlist.count(word) inside that loop also rescans the whole list for every word.) What you want is a data structure that stores each unique word only once. A dict works well for this, but Python's standard library already has a specialized collection, collections.Counter, a dict subclass built for exactly this kind of counting.
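For comparison, here is a minimal sketch of the plain-dict approach in Python 3. It assumes the punctuation has already been stripped, as in the question, and count_words is a hypothetical helper name. Because each word is stored only once as a dict key, printing the items gives one line per unique word.

```python
def count_words(text):
    """Map each whitespace-separated word in text to its number of occurrences."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("be at by be at by as")
for word, count in counts.items():
    # Each key appears exactly once, so no word is printed twice
    print(word, ':', count)
```

Counter does the same bookkeeping for you, which is why the version below is shorter.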

Give this a try (untested):

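from collections import Counter
import string

# Read the whole file; the with block closes it afterwards
with open('mary.txt', 'r') as text:
    textr = text.read()

# Remove every punctuation character
for punc in string.punctuation:
    textr = textr.replace(punc, "")

# Counter stores each word once, mapped to its number of occurrences
counts = Counter(textr.split())

for word, count in counts.items():
    print(word, ':', count)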
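If you want the output ordered, a variant might look like the sketch below. It makes a few assumptions: the text is passed in as a string rather than read from mary.txt, punctuation is deleted with str.translate instead of the replace loop, and word_counts is a hypothetical helper name. Sorting the items prints the words alphabetically, while Counter.most_common() orders them by frequency instead.

```python
import string
from collections import Counter


def word_counts(text):
    """Return a Counter of the words in text, ignoring ASCII punctuation."""
    # Translation table that deletes every character in string.punctuation
    table = str.maketrans('', '', string.punctuation)
    return Counter(text.translate(table).split())


sample = "Mary had a little lamb, a little lamb."
counts = word_counts(sample)

# One line per unique word, in alphabetical order
for word, count in sorted(counts.items()):
    print(word, ':', count)

# The two most frequent words, with their counts
print(counts.most_common(2))
```

Note that the comparison is case-sensitive, so "Mary" and "mary" would be counted as different words.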