NinaJ, 4 years ago
Python Question

Word count in all files using a for loop

I want to get the word frequency for each file in a folder.
However, my code does not work.

The error was as follows:



C:\Python\Anaconda3\python.exe C:/Python/Anaconda3/frequency.py
Traceback (most recent call last):
  File "C:/Python/Anaconda3/frequency.py", line 6, in <module>
    for word in file.read().split():
NameError: name 'file' is not defined

Process finished with exit code 1



How can I make this work?
Thank you.

import glob
import os
path = 'C:\Python\Anaconda3'
for filename in glob.glob(os.path.join(path, '*.txt')):
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    print(word, wordcount)

Answer Source

As the code stands, you have three obvious errors (although there may be more).

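  1. Your outer loop variable is named filename, but the inner loop reads from file, a name that is never defined. That is what raises the NameError. Note also that filename is only a path string, so you have to open() it before you can read from it.

    for filename in glob.glob(os.path.join(path, '*.txt')):   # loop variable is "filename"
        ...
        for word in file.read().split():                       # but "file" is used here
            ...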
    
  2. The wordcount dictionary is re-initialized (and thus emptied) on every iteration of the outer for loop. You can fix this in two ways, depending on what you are after:

    a. Move the line wordcount = {} to before the for loops so the dictionary is not cleared for each file. This gives you one total word count across all files.

    b. Store each file's wordcount in a second dictionary, filecounts, at the end of each iteration. You then have a dictionary whose keys are filenames and whose values are the per-file word counts. This can be a bit confusing because it is a dictionary of dictionaries: an individual count is looked up as filecounts[filename][word].

  3. print(word, wordcount) prints a single word followed by the whole dictionary, not one line per word. Consider the following instead:

    for word in wordcount:
        print('{word}:\t{count}'.format(word=word, count=wordcount[word]))
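
Option (a) from point 2 can be sketched roughly as follows. This is only an illustration: the helper name count_words_in_folder and its pattern parameter are my own assumptions, not something from the question, and it assumes the files are plain text readable as UTF-8.

```python
import glob
import os


def count_words_in_folder(folder, pattern='*.txt'):
    """Return one dict of word -> count covering every matching file.

    Hypothetical helper for illustration; name and parameters are assumptions.
    """
    wordcount = {}  # created once, before the loop, so it is never reset
    for filename in glob.glob(os.path.join(folder, pattern)):
        # filename is only a path string, so open it to get a readable file object
        with open(filename, encoding='utf-8') as f:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
    return wordcount
```

Calling count_words_in_folder(path) would then return the combined counts for every .txt file under path.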
    

I would also suggest using a default dictionary (collections.defaultdict, see the docs). This eliminates the need to check whether a word is already in the dictionary and set it to 1.
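
To make the difference concrete, here is a small side-by-side sketch. The sample sentence is made up for illustration; the point is only that defaultdict(int) supplies 0 for a missing key, so the membership check disappears.

```python
from collections import defaultdict

words = 'the cat and the hat'.split()  # made-up sample input

# Plain dict: every new word needs an explicit membership check.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 1
    else:
        plain[word] += 1

# defaultdict(int): a missing key is created with int() == 0 on first access.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts) == plain)  # both approaches produce the same counts
```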

So, in total, I would write it:

from collections import defaultdict
import glob
import os

path = r'C:\Python\Anaconda3'  # raw string so the backslashes are taken literally
filecounts = {}

for filename in glob.glob(os.path.join(path, '*.txt')):
    wordcount = defaultdict(int)
    # filename is only a path string, so open the file before reading it
    with open(filename) as f:
        for word in f.read().split():
            wordcount[word] += 1

    filecounts[filename] = wordcount

for filename in filecounts:
    print("Word count for file '{file}'".format(file=filename))
    for word in filecounts[filename]:
        print('\t{word}:\t{count}'.format(word=word, count=filecounts[filename][word]))
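
As an aside, the standard library also provides collections.Counter, which does the counting in one call. The sketch below is an alternative to the defaultdict version, not what the answer above uses; the helper name count_file_words is hypothetical, and UTF-8 text files are assumed.

```python
from collections import Counter


def count_file_words(filename):
    """Hypothetical helper: count the whitespace-separated words in one file."""
    with open(filename, encoding='utf-8') as f:
        return Counter(f.read().split())
```

Inside the per-file loop you could then write filecounts[filename] = count_file_words(filename), and use its most_common(10) method to list the ten most frequent words in a file.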