user7065877 user7065877 - 1 month ago 8
Python Question

In Python how can I generate a frequency word counter that aggreagtes to different levels?

I've seen other python word counters that read CSV files and give a word count for the entire column. I'd like to see the count of a word per row, except I'd like it on the "project" and "sub-project" level (other columns in my data). This way I could see if a sub-project had a higher word count than another for a specific word. I'd like to have the final columns be: Project, Sub-Project, Word, Word Count(per sub-project, not total). I'd appreciate any help!

Input:

Columns - Project/Sub-project/Corpus

Project1/Sub 1/The red car is the best car

Project1/Sub 2/The blue is better

Export doc should read:

Columns - Project/Sub-Project/Word/Frequency

Project1/Sub1/The/2

Project1/Sub2/The/1

Answer

This program might do what you want:

import csv
from collections import Counter

with open('in.csv') as in_file:
    in_file = csv.DictReader(in_file)

    with open('out.csv', 'w') as out_file:
        out_file = csv.DictWriter(
            out_file,
            ['Project', 'Sub-Project', 'Word', 'Frequency'])
        out_file.writeheader()

        for line in in_file:
            words = Counter(map(str.lower, line['Corpus'].split()))

            for word, freq in words.most_common():
                out_file.writerow({
                    'Project': line['Project'],
                    'Sub-Project': line['Sub-project'],
                    'Word': word,
                    'Frequency': freq})