Mat Mat - 1 year ago 64
Python Question

Counting word occurrences in csv and determine row appearances

I have a csv file such as the following in one column. The symbols and numbers are only to show that the file does not just contain text. I have two objectives:

  1. count the number of occurrences of a word;

  2. determine how many rows a word appears in.

I like apples. Sally likes apples.
Jim has 4 berries. [email protected]#
John has 2 apples.

Ideally, the code should return something like:
{apples: 3}
{# of rows: 2}

I've written some code to try to count occurrences, but it isn't working properly (presumably because of the punctuation). Also, I do not know how to determine the number of rows a word appears in; this could be as simple as counting the unique words in each row, but I'm unsure how to proceed. Here is the code I have so far, written in Python 3.6.1:

import csv
my_reader = csv.reader(open('file.csv', encoding = 'utf-8'))
ctr = 0
for record in my_reader:
    if record[0] == 'apples':
        ctr += 1

The code merely returns 0 as the answer. Help?

Answer Source

You are checking whether the whole row equals 'apples' (record[0] == 'apples'); what you need is 'apples' in record[0]. To count the occurrences, you can use str.count(), for example:

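import csv

ctr = 0   # total occurrences of 'apples'
rows = 0  # number of rows that contain 'apples'
with open('file.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f)
    for record in my_reader:
        # skip blank lines; 'in' is a substring test, so nearby punctuation is fine
        if record and 'apples' in record[0]:
            rows += 1
            ctr += record[0].count('apples')

print('apples: {}, rows: {}'.format(ctr, rows))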

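This way, each row that contains 'apples' increments rows by one and adds the number of occurrences in that row to ctr. Note that both the in test and str.count() match substrings, so a word such as 'pineapples' would be counted too.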
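If you want counts for every word rather than only 'apples', one possible variation is to split each row into words and tally them with collections.Counter. This is only a sketch and not part of the answer above: the count_words helper and the WORD_RE pattern are made-up names, the pattern treats a word as a run of letters and apostrophes (so digits and symbols are dropped), and the sample rows are the ones from the question with the symbols in the second row left out.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical tokenizer: a "word" is a run of lowercase letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")

def count_words(lines):
    """Return (occurrences per word, number of rows each word appears in)."""
    totals = Counter()
    rows = Counter()
    for record in csv.reader(lines):
        if not record:  # csv.reader yields [] for blank lines
            continue
        words = WORD_RE.findall(record[0].lower())
        totals.update(words)     # every occurrence counts
        rows.update(set(words))  # each word counts once per row
    return totals, rows

# In-memory stand-in for file.csv (one column, no commas).
sample = io.StringIO(
    "I like apples. Sally likes apples.\n"
    "Jim has 4 berries.\n"
    "John has 2 apples.\n"
)
totals, rows = count_words(sample)
print({"apples": totals["apples"]})
print({"# of rows": rows["apples"]})
```

To run it on a real file, pass an open file object (opened with newline='') to count_words instead of the StringIO sample. Because whole words are compared, 'pineapples' would not be counted as 'apples' here.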
