Philipp Philipp - 6 months ago 15
Python Question

Count how often a specific string occurs in a list

I want to pairwise compare several lists in a kind of "bag of words" approach. I have only strings in my lists.

Unfortunately, I have a bug in my script that I cannot fix.

The code works if the lists contain numbers, but as soon as they contain strings it no longer runs. I would appreciate your help.

I receive the following error message:

Traceback (most recent call last):
  File "", line 21, in <module>
    bow_matrix[0, p] = list_words_ab[p]
ValueError: could not convert string to float: 'd'

My code:

import numpy

a = ["a", "b", "c", "d"]
b = ["b", "c", "d", "e"]

p = 0
if len(a) > len(b):
    max_words = len(a)
else:
    max_words = len(b)
# Union of all words that occur in either list
list_words_ab = list(set(a) | set(b))
len_bow_matrix = len(list_words_ab)
bow_matrix = numpy.zeros(shape=(3, len_bow_matrix))

# Row 0: the words themselves
while p < len_bow_matrix:
    bow_matrix[0, p] = list_words_ab[p]  # the ValueError is raised here
    p = p + 1
p = 0
# Rows 1 and 2: how often each word occurs in a and in b
while p < len_bow_matrix:
    bow_matrix[1, p] = a.count(bow_matrix[0, p])
    bow_matrix[2, p] = b.count(bow_matrix[0, p])
    p = p + 1


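By default numpy.zeros creates an array of floats (dtype float64) filled with zeros, so assigning a string to one of its elements fails. To store strings, you need to specify a string dtype, for example dtype=str:

bow_matrix = numpy.zeros(shape=(3, len_bow_matrix), dtype=str)

Note that with numpy.zeros, dtype=str gives one-character strings ('<U1'), so words longer than one character are truncated to their first character. If your real lists contain longer words, dtype=object avoids that.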
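
As an alternative to keeping words and counts in the same array, here is a minimal sketch that keeps the words in a plain list and only the counts in the rows. It uses collections.Counter from the standard library. The names vocabulary, counts_a, counts_b and bow_rows are my own and do not come from the question, and sorting the vocabulary is an assumption made only so the column order is reproducible.

```python
from collections import Counter

a = ["a", "b", "c", "d"]
b = ["b", "c", "d", "e"]

# Sorted union of both lists; a set has no guaranteed order, so sort it
# to get the same column order on every run.
vocabulary = sorted(set(a) | set(b))

# Count every word once per list instead of calling list.count() per word.
counts_a = Counter(a)
counts_b = Counter(b)

# One row per list, one column per word.
# A Counter returns 0 for a word it has never seen.
bow_rows = [
    [counts_a[word] for word in vocabulary],
    [counts_b[word] for word in vocabulary],
]

print(vocabulary)
for row in bow_rows:
    print(row)
```

If a numeric array is needed afterwards, numpy.array(bow_rows) should convert the nested list to an integer array, while the words stay in vocabulary.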