ryuzakinho - 3 years ago
Python Question

How to perform bincount on an array of strings?

I have a NumPy array containing string values.

For instance: ["bus", "bar", "bar", "café" .....]

What is the best way to count the number of occurrences of each element in my array? My current solution is:

# my_list contains my data.
bincount = []
for name in set(my_list.tolist()):
    count = sum([1 for elt in my_list if elt == name])
    bincount.append(count)


I have tried np.bincount, but it does not work with this type of data.

Do you know a better solution?

Answer Source

Option 1
np.unique

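# l is the sample list (the same one used under Option 2)
l = ['bus', 'bar', 'bar', 'café', 'bus', 'bar', 'café']

a, b = np.unique(l, return_inverse=True)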

a
array(['bar', 'bus', 'café'],
      dtype='<U4')

b
array([1, 0, 0, 2, 1, 0, 2])

np.bincount(b)
array([3, 2, 2])
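
As a possible shortcut, np.unique can also return the counts directly through its return_counts flag, which skips the separate np.bincount call. This is a minimal sketch and not part of the original answer; the variable names (names, labels, counts, count_by_label) are only illustrative.

```python
import numpy as np

# Sample data taken from the answer's example list.
names = np.array(['bus', 'bar', 'bar', 'café', 'bus', 'bar', 'café'])

# return_counts=True makes np.unique also return how often each unique
# value occurs, in the same (sorted) order as the unique values.
labels, counts = np.unique(names, return_counts=True)

# Pair each label with its count so the result is easier to read.
count_by_label = {str(label): int(count) for label, count in zip(labels, counts)}
print(count_by_label)
```

The counts come back in sorted label order, which matches the order of a in Option 1.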

If you have pandas, this should be simple(r):

Option 2
pd.get_dummies

l = ['bus', 'bar', 'bar', 'café', 'bus', 'bar', 'café']

pd.get_dummies(l)

   bar  bus  café
0    0    1     0
1    1    0     0
2    1    0     0
3    0    0     1
4    0    1     0
5    1    0     0
6    0    0     1

pd.get_dummies(l).sum()

bar     3
bus     2
café    2
dtype: int64

Call .values on the result to get those counts as a NumPy array.
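
Put together, the pandas route might look like the sketch below. The second half uses Series.value_counts, which the answer does not mention; it is shown only as an alternative you could try, and the variable names are hypothetical.

```python
import pandas as pd

l = ['bus', 'bar', 'bar', 'café', 'bus', 'bar', 'café']

# One indicator column per distinct string; summing each column
# counts how often that string occurs.
dummy_counts = pd.get_dummies(l).sum()

# .values drops the labels and keeps the counts, ordered by sorted label.
counts_array = dummy_counts.values

# Alternative (not from the answer): count directly on a Series,
# then sort by label so the order matches the get_dummies result.
value_counts = pd.Series(l).value_counts().sort_index()

print(counts_array)
print(value_counts)
```

The value_counts variant avoids building the intermediate indicator table, which may matter for long arrays with many distinct strings.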


Option 3
pd.factorize

np.bincount(pd.factorize(l)[0])
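array([2, 3, 2])

Note that pd.factorize numbers the labels in order of first appearance (bus, bar, café), so these counts are ordered differently from those in Options 1 and 2.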
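
pd.factorize returns the unique labels alongside the integer codes, so the counts can be matched back to their strings. The sketch below assumes wrapping the list in a Series is acceptable (recent pandas versions may warn when factorize receives a plain list); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

l = ['bus', 'bar', 'bar', 'café', 'bus', 'bar', 'café']

# codes: one integer per element; uniques: the labels those integers stand for,
# listed in order of first appearance.
codes, uniques = pd.factorize(pd.Series(l))

# bincount tallies how often each integer code occurs.
counts = np.bincount(codes)

# Pair each label with its count.
count_by_label = dict(zip(uniques, counts.tolist()))
print(count_by_label)
```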