bananablue1 - 26 days ago 11

Python TypeError: 'numpy.int32' object is not iterable

I am trying to compute the entropy of my k-means result DataFrame, but I get the error TypeError: 'numpy.int32' object is not iterable
I don't understand why.
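For reference, the calcEntropy function below computes the base-2 Shannon entropy of the empirical distribution of the values in x. With N = len(x) values and n_k occurrences of the k-th distinct value, it evaluates:

$$H(x) = -\sum_{k} \frac{n_k}{N} \log_2 \frac{n_k}{N}$$

The counts n_k come from Counter(x), so x has to be a collection of values rather than a single label.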

import numpy as np
from collections import Counter

def calcEntropy(x):
    p, lens = Counter(x), np.float(len(x))
    return -np.sum(count/lens*np.log2(count/lens) for count in p.values())

k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]


And then I get this error message:

<ipython-input-26-d375ecf00330> in <module>()
----> 1 k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]

<ipython-input-26-d375ecf00330> in <listcomp>(.0)
----> 1 k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]

<ipython-input-23-f5508ea8782c> in calcEntropy(x)
1 from collections import Counter
2 def calcEntropy(x):
----> 3 p, lens = Counter(x), np.float(len(x))
4 return -np.sum(count/lens*np.log2(count/lens) for count in p.values())

/Users/mpiercy/anaconda/lib/python3.6/collections/__init__.py in __init__(*args, **kwds)
535 raise TypeError('expected at most 1 arguments, got %d' % len(args))
536 super(Counter, self).__init__()
--> 537 self.update(*args, **kwds)
538
539 def __missing__(self, key):

/Users/mpiercy/anaconda/lib/python3.6/collections/__init__.py in update(*args, **kwds)
622 super(Counter, self).update(iterable) # fast path when counter is empty
623 else:
--> 624 _count_elements(self, iterable)
625 if kwds:
626 self.update(kwds)

TypeError: 'numpy.int32' object is not iterable
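The last two frames of the traceback show Counter.update handing its argument to _count_elements, which tries to iterate over it. A minimal reproduction, assuming only NumPy is installed, passes a single numpy.int32 label (the kind of value the list comprehension yields when looping over k_means_sp['cluster']) straight to Counter:

```python
from collections import Counter

import numpy as np

# One cluster label, as produced by looping over k_means_sp['cluster']
label = np.int32(1)

message = ""
try:
    Counter(label)  # Counter tries to iterate over its argument
except TypeError as exc:
    message = str(exc)

print(message)  # same message as in the traceback above
```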

Here is k_means_sp.head():

credit debit cluster
0 9.207673 8.198884 1
1 4.248495 8.202181 0
2 8.149668 7.735145 2
3 5.138677 7.859741 0
4 8.058163 7.918614 2

Answer

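OK, this is a first attempt. The error occurs because iterating over k_means_sp['cluster'] yields one cluster label at a time (a numpy.int32 scalar), and Counter(x) needs an iterable, not a single integer. Your DataFrame stores the cluster index of each row in the 'cluster' column, so you first need to select the rows that belong to each cluster and then pass that cluster to your calcEntropy function, something like: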

for i in sorted(k_means_sp['cluster'].unique()):  # loop through cluster labels
    cluster = k_means_sp.loc[k_means_sp['cluster'] == i, ['credit', 'debit']]
    entropy = calcEntropy(cluster)

The second line keeps only the rows whose cluster label equals i. Does this help?
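A self-contained sketch of the fix, assuming pandas and NumPy are available. It rebuilds the five rows shown by k_means_sp.head() as sample data, computes the entropy of the whole 'cluster' column with calc_entropy (a hypothetical restatement of calcEntropy from the question, not a library function), and then uses the same boolean-mask selection as the loop above to collect the rows of each cluster:

```python
from collections import Counter

import numpy as np
import pandas as pd


def calc_entropy(values):
    """Base-2 Shannon entropy of the empirical distribution of `values`.

    `values` must be an iterable of labels (e.g. a pandas Series), not a scalar.
    """
    counts = np.array(list(Counter(values).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


# The five rows printed by k_means_sp.head() in the question, used as sample data.
k_means_sp = pd.DataFrame({
    "credit": [9.207673, 4.248495, 8.149668, 5.138677, 8.058163],
    "debit": [8.198884, 8.202181, 7.735145, 7.859741, 7.918614],
    "cluster": np.array([1, 0, 2, 0, 2], dtype=np.int32),
})

# Pass the whole column (an iterable of labels), not one label at a time.
cluster_entropy = calc_entropy(k_means_sp["cluster"])

# Boolean-mask filtering: collect the rows that belong to each cluster.
rows_per_cluster = {}
for label in sorted(k_means_sp["cluster"].unique()):
    mask = k_means_sp["cluster"] == label
    rows_per_cluster[int(label)] = k_means_sp.loc[mask, ["credit", "debit"]]

print(cluster_entropy)
print({label: len(rows) for label, rows in rows_per_cluster.items()})
```

One caution when passing the filtered rows to calcEntropy: iterating a DataFrame yields its column labels, so Counter on a DataFrame counts 'credit' and 'debit' rather than anything in the rows. Pass a single column (a Series) whose values you actually want the distribution of.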