bananablue1 - 26 days ago 11

Python Question

I am trying to compute the entropy of my k-means result dataframe, and I am getting this error back: `TypeError: 'numpy.int32' object is not iterable`

I don't understand why.

```
from collections import Counter

def calcEntropy(x):
    p, lens = Counter(x), np.float(len(x))
    return -np.sum(count/lens*np.log2(count/lens) for count in p.values())

k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]
```

And then I get this error message:

```
<ipython-input-26-d375ecf00330> in <module>()
----> 1 k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]

<ipython-input-26-d375ecf00330> in <listcomp>(.0)
----> 1 k_means_sp['entropy']=[calcEntropy(x) for x in k_means_sp['cluster']]

<ipython-input-23-f5508ea8782c> in calcEntropy(x)
      1 from collections import Counter
      2 def calcEntropy(x):
----> 3     p, lens = Counter(x), np.float(len(x))
      4     return -np.sum(count/lens*np.log2(count/lens) for count in p.values())

/Users/mpiercy/anaconda/lib/python3.6/collections/__init__.py in __init__(*args, **kwds)
    535             raise TypeError('expected at most 1 arguments, got %d' % len(args))
    536         super(Counter, self).__init__()
--> 537         self.update(*args, **kwds)
    538 
    539     def __missing__(self, key):

/Users/mpiercy/anaconda/lib/python3.6/collections/__init__.py in update(*args, **kwds)
    622                     super(Counter, self).update(iterable) # fast path when counter is empty
    623             else:
--> 624                 _count_elements(self, iterable)
    625         if kwds:
    626             self.update(kwds)

TypeError: 'numpy.int32' object is not iterable
```

```
k_means_sp.head()

     credit     debit  cluster
0  9.207673  8.198884        1
1  4.248495  8.202181        0
2  8.149668  7.735145        2
3  5.138677  7.859741        0
4  8.058163  7.918614        2
```
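The cause can be seen in isolation with a small sketch (pandas and numpy assumed; the labels are copied from the `head()` output above). The list comprehension hands `calcEntropy` one scalar cluster label per row, and `Counter` cannot iterate over a single integer. Depending on the pandas version, the exception may name `int` rather than `numpy.int32`, since newer pandas yields Python scalars when iterating a Series.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Toy stand-in for k_means_sp['cluster']; labels copied from the head() output above
labels = pd.Series(np.array([1, 0, 2, 0, 2], dtype=np.int32), name="cluster")

# [calcEntropy(x) for x in labels] hands calcEntropy one scalar label per row
first_label = next(iter(labels))

try:
    Counter(first_label)  # a single integer is not iterable, so this raises
except TypeError as exc:
    print("TypeError:", exc)

# Counting the whole column instead iterates over its values
print(Counter(labels))
```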

Answer

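OK, this is a first attempt. The error comes from the list comprehension: iterating over `k_means_sp['cluster']` passes `calcEntropy` one integer cluster label at a time, and `Counter(x)` cannot iterate over a single integer. Your dataframe stores each row's cluster index in the `'cluster'` column, so what you need to do is select the rows belonging to each cluster and then pass that subset to your `calcEntropy` function, something like: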

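```
entropies = {}
for i in sorted(k_means_sp['cluster'].unique()):  # loop through cluster labels
    # keep only the rows assigned to cluster i
    cluster = k_means_sp.loc[k_means_sp['cluster'] == i, ['credit', 'debit']]
    # note: iterating a DataFrame yields its column labels, so that is what Counter counts
    entropies[i] = calcEntropy(cluster)
```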

The filtering line keeps only the rows whose `cluster` value equals `i`. Does this help?
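If what you actually want is one entropy value describing how the points are spread across clusters (an assumption, since the question doesn't say which entropy is meant), the fix is to call the function once on the whole column instead of once per label. The sketch below is a lightly modernized `calcEntropy`: it uses arrays instead of passing a generator to `np.sum` (which NumPy deprecates) and avoids `np.float`, which was removed in NumPy 1.24. The frame holds only the five rows from the question's `head()` output.

```python
from collections import Counter

import numpy as np
import pandas as pd


def calcEntropy(x):
    """Shannon entropy, in bits, of the empirical distribution of the values in x."""
    counts = np.array(list(Counter(x).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


# Stand-in for k_means_sp: only the five rows shown by head() in the question
k_means_sp = pd.DataFrame({
    "credit": [9.207673, 4.248495, 8.149668, 5.138677, 8.058163],
    "debit": [8.198884, 8.202181, 7.735145, 7.859741, 7.918614],
    "cluster": [1, 0, 2, 0, 2],
})

# Pass the whole column once, so Counter sees every label
cluster_entropy = calcEntropy(k_means_sp["cluster"])
print(cluster_entropy)

# Cluster sizes behind that number, for reference
print(k_means_sp["cluster"].value_counts())
```

Since the result is a single number, assigning it with `k_means_sp['entropy'] = ...` would just repeat the same value on every row; keeping it in its own variable is probably clearer.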