dv2 - 1 year ago 128

Python Question

Simple problem, but I cannot seem to get it to work. I want to calculate how often each number occurs in each array of a list of arrays, as a percentage, and output these percentages.

I have a list of arrays which looks like this:

```
import numpy as np

# Create some data
listvalues = []
arr1 = np.array([0, 0, 2])
arr2 = np.array([1, 1, 2, 2])
arr3 = np.array([0, 2, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)

listvalues
>[array([0, 0, 2]), array([1, 1, 2, 2]), array([0, 2, 2])]
```

Now I count the occurrences using collections, which returns a list of collections.Counter objects:

```
import collections

counter = []
for i in range(len(listvalues)):
    counter.append(collections.Counter(listvalues[i]))

counter
>[Counter({0: 2, 2: 1}), Counter({1: 2, 2: 2}), Counter({0: 1, 2: 2})]
```

The result I am looking for is an array with 3 columns, one for each value from 0 to 2, and len(listvalues) rows. Each cell should hold the percentage of the corresponding array that the value makes up:

```
# Result
66.66   0    33.33
0       50   50
33.33   0    66.66
```

So 0 makes up 66.66% of array 1, 0% of array 2 and 33.33% of array 3, and so on.

What would be the best way to achieve this?

Many thanks!

Answer Source

Here's an approach -

```
# Get the length of each array in the input list
lens = np.array([len(item) for item in listvalues])
# Build a group ID array that maps each element of the flattened data to its source array
shifts_arr = np.zeros(lens.sum(), dtype=int)
shifts_arr[lens[:-1].cumsum()] = 1
ID_arr = shifts_arr.cumsum()
# Flatten all values, then count (group ID, value) pairs through a linear index
vals = np.concatenate(listvalues)
out_shp = [ID_arr.max()+1, vals.max()+1]
# minlength keeps the reshape valid when the last array lacks the largest value
counts = np.bincount(ID_arr*out_shp[1] + vals, minlength=out_shp[0]*out_shp[1])
# Finally, divide by the group sizes to get percentages
out = 100*np.true_divide(counts.reshape(out_shp), np.bincount(ID_arr)[:,None])
```
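
As a cross-check, the same table can be sketched with one `np.bincount` call per array. This is a plain loop, not the vectorized method above, so it will likely be slower on long lists. The helper name `percentages_per_array` is made up for illustration, and the sketch assumes every array is non-empty and holds only non-negative integers.

```python
import numpy as np

def percentages_per_array(arrays):
    """Hypothetical helper: row i holds the percentage of each value in arrays[i]."""
    # Number of columns: values run from 0 to the largest value seen in any array
    n_vals = max(int(arr.max()) for arr in arrays) + 1
    # minlength pads each count vector so every row has n_vals columns
    counts = np.array([np.bincount(arr, minlength=n_vals) for arr in arrays])
    # Divide each row by its array length to turn counts into percentages
    lens = np.array([len(arr) for arr in arrays])
    return 100.0 * counts / lens[:, None]

listvalues = [np.array([0, 0, 2]), np.array([1, 1, 2, 2]), np.array([0, 2, 2])]
result = percentages_per_array(listvalues)
print(np.round(result, 2))
```

For the sample data each row sums to 100, and the first row rounds to 66.67, 0 and 33.33 (the 66.66 in the question is the truncated value).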