Eran Moshe - 1 year ago 59
Python Question

# python: count events by certain id

I have the following array[][]:

``````import numpy as np

data = np.array([
[20,  0,  1],
[22,  0,  1],
[31,  0,  0],
[49,  1,  0],
[96,  1,  0],
[57,  2,  1],
[45,  3,  0],
[12,  3,  0],
[14,  3,  1],
[33,  4,  1],
[34,  4,  1],
[15,  4,  1]
])
``````

Lets call the columns:
`a, b, c`
by the above order where
`b is id`
.
I want to count the number of 1's in the
`c`
column by the id's in column
`b`

this will result the following 2 columns array (1st column is unique(b) and 2nd column is counted 1's from
`c`
per that id):

``````data = np.array([
[4,  3],
[0,  2],
[2,  1],
[3,  1],
[1,  0]
])
``````

You can also see it is sorted by the counted 1's in the
`c`
column

My idea to solve this problem was to create a dictionery:
`{ id1:counted 1's id2:counted 1's ... }`
by the ids in the b column and iterate over the array and count the number of 1's per id and put it as value per key in the dictionary, then create an array out of the result and sort it by the 2nd column.

Is there any pythonic easy and better way to do so ?

another case scenario is where i want to add all the integers in the c column per id, so for:

``````data = np.array([
[20,  0,  2],
[22,  0,  1],
[31,  0,  0],
[49,  1,  0],
[96,  1,  0],
[57,  2,  1],
[45,  3,  0],
[12,  3,  5],
[14,  3,  1],
[33,  4,  1],
[34,  4,  3],
[15,  4,  4]
])
``````

I will get

``````data = np.array([
[4,  8],
[3,  6],
[0,  3],
[2,  1],
[1,  0]
])
``````

You can use `np.bincount` -

``````count = np.bincount(data[:,1],data[:,2]==1)
out = np.column_stack((np.unique(data[:,1]),count))
``````

If you need it in descending order of count, we need to add two more lines of code -

``````sidx = count.argsort()[::-1]
out = np.column_stack((sidx,count[sidx]))
``````

Alternatively, if you need it in descending order of count and also keep the order, use `argsort` with `'mergesort'`, like so -

``````sidx = (-count).argsort(kind='mergesort')
out = np.column_stack((sidx,count[sidx]))
``````

Sample run -

Input array :

``````In [36]: data
Out[36]:
array([[20,  0,  1],
[22,  0,  1],
[31,  0,  0],
[49,  1,  0],
[96,  1,  0],
[57,  2,  1],
[45,  3,  0],
[12,  3,  0],
[14,  3,  1],
[33,  4,  1],
[34,  4,  1],
[15,  4,  1]])
``````

Part 1 :

``````In [37]: count = np.bincount(data[:,1],data[:,2]==1)
...: out = np.column_stack((np.unique(data[:,1]),count))
...:

In [38]: out
Out[38]:
array([[ 0.,  2.],
[ 1.,  0.],
[ 2.,  1.],
[ 3.,  1.],
[ 4.,  3.]])
``````

Part 2 :

``````In [39]: sidx = count.argsort()[::-1]
...: out = np.column_stack((sidx,count[sidx]))
...:

In [40]: out
Out[40]:
array([[ 4.,  3.],
[ 0.,  2.],
[ 3.,  1.],
[ 2.,  1.],
[ 1.,  0.]])
``````

Part 3 :

``````In [48]: sidx = (-count).argsort(kind='mergesort')

In [49]: np.column_stack((sidx,count[sidx]))
Out[49]:
array([[ 4.,  3.],
[ 0.,  2.],
[ 2.,  1.],
[ 3.,  1.],
[ 1.,  0.]])
``````

To add all the integers in the c column per id, simply skip that check against `1` -

``````count = np.bincount(data[:,1],data[:,2])
``````