Fabio Lamanna - 7 months ago
Python Question

pandas - groupby and select variable amount of random values according to column

Starting from this simple dataframe df:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c':[1,1,2,2,2,2,3,3,3], 'n':[1,2,3,4,5,6,7,8,9], 'N':[1,1,2,2,2,2,2,2,2]})


I'm trying to select N random values from n for each c. So far I have managed to group by c and get a single element per group with:

sample = df.groupby('c').apply(lambda x :x.iloc[np.random.randint(0, len(x))])


that returns:

   N  c  n
c
1  1  1  2
2  2  2  4
3  2  3  8


My expected output would be something like:

   N  c  n
c
1  1  1  2
2  2  2  4
2  2  2  3
3  2  3  8
3  2  3  7


That is, 1 sample from c=1 and 2 samples each from c=2 and c=3, according to the N column.

Answer

Pandas objects now have a .sample method that returns a given number of randomly chosen rows. Applied per group, with the group's N value as the sample size:

>>> df.groupby('c').apply(lambda g: g.n.sample(g.N.iloc[0]))
c   
1  1    2
2  5    6
   2    3
3  6    7
   7    8
Name: n, dtype: int64
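
The one-liner above returns only the n column. If the full rows are wanted, as in the expected output of the question, something along these lines might work. This is only a sketch: the helper name sample_per_group and its seed parameter are made up for illustration, it assumes N is constant within each c group (as in the example data), and it caps the sample size at the group length so that .sample does not raise when N exceeds the number of rows available.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'n': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'N': [1, 1, 2, 2, 2, 2, 2, 2, 2]})

def sample_per_group(frame, seed=None):
    """Draw N rows without replacement from each group of c.

    Hypothetical helper: assumes N is the same on every row of a group.
    """
    # One shared generator, so successive groups get independent draws.
    rng = np.random.RandomState(seed)
    parts = []
    for _, g in frame.groupby('c'):
        # Sample size comes from the group's own N column,
        # capped at the number of rows actually present.
        k = min(int(g['N'].iloc[0]), len(g))
        parts.append(g.sample(n=k, random_state=rng))
    return pd.concat(parts)

sampled = sample_per_group(df, seed=0)
print(sampled)
```

The result keeps the original row labels, so sampled.index can be used to look the rows up in df again. Passing a fixed seed makes the draw repeatable; leaving it as None gives a different sample on each call.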