B_Furtado - 2 months ago
Python Question

Return name of column after comparing cumulative sum with randomly drawn number

I have a DataFrame in which the values in each row sum to 1 across the columns, like so:

Out[]:
cod_mun ws_1 1_3 4 5_7 8 9_10 11 12 13 14 15 nd
1100015 0.1379 0.273 0.2199 0.1816 0.0566 0.0447 0.0617 0.0015 0 0.0021 0.0074 0.0137
1100023 0.1132 0.2009 0.185 0.2161 0.1036 0.0521 0.0885 0.0044 0.0038 0.0061 0.0181 0.0082


I want to draw a random number

import random
prob = random.random()


Then I want to compare that number with the cumulative sum of the columns, from left to right, and return the heading of the first column where the cumulative sum exceeds it.

For example, if prob = 0.24, the cumulative sum first exceeds it in the second column (0.1379 + 0.273 = 0.4109 > 0.24), so I would need to return the name of that column, 1_3.

Is it possible to do that without using 15 elifs?

That is, without something like this:

if prob < df.iloc[0]['ws_1']:
    return 'ws_1'
elif prob < df.iloc[0]['ws_1'] + df.iloc[0]['1_3']:
    return '1_3'
elif ...

Answer

I think you can compute the cumulative sum with DataFrame.cumsum, compare it with prob, and get the first column with a True value using idxmax:

df.set_index('cod_mun', inplace=True)

prob = 0.24 

print (df.cumsum(axis=1))
           ws_1     1_3       4     5_7       8    9_10      11      12  \
cod_mun                                                                   
1100015  0.1379  0.4109  0.6308  0.8124  0.8690  0.9137  0.9754  0.9769   
1100023  0.1132  0.3141  0.4991  0.7152  0.8188  0.8709  0.9594  0.9638   

             13      14      15      nd  
cod_mun                                  
1100015  0.9769  0.9790  0.9864  1.0001  
1100023  0.9676  0.9737  0.9918  1.0000  

print (df.cumsum(axis=1) > prob)
          ws_1   1_3     4   5_7     8  9_10    11    12    13    14    15  \
cod_mun                                                                      
1100015  False  True  True  True  True  True  True  True  True  True  True   
1100023  False  True  True  True  True  True  True  True  True  True  True   

           nd  
cod_mun        
1100015  True  
1100023  True

print ((df.cumsum(axis=1) > prob).idxmax(axis=1))
cod_mun
1100015    1_3
1100023    1_3
dtype: object
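
For reference, the same approach can be wrapped into a small self-contained sketch. The helper name pick_column and the per-row draw with numpy's default_rng are assumptions for illustration, not part of the original answer. The sketch also forces the last column to True, because idxmax returns the first column when a whole row is False, which could happen if a row's rounded values sum to slightly less than prob.

```python
import numpy as np
import pandas as pd

# Data copied from the question, with cod_mun as the index.
columns = ['ws_1', '1_3', '4', '5_7', '8', '9_10',
           '11', '12', '13', '14', '15', 'nd']
df = pd.DataFrame(
    [[0.1379, 0.273, 0.2199, 0.1816, 0.0566, 0.0447,
      0.0617, 0.0015, 0, 0.0021, 0.0074, 0.0137],
     [0.1132, 0.2009, 0.185, 0.2161, 0.1036, 0.0521,
      0.0885, 0.0044, 0.0038, 0.0061, 0.0181, 0.0082]],
    index=pd.Index([1100015, 1100023], name='cod_mun'),
    columns=columns,
)


def pick_column(df, prob):
    """Hypothetical helper: for each row, return the first column whose
    cumulative sum exceeds prob.

    prob may be a scalar or a Series aligned with df.index.
    """
    cum = df.cumsum(axis=1)
    # Row-wise comparison; axis=0 aligns a Series prob with the row index.
    mask = cum.gt(prob, axis=0)
    # Assumption: fall back to the last column when no cumulative sum
    # exceeds prob (otherwise idxmax would return the first column).
    mask.iloc[:, -1] = True
    return mask.idxmax(axis=1)


# Same threshold for every row, as in the answer above.
same_prob = pick_column(df, 0.24)

# One independent draw per row (seed chosen only for reproducibility).
rng = np.random.default_rng(0)
probs = pd.Series(rng.random(len(df)), index=df.index)
picked = pick_column(df, probs)
```

With prob = 0.24 this gives 1_3 for both rows, matching the output above. Passing a Series of draws gives each row its own random number, which is probably what is wanted when sampling one category per cod_mun.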