tmasters - 1 year ago 75

Python Question

I have two pandas dataframes in a panel and would like to create a third df that ranks the first df (by row) but only include those where the corresponding element of the second df is True. Some sample data to illustrate:

`p['x']`

A B C D E

2015-12-31 0.957941 -0.686432 1.087717 1.363008 -1.528369

2016-01-31 0.079616 0.524744 1.675234 0.665511 0.023160

2016-02-29 -0.300144 -0.705346 -0.141015 1.341883 0.855853

2016-03-31 0.435728 1.046326 -0.422501 0.536986 -0.656256

p['y']

A B C D E

2015-12-31 True False True False NaN

2016-01-31 True True True False NaN

2016-02-29 False True True True NaN

2016-03-31 NaN NaN NaN NaN NaN

I have managed to do this with a few ugly hacks but still get stuck on the fact that rank won't let me use method='first' on non-numeric data. I want to force incremental integer ranks (even if duplicates) and NaN for any cell that didn't have True in the boolean df.

Output should be of the form:

`A B C D E`

2015-12-31 2.0 NaN 1.0 NaN NaN

2016-01-31 3.0 2.0 1.0 NaN NaN

2016-02-29 NaN 3.0 2.0 1.0 NaN

2016-03-31 NaN NaN NaN NaN NaN

My hacked attempt is below. It works, although there should clearly be a better way to replace false with NaN. However it doesn't work once I add method='first' and this is necessary as I may have instances of duplicated values.

`# I first had to hack a replacement of False with NaN.`

# np.nan did not evaluate correctly

# I wasn't sure how else to specify pandas NaN

rank=p['Z'].replace(False,p['Z'].iloc[3,0])

# eliminate the elements without a corresponding True

rank=rank*p['X']

# then this works

p['rank'] = rank.rank(axis=1, ascending=False)

# but this doesn't

p['rank'] = rank.rank(axis=1, ascending=False, method='first')

Any help would be much appreciated!

thanks

- List item

Answer Source