HimanUCC - 3 months ago
Python Question

Find the index of certain values in a data frame and put it as a separate column

In the following data frame DF, users have different numbers of values in the Movie and Exist columns. For example, user 2 has 10 values and user 5 has 9. For each user, I want the position of the first True value in the Exist column (counted within that user's rows, starting from 1), divided by the number of rows for that user, stored in a separate data frame along with the user ID. Imagine this is the data frame:

    User  Movie  Exist
0      2    172  False
1      2   2717  False
2      2    150  False
3      2   2700  False
4      2   2699   True
5      2   2616  False
6      2    112  False
7      2   2571   True
8      2   2657   True
9      2   2561  False
10     5   3471  False
11     5    187  False
12     5   2985  False
13     5   3388  False
14     5   3418  False
15     5     32  False
16     5   1673  False
17     5   3740   True
18     5   1693  False
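
For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is a minimal sketch that assumes the default integer index and the name `df`; the values are copied from the table.

```python
import pandas as pd

# Rebuild the example frame: user 2 has 10 rows, user 5 has 9 rows
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9,
    "Movie": [172, 2717, 150, 2700, 2699, 2616, 112, 2571, 2657, 2561,
              3471, 187, 2985, 3388, 3418, 32, 1673, 3740, 1693],
    "Exist": [False, False, False, False, True, False, False, True, True, False,
              False, False, False, False, False, False, False, True, False],
})

# Number of rows per user, i.e. the length of each user's vector
print(df.groupby("User").size())
```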


So the target data frame should look like this:

5/10 = 0.5
8/9 ≈ 0.89


User  Location
2     0.5
5     0.89


This is because the first True value for user 2 is at relative position 5 (the 5th value in user 2's vector), and the first True value for user 5 is at relative position 8 (the 8th value in user 5's vector). Note that I don't want the positions in the original index, which are 4 and 17.

Answer

Option 1

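def first_ratio(x):
    # Renumber the group's rows 0..n-1 so idxmax gives a position within the group
    x = x.reset_index(drop=True)
    # idxmax returns the label of the first True; any() zeroes the result if there is no True
    i = x.any() * (x.idxmax() + 1.)
    l = len(x)
    return i / l

# Assumes df is the data frame from the question
df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()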

      Location
User
2     0.500000
5     0.888889
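
To see why the index reset and the `any()` factor matter, here is a small sketch on a single group. It uses user 5's Exist values from the question with their original labels 10..18; the all-False series `none` is a hypothetical case added for illustration.

```python
import pandas as pd

# Exist values for user 5, keeping the original row labels 10..18
x = pd.Series([False] * 7 + [True, False], index=range(10, 19))

label = x.idxmax()                         # label of the first True in the original index
pos = x.reset_index(drop=True).idxmax()    # 0-based position of the first True inside the group
ratio = (pos + 1.0) / len(x)               # 1-based position divided by the group length

# Hypothetical group with no True at all: idxmax still returns the first label,
# so multiplying by any() is what turns the result into 0
none = pd.Series([False, False, False])
none_ratio = none.any() * (none.idxmax() + 1.0) / len(none)

print(label, pos, ratio, none_ratio)
```

Without the reset, `idxmax` would report the original label rather than the position within the user's rows.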

Option 2

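def first_ratio(x):
    # Work on the underlying NumPy array, so no index reset is needed
    v = x.values
    # argmax returns the 0-based position of the first True; any() zeroes the result if there is no True
    i = v.any() * (v.argmax() + 1.)
    l = v.shape[0]
    return i / l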

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()
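
As a possible alternative that avoids `apply` (not part of the original answer, only a sketch), the same ratio can be computed with `cumcount` and `transform`. User 7, which has no True value, is a hypothetical addition to show that such users end up with 0.

```python
import pandas as pd

# Same User/Exist values as the question, plus a hypothetical user 7 with no True
exist_2 = [False] * 4 + [True, False, False, True, True, False]
exist_5 = [False] * 7 + [True, False]
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9 + [7] * 3,
    "Exist": exist_2 + exist_5 + [False] * 3,
})

grouped = df.groupby("User")
pos = grouped.cumcount() + 1                  # 1-based position within each user's rows
size = grouped["Exist"].transform("size")     # length of each user's vector, repeated per row
ratio = (pos / size).where(df["Exist"])       # NaN wherever Exist is False

# first() skips NaN, so it picks the first True row per user;
# a user with no True stays NaN and is filled with 0
location = ratio.groupby(df["User"]).first().fillna(0.0).rename("Location").to_frame()
print(location)
```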

Timing

[Figure: timing results; image not preserved]
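
A comparison of the two options can be reproduced with something like the sketch below. The benchmark frame `big` (200 users with 50 rows each, roughly 10% True) and the function names are assumptions made for illustration, so the numbers will depend on your data and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

def first_ratio_pandas(x):
    # Option 1: reset the index, then use idxmax
    x = x.reset_index(drop=True)
    return x.any() * (x.idxmax() + 1.0) / len(x)

def first_ratio_numpy(x):
    # Option 2: use argmax on the underlying NumPy array
    v = x.values
    return v.any() * (v.argmax() + 1.0) / v.shape[0]

# Hypothetical benchmark data, seeded so runs are repeatable
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "User": np.repeat(np.arange(200), 50),
    "Exist": rng.random(200 * 50) < 0.1,
})

exist_by_user = big.groupby("User")["Exist"]
res_pandas = exist_by_user.apply(first_ratio_pandas)
res_numpy = exist_by_user.apply(first_ratio_numpy)

# Time each option over a few runs and report the mean per run
for name, func in [("Option 1", first_ratio_pandas), ("Option 2", first_ratio_numpy)]:
    seconds = timeit.timeit(lambda: exist_by_user.apply(func), number=3)
    print(f"{name}: {seconds / 3:.4f} s per run")
```

Both functions should return the same values, so the comparison is only about speed.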