HimanUCC - 2 months ago 6x
Python Question

# Find the index of certain values in a data frame and put it as a separate column

In the following data frame `DF`, users have different numbers of values in the Movie and Exist columns. For example, user 2 has 10 values and user 5 has 9. For each user, I want the position of the first True value in the Exist column, counted relative to that user's rows, divided by the number of rows for that user, and put into a separate data frame along with the user ID. Imagine this is the data frame:

``````
    User    Movie       Exist
0   2       172         False
1   2       2717        False
2   2       150         False
3   2       2700        False
4   2       2699        True
5   2       2616        False
6   2       112         False
7   2       2571        True
8   2       2657        True
9   2       2561        False
10  5       3471        False
11  5       187         False
12  5       2985        False
13  5       3388        False
14  5       3418        False
15  5       32          False
16  5       1673        False
17  5       3740        True
18  5       1693        False
``````
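To experiment with the answers, the sample above can be rebuilt as a DataFrame. This is a minimal sketch, assuming pandas is installed; the variable name `df` is chosen to match what the answers use, and the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above; `df` is the name the answers assume.
df = pd.DataFrame({
    "User": [2] * 10 + [5] * 9,
    "Movie": [172, 2717, 150, 2700, 2699, 2616, 112, 2571, 2657, 2561,
              3471, 187, 2985, 3388, 3418, 32, 1673, 3740, 1693],
    "Exist": [False, False, False, False, True, False, False, True, True, False,
              False, False, False, False, False, False, False, True, False],
})

# Number of rows per user (the "user vector length" in the question).
print(df.groupby("User").size())
```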

So the target data frame should look like this:

``````
5/10 = 0.5
8/9 = 0.88

User  Location
2      0.5
5      0.88
``````

This is because the first True value for user 2 is at relative position 5 (the 5th value in user 2's vector), and the first True value for user 5 is at relative position 8 (the 8th value in user 5's vector). Note that I don't want the real indices, which are 4 and 17.
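The difference between the real index and the relative position can be seen side by side with `groupby(...).cumcount()`. The sketch below uses a hypothetical miniature frame named `toy` (not the data from the question), and the `pos` column is an illustrative name.

```python
import pandas as pd

# Hypothetical miniature frame (not the question's data): user "a" has 3 rows, user "b" has 2.
toy = pd.DataFrame({
    "User": ["a", "a", "a", "b", "b"],
    "Exist": [False, True, False, False, True],
})

# cumcount() numbers rows within each group from 0, so +1 gives the 1-based relative position.
toy["pos"] = toy.groupby("User").cumcount() + 1

# Keep the first True row per user; its DataFrame index is the "real" index.
first_true = toy[toy["Exist"]].groupby("User").head(1)
print(first_true)
```

Here user "b" has its first True at real index 4 but at relative position 2, which is the number the question asks for before dividing by the group length.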

Option 1

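``````
def first_ratio(x):
    # Renumber the group's rows 0..n-1 so idxmax gives a within-user position
    x = x.reset_index(drop=True)
    # 1-based position of the first True; any() makes it 0 when there is none
    i = x.any() * (x.idxmax() + 1.)
    l = len(x)
    return i / l

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()

# The groupby/apply step, before .rename('Location').to_frame(), yields:
User
2    0.500000
5    0.888889
Name: Exist, dtype: float64
``````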

Option 2

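``````
def first_ratio(x):
    # Work on the underlying NumPy array, so no index reset is needed
    v = x.values
    # argmax returns the 0-based position of the first True
    i = v.any() * (v.argmax() + 1.)
    l = v.shape[0]
    return i / l

df.groupby('User').Exist.apply(first_ratio).rename('Location').to_frame()
``````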
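One detail in both options is the multiplication by `any()`: for a user with no True value, `argmax` would return 0 and the position would be reported as 1, so the `any()` factor forces the result to 0 instead. A small sketch of that edge case, using a hypothetical frame named `edge` and the Option 2 logic:

```python
import pandas as pd

def first_ratio(x):
    # Same logic as Option 2 above.
    v = x.values
    i = v.any() * (v.argmax() + 1.)
    return i / v.shape[0]

# Hypothetical data: user 7 has a True at position 2 of 4, user 9 has no True at all.
edge = pd.DataFrame({
    "User": [7, 7, 7, 7, 9, 9],
    "Exist": [False, True, False, False, False, False],
})

result = edge.groupby("User").Exist.apply(first_ratio)
print(result)
```

Whether 0 is the right value for "no True found" depends on the use case; returning NaN instead may be preferable if 0 could be confused with a real location.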