rey rey - 1 month ago 12
Python Question

Complex pandas isin function

I have a dataframe:

In [47]: df
Out[47]:
    uid   a   b
0   111   1   2
1   111   2   3
2   111   4   5
3   111   6   7
4   111   5   8
5   222   0   9
6   222  11  12
7   222  13  11
8   222   2   1
9   333  14  16
10  333   3   2
11  333  16  19
12  333   3   4
13  444  21  20
14  444   9   5
15  444  20  24
16  444   5   6
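To make the example reproducible, here is a minimal sketch that rebuilds the frame from the printout above (column names uid, a and b as shown there):

```python
import pandas as pd

# Rebuild the sample frame from the question; values copied row by row from the printout
df = pd.DataFrame({
    'uid': [111] * 5 + [222] * 4 + [333] * 4 + [444] * 4,
    'a':   [1, 2, 4, 6, 5, 0, 11, 13, 2, 14, 3, 16, 3, 21, 9, 20, 5],
    'b':   [2, 3, 5, 7, 8, 9, 12, 11, 1, 16, 2, 19, 4, 20, 5, 24, 6],
})
print(df)
```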


I want to check whether the values in b are present in a, and vice versa, but only among rows that have the same uid. I used isin:

df[(df.b.isin(df.a))|(df.a.isin(df.b))]


but isin compares each value against the entire other column, regardless of uid, so it does not give me the desired output.
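A quick sketch of why the global version fails, rebuilding the frame from the question: isin checks membership against the whole other column, so a match coming from a different uid is enough to keep a row. On this data the global filter keeps all 17 rows; row 3 (uid 111, a = 6), for example, survives only because b = 6 appears in row 16, which belongs to uid 444.

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [111] * 5 + [222] * 4 + [333] * 4 + [444] * 4,
    'a':   [1, 2, 4, 6, 5, 0, 11, 13, 2, 14, 3, 16, 3, 21, 9, 20, 5],
    'b':   [2, 3, 5, 7, 8, 9, 12, 11, 1, 16, 2, 19, 4, 20, 5, 24, 6],
})

# isin looks at the whole other column, ignoring uid entirely
global_mask = df.b.isin(df.a) | df.a.isin(df.b)
print(df[global_mask])

# Row 3 (uid 111) is kept because a=6 matches b=6 in row 16 (uid 444)
print(global_mask[3])
```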

Desired output:

Out[49]:
    uid   a   b
0   111   1   2
1   111   2   3
2   111   4   5
4   111   5   8
6   222  11  12
7   222  13  11
9   333  14  16
11  333  16  19
13  444  21  20
14  444   9   5
15  444  20  24
16  444   5   6

Answer

I think you need groupby with apply, doing the boolean indexing inside each group:

print (df.groupby('uid').apply(lambda x: x[(x.b.isin(x.a))|(x.a.isin(x.b))]))
        uid   a   b
uid                
111 0   111   1   2
    1   111   2   3
    2   111   4   5
    4   111   5   8
222 6   222  11  12
    7   222  13  11
333 9   333  14  16
    11  333  16  19
444 13  444  21  20
    14  444   9   5
    15  444  20  24
    16  444   5   6
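Inside apply, the lambda receives the rows of one uid at a time, so isin only sees that group's values. A small sketch running the same expression by hand on the uid 111 group (fetched with GroupBy.get_group) shows that row 3 is now dropped:

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [111] * 5 + [222] * 4 + [333] * 4 + [444] * 4,
    'a':   [1, 2, 4, 6, 5, 0, 11, 13, 2, 14, 3, 16, 3, 21, 9, 20, 5],
    'b':   [2, 3, 5, 7, 8, 9, 12, 11, 1, 16, 2, 19, 4, 20, 5, 24, 6],
})

# The sub-frame that groupby hands to the lambda for uid 111
x = df.groupby('uid').get_group(111)

# Same expression as in the lambda, now restricted to this group's values
mask = x.b.isin(x.a) | x.a.isin(x.b)
print(mask)
print(x[mask])
```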

print (df.groupby('uid')
         .apply(lambda x: x[(x.b.isin(x.a))|(x.a.isin(x.b))])
         .reset_index(drop=True))
    uid   a   b
0   111   1   2
1   111   2   3
2   111   4   5
3   111   5   8
4   222  11  12
5   222  13  11
6   333  14  16
7   333  16  19
8   444  21  20
9   444   9   5
10  444  20  24
11  444   5   6
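If you want to avoid groupby/apply and keep the original index labels exactly as in the desired output, one alternative (a sketch, not part of the answer above) is to test membership of (uid, value) pairs, so a value only counts as a match within its own uid:

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [111] * 5 + [222] * 4 + [333] * 4 + [444] * 4,
    'a':   [1, 2, 4, 6, 5, 0, 11, 13, 2, 14, 3, 16, 3, 21, 9, 20, 5],
    'b':   [2, 3, 5, 7, 8, 9, 12, 11, 1, 16, 2, 19, 4, 20, 5, 24, 6],
})

# All (uid, value) pairs occurring in column a and in column b
a_pairs = set(zip(df['uid'], df['a']))
b_pairs = set(zip(df['uid'], df['b']))

# Keep a row if its b occurs in a for the same uid, or its a occurs in b for the same uid
mask = [
    (uid, b) in a_pairs or (uid, a) in b_pairs
    for uid, a, b in zip(df['uid'], df['a'], df['b'])
]
result = df[mask]
print(result)
```

Because the original index is untouched, no reset_index is needed and the labels match the desired output.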