İzzet KILIÇ - 3 months ago
Python Question

How to find the rows that don't belong to a sample in Python?

I need to take a sample from a DataFrame, but I also need the rows that don't belong to that sample. For example:

import pandas as pd

data = [[1, 2, 3, 55], [1, 2, 34, 5], [13, 2, 3, 5], [1, 2, 32, 5], [1, 2, 22, 5]]
df = pd.DataFrame(data=data, index=[0, 0, 1, 1, 1], columns=['A', 'B', 'C', 'D'])


Output:

In[97]: df.sample(3)
Out[97]:
    A  B   C   D
1   1  2  32   5
0   1  2   3  55
1  13  2   3   5


How can I get the remaining 2 rows? Is there a simple way to do that?

Answer

The duplicate index values make this tricky, because rows cannot be told apart by label. So first call reset_index to give every row a unique label, then use boolean indexing with eq or isin:

df = df.reset_index()  # old duplicate labels move into an 'index' column; new labels are 0..4
sam = df.sample(3)
print(sam)
   index  A  B   C   D
0      0  1  2   3  55
1      0  1  2  34   5
3      1  1  2  32   5

print(df.eq(sam).all(axis=1))  # True for rows whose label and values match the sample
0     True
1     True
2    False
3     True
4    False
dtype: bool

print(df.isin(sam).all(axis=1))
0     True
1     True
2    False
3     True
4    False
dtype: bool

print(df[~df.isin(sam).all(axis=1)])  # invert the mask to keep the rows NOT in the sample
   index   A  B   C  D
2      1  13  2   3  5
4      1   1  2  22  5
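Because reset_index has made every label unique, the remaining rows can also be obtained more directly by dropping the sampled labels with DataFrame.drop. This is a swapped-in alternative to the isin comparison, not the answerer's code, and random_state=0 is an arbitrary seed added only so the run is repeatable:

```python
import pandas as pd

data = [[1, 2, 3, 55], [1, 2, 34, 5], [13, 2, 3, 5], [1, 2, 32, 5], [1, 2, 22, 5]]
df = pd.DataFrame(data=data, index=[0, 0, 1, 1, 1], columns=['A', 'B', 'C', 'D'])

# Move the duplicate labels into an 'index' column so each row gets a unique label 0..4.
df = df.reset_index()

# Draw the sample; random_state is an arbitrary seed for reproducibility.
sam = df.sample(3, random_state=0)

# Labels are unique now, so dropping the sampled labels leaves exactly the other rows.
rest = df.drop(sam.index)

print(sam)
print(rest)
```

On the original duplicate index, df.drop(sam.index) would instead remove every row sharing a label with a sampled row, which is why reset_index has to come first.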

Finally, restore the original (duplicate) index:

print(sam.set_index('index').rename_axis(None))
   A  B   C   D
0  1  2   3  55
0  1  2  34   5
1  1  2  32   5

print(df[~df.isin(sam).all(axis=1)].set_index('index').rename_axis(None))
    A  B   C  D
1  13  2   3  5
1   1  2  22  5
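If you would rather not reset and restore the index at all, one option (a different technique from the answer above, offered as a sketch) is to sample row positions instead of labels and build a boolean mask. Positions are always unique, so the duplicate labels never get in the way. The helper name split_sample and the use of numpy.random.default_rng are assumptions for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd


def split_sample(df, n, seed=None):
    """Hypothetical helper: return (sample, rest) by sampling row positions."""
    rng = np.random.default_rng(seed)
    # Pick n distinct row positions; positions are unique even when labels are not.
    chosen = rng.choice(len(df), size=n, replace=False)
    mask = np.zeros(len(df), dtype=bool)
    mask[chosen] = True
    # A boolean array selects rows by position, so the duplicate index is kept as-is.
    return df[mask], df[~mask]


data = [[1, 2, 3, 55], [1, 2, 34, 5], [13, 2, 3, 5], [1, 2, 32, 5], [1, 2, 22, 5]]
df = pd.DataFrame(data=data, index=[0, 0, 1, 1, 1], columns=['A', 'B', 'C', 'D'])

sam, rest = split_sample(df, 3, seed=0)
print(sam)
print(rest)
```

Unlike df.sample, this returns the sampled rows in their original order; call sam.sample(frac=1) afterwards if a random order matters.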