RockJake28 - 2 months ago 16
Python Question

Pandas random sample with removal

I'm aware of DataFrame.sample(), but how can I do this and also remove the sample from the dataset? (Note: AFAIK this has nothing to do with sampling with replacement.)

For example, here is the essence of what I want to achieve (this does not actually work):

len(df) # 1000

df_subset = df.sample(300)
len(df_subset) # 300

df = df.remove(df_subset)
len(df) # 700

Answer

If your index is unique, drop the sampled rows by their index labels:

df = df.drop(df_subset.index)
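
If the index is not unique, drop removes every row that shares a sampled label, so more rows than intended can disappear. Here is a minimal sketch of one workaround, assuming it is acceptable to discard the original index: reset to a fresh unique index first, then sample and drop as above. The frame contents, the name df_rest and the random_state value are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Illustrative frame with a deliberately duplicated index (assumption).
df = pd.DataFrame({"value": range(6)}, index=[0, 0, 1, 1, 2, 2])

# Replace the duplicated labels with a unique 0..n-1 index.
df = df.reset_index(drop=True)

# Sample two rows; random_state is only here to make the run repeatable.
df_subset = df.sample(2, random_state=0)

# Drop exactly the sampled rows by their (now unique) labels.
df_rest = df.drop(df_subset.index)

print(len(df_subset), len(df_rest))
```

If the original labels matter, reset_index() without drop=True keeps them as an ordinary column instead of discarding them.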

example

df = pd.DataFrame(np.arange(10).reshape(-1, 2))

sample

df_subset = df.sample(2)
df_subset

drop

df.drop(df_subset.index)
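
Putting the steps together with the sizes from the question, this is a rough end-to-end sketch. It assumes a default unique RangeIndex; the column name x and the random_state value are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 1000-row frame with the default unique RangeIndex (assumption).
df = pd.DataFrame({"x": np.arange(1000)})

# Draw 300 rows without replacement (the default for sample).
df_subset = df.sample(300, random_state=42)

# drop returns a new frame, so assign it back to keep the result.
df = df.drop(df_subset.index)

print(len(df_subset))  # 300
print(len(df))         # 700
```

Note that drop does not modify df in place, so a bare df.drop(df_subset.index) only displays the reduced frame and leaves df unchanged.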
