Demetri P - 10 months ago

Python Question

Suppose I have some observations, each with an indicated class from `1` to `n`.

How can I equally sample from the dataframe? Right now I do something like...

```
frames = []
classes = df.classes.unique()

for i in classes:
    # Draw sample_size rows from the rows belonging to class i
    g = df[df.classes == i].sample(sample_size)
    frames.append(g)

equally_sampled = pd.concat(frames)
```

Is there a pandas function to equally sample?

Answer Source

A more concise way is to group by class and sample within each group:

```
df.groupby('classes').apply(lambda x: x.sample(sample_size))
```
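On pandas 1.1 or later, the grouped object also has its own `sample` method, so the lambda is not strictly needed. A minimal sketch, assuming a toy dataframe with a `classes` column (the data, the `sample_size` value, and the `random_state` seed are illustrative and not from the question):

```python
import pandas as pd

# Hypothetical toy data: three classes of unequal size.
df = pd.DataFrame({
    "classes": [1] * 50 + [2] * 30 + [3] * 20,
    "value": range(100),
})
sample_size = 10  # rows to draw from each class (illustrative)

# DataFrameGroupBy.sample draws n rows from every group (pandas >= 1.1).
equally_sampled = df.groupby("classes").sample(n=sample_size, random_state=0)

# Count how many rows each class contributed.
print(equally_sampled["classes"].value_counts().sort_index())
```

Either form raises an error if a class has fewer than `sample_size` rows, unless you pass `replace=True`.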

You can also make `sample_size` a function of the group size, so that each class is sampled in proportion to its share of the rows:

```
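nrows = len(df)
total_sample_size = 1e4
# Sample each class in proportion to its share of the rows.
# len(x) is the group's row count; x.count() would return per-column counts.
df.groupby('classes').apply(
    lambda x: x.sample(int(len(x) / nrows * total_sample_size))
)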
```

This won't return exactly `total_sample_size` rows, because each group's sample size is truncated to an integer, but the sampling will be more proportional than the naive method.
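To see how the total can fall short, here is a small sketch with made-up class sizes. The dataframe, the target of 10 rows, and the name `per_class` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical toy data: class sizes 70, 20 and 13 (103 rows in total).
df = pd.DataFrame({"classes": [1] * 70 + [2] * 20 + [3] * 13})
nrows = len(df)
total_sample_size = 10  # illustrative target

# Per-class sample size, truncated to an integer the same way int() does.
per_class = (
    df["classes"].value_counts().sort_index() / nrows * total_sample_size
).astype(int)

# Draw the computed number of rows from each class.
sampled = pd.concat([
    group.sample(n=int(per_class.loc[label]), random_state=0)
    for label, group in df.groupby("classes")
])

print(per_class.to_dict())  # per-class sizes after truncation
print(len(sampled))         # total rows actually drawn
```

With these sizes the truncated counts are 6, 1 and 1, so 8 rows come back instead of 10. If the exact total matters, one option is to hand the leftover rows to the classes with the largest fractional parts.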