Justin - 6 months ago 35

Python Question

I am working with a pandas DataFrame. I would like to assign a column indicator variable to 1 when a particular condition is met. I compute quantiles for particular groups. If the value is outside the quantile, I want to assign the column indicator variable to 1. For example, the following code prints the quantiles for each group:

`df[df['LENGTH'] > 1].groupby(['CLIMATE', 'TEMP'])['LENGTH'].quantile(.95)`

Now for all observations in my dataframe which are larger than the grouped value I would like to set

`df['INDICATOR'] = 1`

I tried using the following if statement:

`if df.groupby(['CLIMATE','BIN'])['LENGTH'] > df[df['LENGTH'] > 1].groupby(['CLIMATE','BIN'])['LENGTH'].quantile(.95):`

`    df['INDICATOR'] = 1`

This gives me the error: "ValueError: operands could not be broadcast together with shapes (269,) (269,2)". Any help would be appreciated!

Answer

You want to use `transform` after your `groupby` to get an array of the same size as the original column. `gt` is greater than. `mul` is multiply. I multiply by `1` to convert the boolean results from `gt` to `0` or `1`.
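To make the size difference concrete, here is a small sketch on a made-up five-row frame (the values and the choice of the median, `q=0.5`, are mine, picked only so the numbers are easy to check by hand). It contrasts the aggregated quantile, which has one value per group, with the transformed one, which has one value per row and can therefore be compared against the original column.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({"labels": list("aabbb"),
                   "A": [1.0, 3.0, 2.0, 4.0, 6.0]})

# Aggregation: one value per group (length 2), indexed by label.
per_group = df.groupby("labels")["A"].quantile(0.5)

# transform: the group value repeated on every row (length 5),
# aligned with the original index.
per_row = df.groupby("labels")["A"].transform(lambda s: s.quantile(0.5))

# Row-wise comparison, then bool -> 0/1.
indicator = df["A"].gt(per_row).mul(1)
print(indicator.tolist())  # [0, 1, 0, 0, 1]
```

Comparing `df["A"]` against `per_group` directly would not line up row by row, which is the same kind of shape mismatch the question ran into.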

You can see other examples here using transform to get group-level statistics while preserving the original dataframe

Consider the dataframe `df`

```
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(labels=np.random.choice(list('abcde'), 100),
                       A=np.random.randn(100)))
```

I'd get the indicator like this

```
df.A.gt(df.groupby('labels').A.transform(pd.Series.quantile, q=.95)).mul(1)
```

In your case, I'd do

```
df['INDICATOR'] = df['LENGTH'].gt(
    df.groupby(['CLIMATE', 'BIN'])['LENGTH'].transform(pd.Series.quantile, q=.95)
).mul(1)
```
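One thing the line above does not do is restrict the quantile to rows with `LENGTH > 1`, which the question's first snippet does. If that filter matters, a possible variation is to mask the small values to `NaN` before grouping, since `quantile` skips `NaN`. This is only a sketch: the data is randomly generated by me, and `masked` and `thresholds` are names I made up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (an assumption, not from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "CLIMATE": rng.choice(["dry", "wet"], 200),
    "BIN": rng.choice([1, 2, 3], 200),
    "LENGTH": rng.exponential(5.0, 200),
})

# Keep LENGTH only where it exceeds 1; other rows become NaN and are
# ignored by quantile, but they still receive their group's threshold.
masked = df["LENGTH"].where(df["LENGTH"] > 1)

# Per-row 95th percentile of the filtered values within each group.
thresholds = masked.groupby([df["CLIMATE"], df["BIN"]]).transform(
    lambda s: s.quantile(0.95)
)

# 1 where the row exceeds its group's threshold, else 0.
df["INDICATOR"] = df["LENGTH"].gt(thresholds).astype(int)
```

If a group has no rows above 1, its threshold is `NaN` and the comparison yields `0` for every row in that group.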