Justin - 3 months ago
Python Question

Assigning indicators based on observation quantile

I am working with a pandas DataFrame. I would like to set an indicator column to 1 when a particular condition is met. I compute quantiles for particular groups, and if a row's value is above its group's quantile, I want that row's indicator to be 1. For example, the following code computes the 0.95 quantile of LENGTH for each group, using only rows with LENGTH > 1:

df[df['LENGTH'] > 1].groupby(['CLIMATE', 'TEMP'])['LENGTH'].quantile(.95)

Now, for every observation in my DataFrame that is larger than its group's quantile, I would like to set

df['INDICATOR'] = 1

I tried using the following if statement:

if df.groupby(['CLIMATE','BIN'])['LENGTH'] > df[df['LENGTH'] > 1].groupby(['CLIMATE','BIN'])['LENGTH'].quantile(.95):
    df['INDICATOR'] = 1

This gives me the error: "ValueError: operands could not be broadcast together with shapes (269,) (269,2)". Any help would be appreciated!


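You want to use transform after your groupby to get an array the same length as the original DataFrame. On its own, groupby(...).quantile(.95) returns one value per group rather than one per row, so it cannot be compared element-wise with the LENGTH column. gt is "greater than" and mul is multiply; I multiply by 1 to turn the boolean results from gt into 0 or 1.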
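To make the shape difference concrete, here is a small sketch on a tiny made-up frame (the columns g and x are hypothetical, chosen only for illustration). The aggregated quantile has one entry per group, while the transformed quantile has one entry per row and can therefore be compared directly with the column.

```python
import pandas as pd

# Hypothetical toy data: two groups, eight rows in total
df = pd.DataFrame({
    "g": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Aggregating gives one value per group (index "a", "b"), so it cannot be
# lined up element-wise with the 8-row column
per_group = df.groupby("g")["x"].quantile(0.95)

# transform broadcasts each group's quantile back onto that group's rows,
# giving a Series with the same index and length as df
per_row = df.groupby("g")["x"].transform(pd.Series.quantile, q=0.95)

# Element-wise comparison now lines up; mul(1) turns True/False into 1/0
indicator = df["x"].gt(per_row).mul(1)

print(per_group.shape, per_row.shape)  # (2,) (8,)
print(indicator.tolist())              # [0, 0, 0, 1, 0, 0, 0, 1]
```

With linear interpolation the group thresholds are 3.85 and 38.5, so only the largest value in each group is flagged.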

You can see other examples here of using transform to get group-level statistics while preserving the original DataFrame's shape.

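Consider the DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(labels=np.random.choice(list('abcde'), 100),
                       A=np.random.randn(100)))  # A: any numeric column works; random normal values here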

I'd get the indicator like this:

df.A.gt(df.groupby('labels').A.transform(pd.Series.quantile, q=.95)).mul(1)
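The same 0/1 column can also be built in the form the question used, setting the indicator to 1 through a boolean mask with .loc and leaving 0 elsewhere. A sketch on a random toy frame like the one above; the seeded default_rng generator is only there so reruns give the same data.

```python
import numpy as np
import pandas as pd

# Random toy frame like the one above (seeded only so reruns match)
rng = np.random.default_rng(0)
df = pd.DataFrame({"labels": rng.choice(list("abcde"), 100),
                   "A": rng.standard_normal(100)})

# Per-row threshold: the 95th percentile of A within that row's label group
threshold = df.groupby("labels")["A"].transform(pd.Series.quantile, q=0.95)
mask = df["A"].gt(threshold)

# Start every row at 0, then set 1 only where A exceeds its group's quantile
df["INDICATOR"] = 0
df.loc[mask, "INDICATOR"] = 1
```

mask.astype(int) produces the same column, so .mul(1) and .astype(int) are interchangeable here.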

In your case, I'd do:

df['INDICATOR'] = df['LENGTH'].gt(
    df.groupby(['CLIMATE', 'BIN'])['LENGTH'].transform(pd.Series.quantile, q=.95)
).mul(1)
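The question computed its quantiles only over rows with LENGTH > 1, while the one-liner above uses every row in each group. If that filter matters, one option (a sketch, with made-up data under the question's column names) is to blank out the excluded rows with where before grouping. Series.quantile skips NaN, and the result keeps one value per row.

```python
import pandas as pd

# Hypothetical toy data using the question's column names
df = pd.DataFrame({
    "CLIMATE": ["hot"] * 5 + ["cold"] * 5 + ["mild"] * 3,
    "BIN": [1] * 5 + [2] * 5 + [3] * 3,
    "LENGTH": [0.5, 2.0, 3.0, 4.0, 5.0,
               0.2, 1.5, 2.5, 3.5, 9.0,
               0.1, 0.2, 0.9],
})

# Hide rows with LENGTH <= 1 so they don't contribute to the quantile
# (Series.quantile skips NaN), but keep every row so the shape is unchanged
masked = df["LENGTH"].where(df["LENGTH"] > 1)

# Per-row 95th percentile of the filtered values in the row's (CLIMATE, BIN)
# group; a group with no LENGTH > 1 rows gets NaN
thresholds = masked.groupby([df["CLIMATE"], df["BIN"]]).transform(
    pd.Series.quantile, q=0.95
)

# Comparisons against NaN are False, so rows in such groups get 0
df["INDICATOR"] = df["LENGTH"].gt(thresholds).mul(1)
print(df["INDICATOR"].tolist())  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
```

Without the where step, the mild group's 0.9 would be flagged, because that group's unfiltered 95th percentile is about 0.83. With the filter, rows with LENGTH <= 1 are never flagged. Which behaviour is wanted depends on what the LENGTH > 1 filter was meant to do.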