Elysire - 16 days ago 6
Python Question

# Python Pandas conditionnal transform

My datas look like this

``````Plate LogRatio
1      0.4
1      0.3
1      0.2
2      0.5
2      0.3
``````

I would like to do 3 types of mean of LogRatio by plate : I don't want to count extremes LogRatio in my mean (quantiles exclusions).

``````m1 = mean(LogRatio[LogRatio > q20LogRatio & LogRatio < q80LogRatio])
m2 = mean(LogRatio[LogRatio > q25LogRatio & LogRatio < q75LogRatio])
m3 = mean(LogRatio[LogRatio > q30LogRatio & LogRatio < q70LogRatio])
M = (m1 + m2 + m3) / 3
``````

I tried something like this :

``````df['m1'] = df.groupby('Plate')['LogRatio'].transform(lambda x: ((x > x.quantile(q=0.20)) & (x < x.quantile(q=0.80))).mean())
df['m2'] = df.groupby('Plate')['LogRatio'].transform(lambda x: ((x > x.quantile(q=0.25)) & (x < x.quantile(q=0.75))).mean())
df['m3'] = df.groupby('Plate')['LogRatio'].transform(lambda x: ((x > x.quantile(q=0.30)) & (x < x.quantile(q=0.70))).mean())
df['M'] = (df['m1'] + df['m2'] + df['m3']) / 3
``````

But the results for each mean are not the same if I do it by hand on Calc.
Is it my logical conditions in the transform function that can't be understand like that ? I know the results are wrong but there isn't "Error" in my terminal so I don't know what to do.

AFAIU I would change the lambda function as follows:

``````df.groupby('Plate')['LogRatio'].transform(lambda s: s.loc[[True if v < s.quantile(q=0.8) and v > s.quantile(q=0.2) else False for v in s]].mean())
``````

This the `s.loc[]` accepts an interable with booleans in order to subset the LogRatio-`Series`

In order to make it more readable, I'd go for the following solution:

``````def quartile_subset(logratios,lower,upper):
# some comment to describe what you are doing
return logratios.loc[[True if v < logratios.quantile(q=upper) and v > logratios.quantile(q=lower) else False for v in logratios]]

df.groupby('Plate')['LogRatio'].transform(lambda s: quartile_subset(s,0.2,0.8).mean())
``````
Source (Stackoverflow)