Stivi B - 1 month ago 7
Python Question

# Determine mean value of ‘data’ over the longest run of CONTINUOUS cond=True

I have a pandas DataFrame with a 'data' column and a 'cond' (condition) column. I need the mean of the 'data' values over the rows that form the longest run of CONTINUOUS True values in 'cond'.

``````    Example DataFrame:

    cond  data
0   True  0.20
1  False  0.30
2   True  0.90
3   True  1.20
4   True  2.30
5  False  0.75
6   True  0.80

Result = 1.4667, which is the mean of 'data' over row indexes 2 to 4 (3 consecutive True values)
``````

I was not able to find a "vectorized" solution with a groupby or pivot method, so I wrote a function that loops over the rows. Unfortunately, this takes about an hour for 1 million rows, which is way too long, and the @jit decorator does not reduce the run time measurably.

The data I want to analyze come from a monitoring project that ran for one year and produced a DataFrame with one million rows every 3 hours, so there are about 3000 such files.

An efficient solution is therefore very important. I would also be very grateful for a solution in NumPy.

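Using the approach from "Calculating the number of specific consecutive equal values in a vectorized way in pandas" (a cumulative sum over the points where 'cond' changes gives every run its own group label, and 'data' is then aggregated per run):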

``````df['data'].groupby((df['cond'] != df['cond'].shift()).cumsum()).agg(['count', 'mean'])[lambda x: x['count']==x['count'].max()]
Out:
      count      mean
cond
3         3  1.466667
``````
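
To make the grouping key easier to follow, here is a minimal sketch that rebuilds the example DataFrame from the question and computes the run labels as a separate step. One caveat: the key labels runs of False as well as runs of True, so on data where the longest run happens to be False the one-liner would report that run. The sketch therefore drops the False rows before aggregating. The names `run_id`, `stats` and `longest_mean` are only illustrative, and the sketch assumes 'cond' contains at least one True.

```python
import pandas as pd

df = pd.DataFrame({
    "cond": [True, False, True, True, True, False, True],
    "data": [0.20, 0.30, 0.90, 1.20, 2.30, 0.75, 0.80],
})

# A new run starts wherever 'cond' differs from the previous row;
# the cumulative sum of those change points gives each run an integer label.
run_id = (df["cond"] != df["cond"].shift()).cumsum()

# Keep only the rows where cond is True, so a long run of False cannot win.
mask = df["cond"]
stats = df.loc[mask, "data"].groupby(run_id[mask]).agg(["count", "mean"])

# idxmax returns the first label with the maximal count (ties: earliest run).
longest_mean = stats.loc[stats["count"].idxmax(), "mean"]
print(run_id.tolist())
print(longest_mean)
```

For the example data the run labels are 1, 2, 3, 3, 3, 4, 5, and the selected run is the one labelled 3.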

Indexing by a callable (the `[lambda x: ...]` step) requires pandas 0.18.0 or later. For earlier versions, you can do:

``````res = df['data'].groupby((df['cond'] != df['cond'].shift()).cumsum()).agg(['count', 'mean'])

res[res['count'] == res['count'].max()]
Out:
      count      mean
cond
3         3  1.466667
``````
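
Since the question also asks for a NumPy solution, the following is a rough sketch of one possible approach rather than a benchmarked implementation. It finds the start and end of every run of True from the differences of a padded copy of 'cond', then averages 'data' over the longest run. The function name `mean_of_longest_true_run` is hypothetical, and returning NaN when there is no True at all is an assumption about the desired behaviour.

```python
import numpy as np

def mean_of_longest_true_run(cond, data):
    """Mean of `data` over the longest run of consecutive True in `cond`."""
    cond = np.asarray(cond, dtype=bool)
    data = np.asarray(data, dtype=float)
    # Pad with False on both sides so every True run has a start and an end edge.
    padded = np.concatenate(([False], cond, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)    # index of the first True in each run
    ends = np.flatnonzero(edges == -1)     # index one past the last True in each run
    if starts.size == 0:
        return np.nan                      # assumption: no True rows gives NaN
    longest = np.argmax(ends - starts)     # on ties, the earliest run is chosen
    return data[starts[longest]:ends[longest]].mean()

cond = [True, False, True, True, True, False, True]
data = [0.20, 0.30, 0.90, 1.20, 2.30, 0.75, 0.80]
result = mean_of_longest_true_run(cond, data)
print(result)
```

With a DataFrame, the call would be `mean_of_longest_true_run(df['cond'].values, df['data'].values)`. The function uses only array operations, so there is no Python-level loop over the rows.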