I'm looking for a better way to write this. It works fine on my sample data set, but it is pretty slow on a larger one. Starting with a DataFrame:
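import pandas as pd
# The second column name is cut off in the post; 'Value' is assumed here.
df = pd.DataFrame(data=[['Customer0', 10], ['Customer0', 12], ['Customer1', 23]],
                  columns=['Customer', 'Value'])
grouped = df.groupby(['Customer']).mean()
grouped['count'] = df.groupby(['Customer']).count()
values = grouped.values.tolist()
indexes = grouped.index.tolist()
for x in range(len(values)):
    # The loop body is missing from the post; presumably it prepends the group key.
    values[x].insert(0, indexes[x])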
I want to end up with:
[['Customer0', 11, 2], ['Customer1', 23, 1]]
Can you try this one?
df.groupby('Customer').agg(['mean', 'count']).reset_index().values.tolist()
Out: [['Customer0', 11, 2], ['Customer1', 23, 1]]
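Calling agg with a list on the whole frame gives two-level column labels. If that gets in the way, one possible variant is to select the value column first. This is only a sketch: the column name "Value" is an assumption, since the question does not show it.

```python
import pandas as pd

# Sample data from the question; "Value" is an assumed column name.
df = pd.DataFrame(
    data=[["Customer0", 10], ["Customer0", 12], ["Customer1", 23]],
    columns=["Customer", "Value"],
)

# Selecting the column first keeps the result columns flat: "mean" and "count".
stats = df.groupby("Customer")["Value"].agg(["mean", "count"]).reset_index()

# One list per group: [group key, mean, count].
rows = stats.values.tolist()
print(rows)
```

The mean comes back as a float, so expect 11.0 rather than 11 in the printed rows.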
A small note: this improves your code significantly only when the number of groups (len(values)) is quite large, because it avoids the Python-level loop. With only a small number of groups, I'd guess the improvement is 2x at most.
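To see how this plays out on your own data, you could time both versions on a synthetic frame with many groups. The sketch below rests on several assumptions: the column name "Value", the frame sizes, and the loop body (reconstructed as prepending the group key) are all mine, not from the question. The numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes are arbitrary, chosen only to get many groups.
rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 10_000
big = pd.DataFrame({
    "Customer": rng.integers(0, n_groups, size=n_rows).astype(str),
    "Value": rng.integers(0, 100, size=n_rows),
})


def with_loop(frame):
    """Two groupby passes plus a Python loop that prepends each group key."""
    grouped = frame.groupby("Customer").mean()
    grouped["count"] = frame.groupby("Customer")["Value"].count()
    values = grouped.values.tolist()
    indexes = grouped.index.tolist()
    for x in range(len(values)):
        values[x].insert(0, indexes[x])
    return values


def with_agg(frame):
    """Single groupby pass computing mean and count together."""
    stats = frame.groupby("Customer")["Value"].agg(["mean", "count"])
    return stats.reset_index().values.tolist()


loop_rows = with_loop(big)
agg_rows = with_agg(big)

# Time a few repetitions of each version.
loop_time = timeit.timeit(lambda: with_loop(big), number=3)
agg_time = timeit.timeit(lambda: with_agg(big), number=3)
print(f"loop: {loop_time:.3f}s  agg: {agg_time:.3f}s")
```

Both functions should return one row per group in the same sorted order, so comparing the two lists is a cheap sanity check before you trust the timings.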