piRSquared - 2 months ago
Python Question

Must produce aggregated value. I swear that I am

Consider the pd.Series s:


import numpy as np
import pandas as pd

a = np.arange(4)
mux = pd.MultiIndex.from_product([list('ab'), list('xy')])
s = pd.Series([a] * 4, mux)
print(s)

a  x    [0, 1, 2, 3]
   y    [0, 1, 2, 3]
b  x    [0, 1, 2, 3]
   y    [0, 1, 2, 3]
dtype: object





problem

Each element of s is a numpy.array. When I try to sum within groups, I get an error, I'm guessing because the groupby function expects the result of each group to be a scalar.

s.groupby(level=0).sum()



Exception Traceback (most recent call last)
<ipython-input-627-c5b3bf6890ea> in <module>()
----> 1 s.groupby(level=0).sum()

C:\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in f(self)
101 raise SpecificationError(str(e))
102 except Exception:
--> 103 result = self.aggregate(lambda x: npfunc(x, axis=self.axis))
104 if _convert:
105 result = result._convert(datetime=True)

C:\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
2584 return self._python_agg_general(func_or_funcs, *args, **kwargs)
2585 except Exception:
-> 2586 result = self._aggregate_named(func_or_funcs, *args, **kwargs)
2587
2588 index = Index(sorted(result), name=self.grouper.names[0])

C:\Anaconda2\lib\site-packages\pandas\core\groupby.pyc in _aggregate_named(self, func, *args, **kwargs)
2704 output = func(group, *args, **kwargs)
2705 if isinstance(output, (Series, Index, np.ndarray)):
-> 2706 raise Exception('Must produce aggregated value')
2707 result[name] = self._try_cast(output, group)
2708

Exception: Must produce aggregated value
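The traceback shows that _aggregate_named raises when the per-group output is a Series, Index, or np.ndarray. As a rough illustration of that condition, the sketch below reduces one group by hand, outside of pandas' groupby machinery, and checks the type of the result. The names group, output, and is_array are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.arange(4)
mux = pd.MultiIndex.from_product([list('ab'), list('xy')])
s = pd.Series([a] * 4, mux)

# Take the rows of group 'a' and add their arrays element-wise.
group = s.loc['a']
output = np.sum(np.vstack(group.tolist()), axis=0)

# The per-group result is an array rather than a scalar, which is the
# kind of value the isinstance check in the traceback rejects.
is_array = isinstance(output, np.ndarray)
print(output, is_array)
```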






workaround

When I use apply with np.sum, it works fine.

s.groupby(level=0).apply(np.sum)

a    [0, 2, 4, 6]
b    [0, 2, 4, 6]
dtype: object





question

Is there an elegant way to handle this?




real problem

I actually want to use agg in this way:

s.groupby(level=0).agg(['sum', 'prod'])


But it fails in the same way.

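The only way to get this is to concatenate the results of separate apply calls:

g = s.groupby(level=0)  # the grouped Series from above
pd.concat([g.apply(np.sum), g.apply(np.prod)],
          axis=1, keys=['sum', 'prod'])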



But this doesn't generalize well to longer lists of transforms.
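One way the workaround might be extended to an arbitrary set of reductions is to loop over the groups and apply each function to the stacked arrays. This is only a sketch: the dict funcs and the names keys, records, block, and result are illustrative, and it assumes all arrays within a group share the same length.

```python
import numpy as np
import pandas as pd

a = np.arange(4)
mux = pd.MultiIndex.from_product([list('ab'), list('xy')])
s = pd.Series([a] * 4, mux)

# Hypothetical mapping of output column name -> NumPy reduction.
funcs = {'sum': np.sum, 'prod': np.prod}

keys, records = [], []
for key, grp in s.groupby(level=0):
    # Stack the group's arrays into a 2-D block, then reduce down the rows.
    block = np.vstack(grp.tolist())
    keys.append(key)
    records.append({name: f(block, axis=0).tolist() for name, f in funcs.items()})

# One row per group; each cell holds a plain list.
result = pd.DataFrame(records, index=keys)
print(result)
```

Adding another reduction then only requires a new entry in funcs, for example 'max': np.max.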

Answer

Following this well-explained answer, you can convert your ndarray to a list. pandas seems to check whether the aggregation output is an ndarray (see the isinstance check in your traceback), which is why this error is raised:

s.groupby(level=0).agg({"sum": lambda x: list(x.sum()), "prod":lambda x: list(x.prod())})

Out[249]:

            sum          prod
a  [0, 2, 4, 6]  [0, 1, 4, 9]
b  [0, 2, 4, 6]  [0, 1, 4, 9]
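A different option, not taken from the answer above, is to avoid array-valued cells altogether. If every array in s has the same length, the arrays can be stacked into an ordinary numeric DataFrame, where the built-in aggregations apply column by column. A minimal sketch under that assumption (the names df, sums, prods, and result are illustrative):

```python
import numpy as np
import pandas as pd

a = np.arange(4)
mux = pd.MultiIndex.from_product([list('ab'), list('xy')])
s = pd.Series([a] * 4, mux)

# One row per original element, one column per array position.
# Assumes all arrays in s share the same length.
df = pd.DataFrame(np.vstack(s.tolist()), index=s.index)

# Built-in reductions now operate on plain numeric columns.
sums = df.groupby(level=0).sum()
prods = df.groupby(level=0).prod()

# A list of aggregations also works; the resulting columns form a
# (array position, aggregation name) MultiIndex.
result = df.groupby(level=0).agg(['sum', 'prod'])
print(result)
```

The trade-off is the shape of the output: each array position becomes its own column instead of one list per cell.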