Brian Brian - 14 days ago 5
Python Question

efficient way to union non-set iterables within groups

I have this df:


import pandas as pd

# Column B holds one plain Python list per row
df = pd.DataFrame(dict(
    A=['b', 'a', 'b', 'c', 'a', 'c', 'a', 'c', 'a', 'a'],
    B=[[0, 2, 3, 1],
       [9, 6, 7, 2],
       [6, 0, 1, 4],
       [9, 2, 5, 1],
       [5, 1, 4, 8],
       [8, 5, 6, 6],
       [0, 9, 0, 0],
       [2, 6, 1, 8],
       [7, 3, 2, 6],
       [8, 7, 1, 9]]
))


I want to group by 'A' and union all the lists in 'B'.


Neither df.groupby('A').B.union() nor df.groupby('A').B.apply(set.union) works.

I want the result to be

A
a {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b {0, 1, 2, 3, 4, 6}
c {1, 2, 5, 6, 8, 9}
Name: B, dtype: object

Answer

The problem is that the lists have to be converted to sets before a union can be applied. One solution is to use sum to concatenate the lists within each group, then convert each concatenated list to a set using map:

In [28]: df.groupby('A').B.sum().map(set)
Out[28]:
A
a    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b                {0, 1, 2, 3, 4, 6}
c                {1, 2, 5, 6, 8, 9}
dtype: object
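
Since the question asks for an efficient way, note that sum concatenates the lists one addition at a time and builds a new list on each step, so it may get slow when a group has many rows. A possible alternative, shown here only as a sketch, is to skip the concatenation and let set().union(*lists) consume the lists directly: the arguments to union can be any iterables, only the object the method is called on has to be a set. The helper name union_lists is made up for this example and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=['b', 'a', 'b', 'c', 'a', 'c', 'a', 'c', 'a', 'a'],
    B=[[0, 2, 3, 1],
       [9, 6, 7, 2],
       [6, 0, 1, 4],
       [9, 2, 5, 1],
       [5, 1, 4, 8],
       [8, 5, 6, 6],
       [0, 9, 0, 0],
       [2, 6, 1, 8],
       [7, 3, 2, 6],
       [8, 7, 1, 9]]
))


def union_lists(lists):
    # Hypothetical helper (not a pandas function).
    # Start from an empty set and union every list of the group into it;
    # the lists themselves never need to be converted to sets first.
    return set().union(*lists)


# agg calls union_lists once per group, passing that group's B values
result = df.groupby('A')['B'].agg(union_lists)
print(result)
```

This gives the same sets per group as the sum/map version above. Whether it is actually faster on your data is something to measure rather than assume.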