Khris - 2 months ago

How can I sum collections.Counter objects in a Python pandas DataFrame using the update() method?

I'm dealing with semi-structured data that doesn't fully fit into a pandas dataframe, so I have some columns containing collections.Counter objects (i.e. dictionaries) of vastly varying lengths.

I need to group by another column and sum these Counters, but without dropping zeros or ignoring negative values. That means I cannot use the sum() method on these columns.

The method of choice would be update(). However, it can't simply be applied like sum(), because it needs an argument: another Counter, which sits in another row rather than in another column.
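To make the difference concrete, here is a minimal sketch with two small made-up Counters (a and b are illustrative, not my real data). Adding Counters with +, which is what summing them relies on, discards every key whose total is zero or negative, while update() keeps them:

```python
from collections import Counter

# Two illustrative Counters (hypothetical values, not the real data).
a = Counter({'A': 1, 'B': -1})
b = Counter({'A': -1, 'B': 1, 'D': 0})

# "+" keeps only keys whose summed count is positive,
# so every key vanishes in this example.
added = a + b

# update() adds counts in place and keeps zero and negative totals.
merged = Counter(a)  # copy, so a itself stays untouched
merged.update(b)

print(added)
print(dict(merged))
```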


import pandas as pd
import collections as cc

A = [cc.Counter({'A': 1,'B':-1,'C': 1}),
     cc.Counter({'A':-1,'B': 1, 'D': 0,'E': 1}),
     cc.Counter({'A': 0, 'E': 0,'F': 1}),
     cc.Counter({'B': 0,'C':-1, 'E':-1,'F':-1})]

B = ['N','N','N','N']

S1 = pd.Series(B,index=['W','X','Y','Z'],name='K',dtype=str)
S2 = pd.Series(A,index=['W','X','Y','Z'],name='L',dtype=dict)
F = pd.merge(S1.to_frame(),S2.to_frame(),left_index=True,right_index=True)
print(F)

This leads to the output

W N {u'A': 1, u'C': 1, u'B': -1}
X N {u'A': -1, u'B': 1, u'E': 1, u'D': 0}
Y N {u'A': 0, u'E': 0, u'F': 1}
Z N {u'C': -1, u'B': 0, u'E': -1, u'F': -1}

Doing this:

G = F.groupby('K')
print(G.sum())

Leads to this output:

N {}

But what I want is this:

Counter({'A': 0, 'C': 0, 'B': 0, 'E': 0, 'D': 0, 'F': 0})

which can be manually done with the update method like this:

for i in range(1,4):
    A[0].update(A[i])  # merge each remaining Counter into the first one
print(A[0])

So I need either a technique to apply update() to a groupby object (by writing an appropriate function, or by turning the grouped rows into columns, which seems rather inefficient and time-consuming), or I will have to restructure my data so that the Counters contain no zeros or negative values.
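As a minimal sketch of what such a function could look like: sum_counters is a hypothetical helper name, and the loop over the groupby object avoids apply() and agg() entirely by building one summed Counter per group key. It assumes a frame shaped like F above (columns K and L), rebuilt here so the block runs on its own.

```python
import collections
import pandas as pd

def sum_counters(counters):
    """Merge an iterable of Counters with update(), keeping zero and negative totals."""
    total = collections.Counter()
    for counter in counters:
        total.update(counter)
    return total

# Same data as in the example above.
A = [collections.Counter({'A': 1, 'B': -1, 'C': 1}),
     collections.Counter({'A': -1, 'B': 1, 'D': 0, 'E': 1}),
     collections.Counter({'A': 0, 'E': 0, 'F': 1}),
     collections.Counter({'B': 0, 'C': -1, 'E': -1, 'F': -1})]
F = pd.DataFrame({'K': ['N'] * 4, 'L': A}, index=['W', 'X', 'Y', 'Z'])

# Iterating over a groupby object yields (key, sub-frame) pairs,
# so each group's L column can be merged independently.
totals = {key: sum_counters(group['L']) for key, group in F.groupby('K')}

print(totals['N'])
```

The result is a plain dict mapping each value of K to its summed Counter, which can be turned back into a frame afterwards if needed.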

Any ideas are welcome.

I still fail to apply the proposed solution to the grouped DataFrame in my example:

G.apply(lambda x: pd.DataFrame(x).sum().to_dict())

gives the result:

N {u'K': u'NNNN', u'L': {}}
dtype: object

The problem is that I don't quite understand how apply on groupby objects works.

For example, when I do this:

F.groupby('K').apply(lambda x: list(x))

The result is:

N [K, L]
dtype: object

And I don't understand why and how.
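For what it's worth, here is a small sketch of what seems to be going on, using a tiny made-up frame (the values are hypothetical). The function passed to apply() receives each group as a DataFrame, and iterating over a DataFrame, which is what list() does, yields its column labels rather than its rows:

```python
import pandas as pd

# Small illustrative frame; the values are made up.
F = pd.DataFrame({'K': ['N', 'N'], 'L': [1, 2]}, index=['W', 'X'])

# list() iterates over the DataFrame, and that iteration yields column labels.
columns = list(F)

# Each group handed out by groupby is itself a DataFrame with that group's rows.
groups = {key: group for key, group in F.groupby('K')}

print(columns)
print(type(groups['N']).__name__, list(groups['N'].index))
```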


After @piRSquared's answers helped me solve the problem, I'm adding the full solution, which not only produces the dictionary but also gets it back into a DataFrame:

pd.DataFrame.from_dict([to_dict_dropna(pd.concat([F.K, F.L.apply(pd.Series)], axis=1)\

The function to_dict_dropna() is taken from "make pandas DataFrame to a dict and dropna" and is necessary if there are keys without values in the summed dictionaries.
I'm transposing the frame and resetting the index because I need the initial index as a column. Then I merge this with other frames to get the final format I need.


Consider the list of dicts A:

A = [{'A': 1,'B':-1,'C': 1},
     {'A':-1,'B': 1,       'D': 0,'E': 1},
     {'A': 0,                     'E': 0,'F': 1},
     {       'B': 0,'C':-1,       'E':-1,'F':-1}]


{'A': 0.0, 'B': 0.0, 'C': 0.0, 'D': 0.0, 'E': 0.0, 'F': 0.0}
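The code that produced this output is not shown here. As a hedged sketch, one way to get a result of this shape is to load the list into a DataFrame (missing keys become NaN), sum each column, and convert back to a dict. This mirrors the pd.DataFrame(x).sum().to_dict() call quoted in the question, but it is an assumption that this is the exact original code.

```python
import pandas as pd

A = [{'A': 1, 'B': -1, 'C': 1},
     {'A': -1, 'B': 1, 'D': 0, 'E': 1},
     {'A': 0, 'E': 0, 'F': 1},
     {'B': 0, 'C': -1, 'E': -1, 'F': -1}]

# Each dict becomes a row; keys missing from a dict show up as NaN.
df = pd.DataFrame(A)

# sum() skips NaN by default, so zero and negative totals are preserved.
summed = df.sum().to_dict()

print(summed)
```

The values come back as floats because the NaN placeholders force every column to a floating-point dtype.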

I'll leave that original answer as it is, but it was based on my incorrect assumption that you wanted the last value. The answer evolved once I realized that a sum is what you needed.

Given that, this is a better solution


To apply this directly to the dataframe F you've defined:

pd.concat([F.K, F.L.apply(pd.Series)], axis=1).groupby('K').sum()
