Faranak Faranak - 1 month ago 7
Python Question

join/search/sum in Pandas Python

i'm new to Panda and trying to learn it, I have a DataFrame in Panda with 3 different columns:

a b c
-----------------------------
' Alice 5/5/2014 2 '
' Bob 7/18/2014 1 '
' Alice 5/5/2014 3 '
' Bob 8/10/2014 5 '
------------------------------


I want to sum up the 'C' columns for each person per month, so the desired result would be like :

a b c
-----------------------------
' Alice 5/5/2014 5 '
' Bob 7/18/2014 1 '
' Bob 8/10/2014 5 '
------------------------------


what is the best way to do this in Panda.

if my question is repeated please re-direct me to other question i couln't find it maybe because i wasn't sure what to look for. thank you

Answer

The most efficient way is to first make sure your date column is of a datetime type:

>>> df2
       a          b  c
0  Alice   5/5/2014  2
1    Bob  7/18/2014  1
2  Alice   5/9/2014  3
3    Bob  8/10/2014  5
>>> df2['b'] = pd.to_datetime(df2.b)

Then, index the DataFrame by the date column:

>>> df2.set_index('b',inplace=True)
>>> df2
                a  c
b                   
2014-05-05  Alice  2
2014-07-18    Bob  1
2014-05-09  Alice  3
2014-08-10    Bob  5

Then use groupby:

>>> df2.groupby(['a',df2.index.month]).sum()
         c
a         
Alice 5  5
Bob   7  1
      8  5
>>> 

And you can always go back to your original index:

>>> df2.reset_index(inplace=True)
>>> df2
           b      a  c
0 2014-05-05  Alice  2
1 2014-07-18    Bob  1
2 2014-05-09  Alice  3
3 2014-08-10    Bob  5
Comments