beta - 10 months ago 110

Pandas dataframe apply refer to previous row to calculate difference

I have the following pandas dataframe containing 2 columns (simplified). The first column contains player names and the second column contains dates:


player date
A 2010-01-01
A 2010-01-09
A 2010-01-11
A 2010-01-15
B 2010-02-01
B 2010-02-10
B 2010-02-21
B 2010-02-23

I want to add a column diff that contains, for each player, the time difference in days from the previous date. The result should look like this:

player date diff
A 2010-01-01 0
A 2010-01-09 8
A 2010-01-11 2
A 2010-01-15 4
B 2010-02-01 0
B 2010-02-10 9
B 2010-02-21 11
B 2010-02-23 2

The first row has 0 for diff, because there is no earlier date. The second row shows 8, because the difference between 2010-01-01 and 2010-01-09 is eight days.

The problem is not calculating the day difference between two dates. I am just not sure how to add the new column. I know that I have to group the data by player first and then use apply (or maybe another method?). However, I am stuck, because to calculate the difference I need to refer to the previous row in the apply function, and I don't know how to do that, or whether it is possible at all.

Thank you very much.

After trying both proposed solutions below, I found that they did not work with my code. After much headache, I discovered that my data had duplicate indices. Resetting the index solved my issue, and then the proposed solutions worked. Since both solutions work but I can only mark one as correct, I will choose the more concise one. Thanks to both of you, though!
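Duplicate index labels like the ones described above can be detected and removed along these lines. This is a sketch rather than the exact fix used here: the repeated labels 0, 0, 1, 1 are made up for illustration, and reset_index(drop=True) throws the old labels away, which assumes they carry no information you still need.

```python
import pandas as pd

# Hypothetical frame whose index labels repeat (for example after
# concatenating two frames that each had their own 0..n-1 index).
df = pd.DataFrame(
    {
        "player": ["A", "A", "B", "B"],
        "date": pd.to_datetime(["2010-01-01", "2010-01-09", "2010-02-01", "2010-02-10"]),
    },
    index=[0, 0, 1, 1],
)
print(df.index.has_duplicates)  # True

# Replace the index with a fresh 0..n-1 range; drop=True discards the old labels
# instead of keeping them as a new column.
df = df.reset_index(drop=True)
print(df.index.has_duplicates)  # False

# With a unique index, the per-player difference can be assigned as usual.
df["difference"] = df.groupby("player")["date"].diff().fillna(pd.Timedelta(0))
```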

Answer

You can simply write:

import pandas as pd
df['difference'] = df.groupby('player')['date'].diff().fillna(pd.Timedelta(0))

This gives the new timedelta column with the correct values:

  player       date  difference
0      A 2010-01-01      0 days
1      A 2010-01-09      8 days
2      A 2010-01-11      2 days
3      A 2010-01-15      4 days
4      B 2010-02-01      0 days
5      B 2010-02-10      9 days
6      B 2010-02-21     11 days
7      B 2010-02-23      2 days

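(I've used the name "difference" instead of "diff" so the column is not confused with the diff method; a column named diff could not be accessed as df.diff, because that attribute already refers to the method.)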
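The question's desired output shows plain integer days rather than timedeltas. A possible way to get that format from the approach above, assuming the dates might arrive as strings (for example from a CSV file, hence the pd.to_datetime call), is to extract whole days with the .dt.days accessor:

```python
import pandas as pd

# Dates stored as strings, as they might be read from a file (an assumption).
df = pd.DataFrame({
    "player": ["A"] * 4 + ["B"] * 4,
    "date": ["2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
             "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23"],
})

# diff() needs datetime values to produce timedeltas, so parse the strings first.
df["date"] = pd.to_datetime(df["date"])

# Per-player difference to the previous row, with 0 days for each player's first row.
difference = df.groupby("player")["date"].diff().fillna(pd.Timedelta(0))

# .dt.days converts each timedelta into an integer number of days.
df["difference"] = difference.dt.days
print(df)
```

Keeping the timedelta column, as in the answer, is useful if you later want to add it back to dates; converting to integers is mainly for display or for arithmetic with plain numbers.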