beta - 1 year ago 173

Python Question

I have the following pandas dataframe containing 2 columns (simplified). The first column contains *player names* and the second column contains *dates* (`datetime` values):

    player  date
    A       2010-01-01
    A       2010-01-09
    A       2010-01-11
    A       2010-01-15
    B       2010-02-01
    B       2010-02-10
    B       2010-02-21
    B       2010-02-23

I want to add a column `diff` that holds the number of days since the same player's previous date:

    player  date        diff
    A       2010-01-01  0
    A       2010-01-09  8
    A       2010-01-11  2
    A       2010-01-15  4
    B       2010-02-01  0
    B       2010-02-10  9
    B       2010-02-21  11
    B       2010-02-23  2

The first row has `0` because it is the first date for that player. The second row has `8` because `2010-01-01` and `2010-01-09` are 8 days apart.

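The problem is not calculating the day-difference between two `datetime` values. I assume I need a `groupby` (`df.groupby('player')`) followed by `apply` or `transform`, but I am not sure how to use them to build the new column.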

Thank you very much.

After trying both proposed solutions below, I found that they did not work with my code. After much headache, I discovered that my data had duplicate index values. A simple `df.reset_index()` fixed that.
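For anyone hitting the same thing, here is a minimal sketch of how one might check for and remove duplicate index labels before computing per-group differences. The small frame and its `index=[0, 0, 1]` are made up for illustration, and `drop=True` is my addition (the plain `df.reset_index()` mentioned above would instead keep the old index as an extra column).

```python
import pandas as pd

# Hypothetical frame whose index contains the label 0 twice.
df = pd.DataFrame(
    {
        "player": ["A", "A", "B"],
        "date": pd.to_datetime(["2010-01-01", "2010-01-09", "2010-02-01"]),
    },
    index=[0, 0, 1],
)

# Quick check for duplicate index labels.
had_unique_index = df.index.is_unique

# Replace the index with a fresh 0..n-1 range; drop=True discards the old labels.
df = df.reset_index(drop=True)
```

After the reset, `df.index.is_unique` is true and the columns are unchanged.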

Answer Source

You can simply write:

```
# pd.Timedelta(0) fills the first row of each player with a zero-length timedelta
df['difference'] = df.groupby('player')['date'].diff().fillna(pd.Timedelta(0))
```

This gives the new timedelta column with the correct values:

```
  player       date difference
0      A 2010-01-01     0 days
1      A 2010-01-09     8 days
2      A 2010-01-11     2 days
3      A 2010-01-15     4 days
4      B 2010-02-01     0 days
5      B 2010-02-10     9 days
6      B 2010-02-21    11 days
7      B 2010-02-23     2 days
```

(I've used the name "difference" instead of "diff" to distinguish the column name from the method `diff`.)
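Since the question asks for plain day counts rather than timedeltas, here is a self-contained sketch that rebuilds the sample data and converts the result to integers. The sort step and the `.dt.days` conversion are my additions, assuming the `date` column really has a datetime dtype and that rows may not already be in order.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Sort so that diff() compares each date with the previous one of the same player.
df = df.sort_values(["player", "date"]).reset_index(drop=True)

# Difference to the previous row within each player; the first row per player is NaT.
delta = df.groupby("player")["date"].diff()

# Fill NaT with a zero-length timedelta, then keep the whole days as integers.
df["diff"] = delta.fillna(pd.Timedelta(0)).dt.days
```

With the sample data, `df["diff"]` holds 0, 8, 2, 4 for player A and 0, 9, 11, 2 for player B, matching the table in the question.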