J_Heads - 2 months ago
Python Question

Best approach to create time difference variable by id

I am working with a pandas df that looks like this:

ID time
34 43
2 99
2 20
34 8
2 90


What would be the best approach to create a variable that represents the difference from the previous (next-lowest) time for the same ID?

ID time diff
34 43 35
2 99 9
2 20 NA
34 8 NA
2 90 70

Answer

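Here's one possibility: sort by time, group by ID, and take the difference between consecutive times within each group. The result keeps the original index, so it lines up with the unsorted df when assigned: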

df["diff"] = df.sort_values("time").groupby("ID")["time"].diff()
df

   ID  time  diff
0  34    43  35.0
1   2    99   9.0
2   2    20   NaN
3  34     8   NaN
4   2    90  70.0
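To try this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample in the question. The names `by_time` and `alt` are my own additions, and the shift-based variant is an alternative way to get the same numbers, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": [34, 2, 2, 34, 2], "time": [43, 99, 20, 8, 90]})

# Sort by time, then diff within each ID. The result keeps the original
# row labels, so assignment aligns it back to the unsorted frame.
df["diff"] = df.sort_values("time").groupby("ID")["time"].diff()

# Alternative (an assumption, not from the answer): subtract the previous
# time within each ID explicitly using shift().
by_time = df.sort_values(["ID", "time"])
alt = by_time["time"] - by_time.groupby("ID")["time"].shift()

print(df)
```

The earliest time of each ID has nothing to subtract from, so it gets NaN, and that is why the diff column comes out as float rather than int.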