RSHAP - 6 months ago

Pandas Groupby back to DataFrame

I have a dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Section': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6],
                   'Unit': np.arange(0, 15) / 100.0,
                   'Entries': [11, 22, 23, 1, 4, 8,
                               99, 112, 235, 22, 126,
                               442, 45, 56, 10],
                   'Exits': np.random.randint(0, 100, 15)},  # random, so Exits differs per run
                  columns=['Section', 'Unit', 'Entries', 'Exits'])


I want to replace the Entries values within EACH section with the difference between consecutive values.

For example, the Entries for Section 1 are 11, 22, 23. I want them to be 0, 11, 1 (each value minus the previous one, with 0 for the first row of the section).
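The arithmetic for Section 1 can be checked on its own with pandas Series.diff, which subtracts each element's predecessor and leaves NaN in the first position. A small sketch; filling that NaN with 0 is an assumption chosen to match the 0 requested above:

```python
import pandas as pd

# Entries for Section 1, taken from the question
section1 = pd.Series([11, 22, 23])

# diff() yields [NaN, 11.0, 1.0]; fillna(0) replaces the leading NaN with 0
deltas = section1.diff().fillna(0)
print(deltas.tolist())  # [0.0, 11.0, 1.0]
```

Note the result is float, because the intermediate NaN forces a float dtype.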

I can do
df.groupby('Section').Entries.apply(diff)
but this drops the first value of each section and leaves me with a Series that I don't know how to merge back into the DataFrame.
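Assuming the diff in the attempt above is NumPy's np.diff (the question does not say), the mismatch can be seen directly: np.diff returns one value fewer than its input, so the per-section results cannot line up with the original rows. A sketch on the first two sections, contrasting it with GroupBy.transform, which returns a result indexed like the original frame:

```python
import numpy as np
import pandas as pd

# First two sections of the question's data
df = pd.DataFrame({'Section': [1, 1, 1, 2, 2, 2],
                   'Entries': [11, 22, 23, 1, 4, 8]})

# np.diff returns n - 1 values per section, so there is no row for the first entry
shortened = np.diff(df.loc[df['Section'] == 1, 'Entries'].to_numpy())
print(shortened.tolist())  # [11, 1]

# transform returns a Series with the same index as df, so it can be assigned as a column
aligned = df.groupby('Section')['Entries'].transform(lambda s: s.diff().fillna(0))
df['diff'] = aligned
print(df['diff'].tolist())  # [0.0, 11.0, 1.0, 0.0, 3.0, 4.0]
```

The lambda here is only illustrative; for this particular computation the built-in groupby diff already returns an index-aligned result.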

How might one do this?

Answer

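Is this what you want? groupby('Section')['Entries'].diff() subtracts the previous row within each section and returns a Series with the same index as df, so it can be assigned straight back as a column. The first row of each section has no predecessor and comes out as NaN, which fillna(0) turns into 0.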

In [93]: df['diff'] = df.groupby('Section')['Entries'].diff().fillna(0)

In [94]: df
Out[94]:
    Section  Unit  Entries  Exits   diff
0         1  0.00       11     97    0.0
1         1  0.01       22     89   11.0
2         1  0.02       23     98    1.0
3         2  0.03        1     39    0.0
4         2  0.04        4     42    3.0
5         2  0.05        8     35    4.0
6         3  0.06       99     59    0.0
7         3  0.07      112     16   13.0
8         3  0.08      235      1  123.0
9         4  0.09       22     73    0.0
10        4  0.10      126     97  104.0
11        4  0.11      442     56  316.0
12        5  0.12       45     78    0.0
13        5  0.13       56     42   11.0
14        6  0.14       10     30    0.0
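The question asks to change Entries itself rather than add a separate column. A minimal sketch of that variant, assuming integer output is wanted (the astype(int) cast is an addition, safe here because every filled difference is a whole number):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Section': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6],
                   'Unit': np.arange(0, 15) / 100.0,
                   'Entries': [11, 22, 23, 1, 4, 8,
                               99, 112, 235, 22, 126,
                               442, 45, 56, 10],
                   'Exits': np.random.randint(0, 100, 15)},
                  columns=['Section', 'Unit', 'Entries', 'Exits'])

# Overwrite Entries in place; the groupby diff keeps df's index, so rows align on assignment
df['Entries'] = df.groupby('Section')['Entries'].diff().fillna(0).astype(int)
print(df['Entries'].tolist())
# [0, 11, 1, 0, 3, 4, 0, 13, 123, 0, 104, 316, 0, 11, 0]
```

If the original counts are still needed later, writing to a new column as in the answer above avoids losing them.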