skyork skyork - 5 months ago 99
Python Question

Reference values in the previous row with map or apply

Given a dataframe df, I would like to generate a new variable/column for each row based on the values in the previous row. df is sorted so that the order of the rows is meaningful.

Normally, we can use either map or apply, but it seems that neither of them allows access to the values in the previous row.

For example, given existing columns a, b and c, I want to generate a new column d, which is based on some calculation using the value of c in the previous row.

How should I do it in pandas?

Answer

If you just want to do a calculation based on the previous row, you can calculate and then shift:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})

In [3]: df
Out[3]:
   a   b
0  0   0
1  1  10
2  2  20

# a calculation based on another column
In [4]: df['c'] = df['b'] + 1

# shift the column
In [5]: df['c'] = df['c'].shift()

In [6]: df
Out[6]:
   a   b     c
0  0   0   NaN
1  1  10   1.0
2  2  20  11.0
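
Applied to the columns named in the question, the same idea can be sketched as below. The values in c and the "times two" calculation are made up purely for illustration; any vectorised expression on the shifted column would work the same way.

```python
import pandas as pd

# Hypothetical data: the question only names the columns a, b and c.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 10, 20], "c": [5, 6, 7]})

# shift(1) moves every value down one row, so row i sees c from row i - 1.
prev_c = df["c"].shift(1)

# Hypothetical calculation: d is twice the previous row's c.
# The first row has no previous row, so d is NaN there.
df["d"] = prev_c * 2

print(df)
```

Because the first row has no predecessor, d becomes a float column with NaN in the first position; fillna can replace that value if a default makes sense for your data.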

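If you want to do a calculation based on multiple rows, you could look at the rolling_apply function (http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments and http://pandas.pydata.org/pandas-docs/stable/generated/pandas.rolling_apply.html#pandas.rolling_apply). Note that in recent pandas versions the top-level pd.rolling_apply has been removed; the equivalent is the .rolling(...).apply(...) method on a Series or DataFrame.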
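
As a rough sketch of the multi-row case using the .rolling(...).apply(...) method form, the example below takes a window of two rows (the current row and the one before it) and computes their difference. The series values and the lambda are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical values, chosen only to make the result easy to check.
s = pd.Series([0, 10, 20, 30])

# window=2 gives each call the previous row and the current row.
# raw=True passes the window as a NumPy array, so w[0] is the previous
# value and w[1] is the current one.
diff_prev = s.rolling(window=2).apply(lambda w: w[1] - w[0], raw=True)

print(diff_prev)
```

The first entry is NaN because the window is not yet full there. For a simple difference like this, s - s.shift() or s.diff() would be shorter; rolling with apply is mainly useful when the function needs more than one earlier row.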