user5779223 - 4 months ago 90
Python Question

How to construct a data frame column recursively with pandas (Python)?

Given a data frame `df`:

``````
id_      val
11111    12
12003    22
88763    19
43721    77
...
``````

I want to add a column `diff` to `df`, where each row equals the `val` in that row minus the `diff` in the previous row, multiplied by 0.4, plus the `diff` in the previous row:

``````
diff = (val - diff_previousDay) * 0.4 + diff_previousDay
``````

The `diff` in the first row equals `val * 0.4` in that row. That is, the expected `df` is:

``````
id_      val     diff
11111    12      4.8
12003    22      11.68
88763    19      14.608
43721    77      ...
``````

And I have tried:

``````
mul = 0.4
df['diff'] = df.apply(lambda row: (row['val'] - df.loc[row.name, 'diff']) * mul + df.loc[row.name, 'diff'] if int(row.name) > 0 else row['val'] * mul, axis=1)
``````

But I got the following error:

TypeError: ("unsupported operand type(s) for -: 'float' and 'NoneType'", 'occurred at index 1')

Do you know how to solve this problem? Thank you in advance!

You can use:

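``````
# Assumes the default integer index (0, 1, 2, ...), as in the output below.
# The first row has no previous diff, so it is just val * 0.4.
df.loc[0, 'diff'] = df.loc[0, 'val'] * 0.4

# Each row needs the previous row's diff, so fill the column in order.
for i in range(1, len(df)):
    df.loc[i, 'diff'] = (df.loc[i, 'val'] - df.loc[i-1, 'diff']) * 0.4 + df.loc[i-1, 'diff']

print(df)
#      id_  val     diff
# 0  11111   12   4.8000
# 1  12003   22  11.6800
# 2  88763   19  14.6080
# 3  43721   77  39.5648
``````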
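The same loop can also be run on plain Python values and assigned to the column in one step, which avoids indexing the frame cell by cell. This is a minimal, self-contained sketch, assuming the four sample rows from the question; the helper name `recursive_diff` is hypothetical:

```python
import pandas as pd


def recursive_diff(values, mul=0.4):
    """Return d where d[i] = (values[i] - d[i-1]) * mul + d[i-1]."""
    out = []
    prev = 0.0  # starting from 0 makes the first result values[0] * mul
    for v in values:
        prev = (v - prev) * mul + prev
        out.append(prev)
    return out


# Sample data copied from the question.
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})

# Compute the whole column first, then assign it at once.
df['diff'] = recursive_diff(df['val'].tolist())
print(df)
```

Starting the running value at 0 reproduces the special case for the first row, so no separate branch is needed.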

Because each value depends on the result of the previous step, the calculation is hard to vectorize. You could use `apply` with a function that does the same calculation as the loop, but behind the scenes that would also be a loop.
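For this particular formula there may be a shortcut. It rearranges to `diff = 0.4 * val + 0.6 * diff_previous`, which is the recursion pandas applies in `ewm(alpha=0.4, adjust=False).mean()`. The sketch below assumes the sample data from the question and prepends a 0 so that the first row comes out as `val * 0.4`; the names `seeded` and `diff_ewm` are made up for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})
mul = 0.4

# With adjust=False, ewm computes y[0] = x[0] and
# y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
# Prepending 0 makes the recursion start from diff = 0.
seeded = pd.concat([pd.Series([0.0]), df['val'].astype(float)],
                   ignore_index=True)

# Drop the seed row again and assign by position.
df['diff_ewm'] = seeded.ewm(alpha=mul, adjust=False).mean().iloc[1:].to_numpy()
print(df)
```

This only works because the recursion is a fixed linear blend of the current value and the previous result; a different update rule would still need the explicit loop.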