user5779223 - 4 months ago 90
Python Question

How to construct a column of a data frame recursively with pandas (Python)?

Given a data frame df:

id_ val
11111 12
12003 22
88763 19
43721 77
...


I wish to add a column diff to df, where each row equals the val in that row minus the diff in the previous row, multiplied by 0.4, plus the diff in the previous row (the previous day):

diff = (val - diff_previousDay) * 0.4 + diff_previousDay


The diff in the first row equals val * 0.4 in that row. That is, the expected df should be:

id_ val diff
11111 12 4.8
12003 22 11.68
88763 19 14.608
43721 77 ...


And I have tried:

mul = 0.4
df['diff'] = df.apply(lambda row: (row['val'] - df.loc[row.name, 'diff']) * mul + df.loc[row.name, 'diff'] if int(row.name) > 0 else row['val'] * mul, axis=1)


But I got this error:


TypeError: ("unsupported operand type(s) for -: 'float' and 'NoneType'", 'occurred at index 1')


Do you know how to solve this problem? Thank you in advance!

Answer

You can use:

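# .ix was removed in pandas 1.0; .loc works here because df has the default 0..n-1 index
df.loc[0, 'diff'] = df.loc[0, 'val'] * 0.4

for i in range(1, len(df)):
    # each diff depends on the diff already computed for the previous row
    df.loc[i, 'diff'] = (df.loc[i, 'val'] - df.loc[i-1, 'diff']) * 0.4 + df.loc[i-1, 'diff']

print(df)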
     id_  val     diff
0  11111   12   4.8000
1  12003   22  11.6800
2  88763   19  14.6080
3  43721   77  39.5648
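
As a variation, the same recurrence can be computed over plain Python values and assigned to the column in one step, which avoids writing into the frame row by row. This is only a sketch: the helper name recursive_diff and its parameters are made up for illustration, and the sample frame is rebuilt from the four rows shown in the question.

```python
import pandas as pd

def recursive_diff(values, mul=0.4):
    """diff[0] = val[0] * mul; diff[i] = (val[i] - diff[i-1]) * mul + diff[i-1]."""
    out = []
    prev = None
    for v in values:
        # first row has no previous diff, so it is seeded with val * mul
        prev = v * mul if prev is None else (v - prev) * mul + prev
        out.append(prev)
    return out

# sample data copied from the question
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})
df['diff'] = recursive_diff(df['val'].tolist())
print(df)
```

This is still a loop, but it runs over a plain list, so it does not depend on the index labels of df.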

The iterative nature of the calculation, where each input depends on the result of the previous step, complicates vectorization. You could perhaps use apply with a function that does the same calculation as the loop, but behind the scenes this would also be a loop.
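
One possible way around the explicit loop, offered as a suggestion rather than as part of the answer above: the recurrence can be rewritten as diff = 0.4 * val + 0.6 * diff_previous, which has the same form as the exponentially weighted mean that pandas computes with ewm(alpha=0.4, adjust=False). That method seeds the first result with the first value itself, so the sketch below prepends a 0 to make the first diff come out as val * 0.4. The sample frame is rebuilt from the question's rows, and the variable names seeded and smoothed are illustrative.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({'id_': [11111, 12003, 88763, 43721],
                   'val': [12, 22, 19, 77]})

# Prepend a 0 so the recursion effectively starts from diff = 0 before row 0
seeded = pd.concat([pd.Series([0.0]), df['val'].astype(float)],
                   ignore_index=True)

# adjust=False computes y[t] = (1 - alpha) * y[t-1] + alpha * x[t], with y[0] = x[0]
smoothed = seeded.ewm(alpha=0.4, adjust=False).mean()

# Drop the seed row and assign by position
df['diff'] = smoothed.iloc[1:].to_numpy()
print(df)
```

Whether this is preferable depends on the data size; for a handful of rows the loop is just as good and arguably easier to read.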
