dimyG - 1 year ago 70
Python Question

Vectorized calculation of a column's value based on a previous value of the same column?

I have a pandas dataframe with 2 columns like this:

``````df = pd.DataFrame(data={'A': [10, 2, 3, 4, 5, 6], 'B': [0, 1, 2, 3, 4, 5]})
>>> df
A  B
0  10  0
1   2  1
2   3  2
3   4  3
4   5  4
5   6  5
``````

I want to create a new column C in the following way:
C[i]=C[i-1]-A[i]+B[i]

In this question the answer proposes the use of a loop like this:

``````df['C'] = df['A']

for i in range(1, len(df)):
df['C'][i] = df['C'][i-1] - df['A'][i] + df['B'][i]

>>> df
A  B   C
0  10  0  10
1   2  1   9
2   3  2   8
3   4  3   7
4   5  4   6
5   6  5   5
``````

Which does the job.

But since the loops are generally slow in comparison to vectorized calculations, I was wondering if there is a vectorized solution for this in pandas. (And this is the reason for this new question).

I tried to use the shift method like this

``````df['C'] = df['C'].shift(1).fillna(df['A']) - df['A'] + df['B']
``````

but it didn't help since the shifted C column isn't updated with the calculation. It keeps its original values:

``````>>> df['C'].shift(1).fillna(df['A'])
0    10
1    10
2     2
3     3
4     4
5     5
``````

and that produces a wrong result.

The problem you have can be vectorized since delta[i] = C[i] - C[i-1] = -A[i]+B[i]. You can get delta from A and B first and calculate cumulative sum of delta (plus C[0]) to get full C, as follows:

``````In [21]:
delta = df['B'] - df['A']
delta[0] = 0
df['C'] = df.loc[0, 'A'] + delta.cumsum()
​
print df
A  B   C
0  10  0  10
1   2  1   9
2   3  2   8
3   4  3   7
4   5  4   6
5   6  5   5
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download