Shaun Overton Shaun Overton - 3 days ago 4
Python Question

pandas dividing a column by lagged values

I'm trying to divide a Pandas DataFrame column by a lagged value, which is 1 in this example.

Create the dataframe. This example only has 1 column, even though my real data has dozens

dTest = pd.DataFrame(data={'Open': [0.99355, 0.99398, 0.99534, 0.99419]})


When I try this vector division (I'm a Python newbie coming from R):

dTest.ix[range(1,4),'Open'] / dTest.ix[range(0,3),'Open']


I get this output:


NaN 1 1 NaN


But I'm expecting:


1.0004327915052085
1.0013682367854484
0.9988446159101413


There's clearly something that I don't understand about the data structure. I'm expecting 3 values but it's outputting 4. What am I missing?

Answer

What you tried failed because the sliced ranges of the indices only overlap on the middle 2 rows. You should use shift to shift the rows to achieve what you want:

In [166]:
dTest['Open'] / dTest['Open'].shift()

Out[166]:
0         NaN
1    1.000433
2    1.001368
3    0.998845
Name: Open, dtype: float64

you can also use div:

In [159]:
dTest['Open'].div(dTest['Open'].shift(), axis=0)

Out[159]:
0         NaN
1    1.000433
2    1.001368
3    0.998845
Name: Open, dtype: float64

You can see that the indices are different when you slice so when using / only the common indices are affected:

In [164]:
dTest.ix[range(0,3),'Open']

Out[164]:
0    0.99355
1    0.99398
2    0.99534
Name: Open, dtype: float64

In [165]:
dTest.ix[range(1,4),'Open']

Out[165]:
1    0.99398
2    0.99534
3    0.99419
Name: Open, dtype: float64

here:

In [168]: 

dTest.ix[range(0,3),'Open'].index.intersection(dTest.ix[range(1,4),'Open'].index

Out[168]:
Int64Index([1, 2], dtype='int64')
Comments