pottolom pottolom - 10 days ago 4
Python Question

Python Pandas dataframe subtract cumulative column

I have some data that I am importing into a Pandas dataframe. This data is "cumulative" and indexed on a time series, see below:

Raw data
2016-11-23 10:00:00 48.6
2016-11-23 11:00:00 158.7
2016-11-23 12:00:00 377.8
2016-11-23 13:00:00 591.7
2016-11-23 14:00:00 748.5
2016-11-23 15:00:00 848.2


The data is updated daily, so the time series will move forward a day each day.

What I need to do is to take this dataframe and create a new column as shown below. The first row simply copies the data from the "Raw data" column. Then each subsequent row takes the data from the "Raw data" column, and subtracts the value that appeared before it, e.g. 158.7 - 48.6 = 110.1, 377.8 - 158.7 = 219.1, etc.

Does anyone know how I can achieve what is in the "Process data" column in Python/Pandas?

Raw data Processed data
23/11/2016 10:00 48.6 48.6
23/11/2016 11:00 158.7 110.1
23/11/2016 12:00 377.8 219.1
23/11/2016 13:00 591.7 213.9
23/11/2016 14:00 748.5 156.8
23/11/2016 15:00 848.2 99.7

Answer

You can use substract by sub with shifted column:

Last fill NaN by first value in Raw data.

df['Processed data'] = df['Raw data'].sub(df['Raw data'].shift())
df['Processed data'].iloc[0] = df['Raw data'].iloc[0]
print (df)
                     Raw data  Processed data
2016-11-23 10:00:00      48.6            48.6
2016-11-23 11:00:00     158.7           110.1
2016-11-23 12:00:00     377.8           219.1
2016-11-23 13:00:00     591.7           213.9
2016-11-23 14:00:00     748.5           156.8
2016-11-23 15:00:00     848.2            99.7
Comments