bhjghjh bhjghjh - 3 months ago 14
Python Question

Subtracting the rows of a column from the preceding rows in a python pandas dataframe

I have a .dat file with thousands of rows in one column (say the column is time, t). I want to find the interval between consecutive rows in that column, i.e. the difference between the second row and the first, and so on (to find dt). I then want to make a new column with those interval values and plot it against the original column. If a language other than Python would be helpful here, I would appreciate suggestions for that too.

I have written some pseudo-Python code for that:

import csv
from sys import argv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# the .dat file (e.g. flash.dat) is passed on the command line
script, filename = argv

# read the .dat file into a list of lists (whitespace-separated columns)
with open(filename) as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file (flash.csv, so the input is not overwritten)
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

columns_to_keep = ['#time']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)

# copy the previous row's time onto each row, then subtract
dataframe["prev_time"] = [np.nan] + dataframe.iloc[:-1]["#time"].tolist()
dataframe["time_delta"] = dataframe["#time"] - dataframe["prev_time"]

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# plot the interval against the original time column
dataframe.plot(x='#time', y='time_delta', style='r')
plt.show()

print(dataframe)


I have updated my code, and I am also sharing the .dat file I am working on.


One easy way to perform an operation involving values from different rows is to copy the required values onto the same row and then apply a simple row-wise operation.

For instance, in your example, we'd have a dataframe with one time column and some other data, like so:

import pandas as pd
import numpy as np 

df = pd.DataFrame({"time":  pd.date_range("24 sept 2016",  periods=5*24, freq="1h")})
df["time"] = df["time"]  + [pd.Timedelta(minutes=m) for m in np.random.choice(a=range(60), size=df.shape[0])]
df["value"] = np.random.normal(size=df.shape[0])

If you want to compute the time delta from the previous (or next, or whatever) row, you can simply copy the value from it, and then perform the subtraction:

df["prev_time"] = [np.nan] + df.iloc[:-1]["time"].tolist()
df["time_delta"] = df.time - df.prev_time
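
As a hedged aside, the same "copy the previous row, then subtract" idea can also be written with the built-in pandas methods `Series.shift` and `Series.diff`. The sketch below uses five made-up timestamps (an assumption for illustration, not the data from the question) and checks that both routes give the same result:

```python
import pandas as pd

# Hypothetical sample data: five irregularly spaced timestamps.
df = pd.DataFrame({"time": pd.to_datetime([
    "2016-09-24 00:10", "2016-09-24 01:05", "2016-09-24 02:40",
    "2016-09-24 03:15", "2016-09-24 04:00",
])})

# shift(1) moves every value down one row, so each row sees its predecessor.
df["prev_time"] = df["time"].shift(1)
df["time_delta"] = df["time"] - df["prev_time"]

# diff() performs the shift and the subtraction in one step.
df["time_delta_diff"] = df["time"].diff()

print(df)
```

The first row has no predecessor, so its delta is missing (NaT) in both columns.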

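
For a numeric time column read from a whitespace-separated file, as in the question, a minimal sketch might look like the following. The in-memory text stands in for the real flash.dat, and the `value` column and the numbers are assumptions made up for illustration; only the `#time` header name comes from the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for flash.dat: whitespace-separated,
# with a header line whose first name is "#time".
raw = io.StringIO(
    "#time value\n"
    "0.0 1.2\n"
    "0.5 1.9\n"
    "1.5 0.7\n"
    "3.0 2.4\n"
)

# sep=r"\s+" splits on any run of whitespace; pass the real file path
# instead of `raw` to read an actual file.
df = pd.read_csv(raw, sep=r"\s+")

# dt = t[i] - t[i-1]; the first row has no predecessor and becomes NaN.
df["dt"] = df["#time"].diff()

print(df)
```

With matplotlib installed, `df.plot(x="#time", y="dt")` should then draw the interval against the original column.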