Sancta Ignis Sancta Ignis - 1 month ago 11
Python Question

Python CSV: Replace cell value (float) of row with new value (float)

updated Sample CSV Data:

c1,c2,v1,v2,p1,p2,r1,a1,f1,f2,f3,Time_Stamp

0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:00
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:01
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:02
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:03
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:04
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:05
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:06
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:07
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:08
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:09
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:10
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:11
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:14
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:17
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:18
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:19
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:20
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:21
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:23
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:24
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:25
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:26
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:27
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:28
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:29
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:30
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:31
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:33
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:34
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:35
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:36


Edited- Reading of CSV, Python code:

import numpy as np
from datetime import date,time,datetime
import pandas as pd

def readcsv(x): #def function to read csv files based on input below
Data = pd.read_csv(x, parse_dates=['Time_Stamp'], infer_datetime_format=True)
Data['Date'] = Data.Time_Stamp.dt.date #creating Date Column in the Data Frame ( does not affect the main .csv file)
Data['Time'] = Data.Time_Stamp.dt.time #creating Time Column in the Data Frame ( also does not affect the main .csv file)
#print (Data) #<-- prints the output
#print (Data['Time_Stamp'][6000:7000]) <- print from row 6000 to 7000 of the data frame (has over 150'000 rows)
Data['Time_Stamp'] = pd.to_datetime(Data['Time_Stamp']) # Time_Stamp Data Frame
print(Data[1:6])
return Data

Data = readcsv('data.csv')


#Data = csv file data

def getMask(start,end,Data):
mask = (Data['Time_Stamp'] > start) & (Data['Time_Stamp'] <= end)
return mask;

start = '2017-06-13 16:00:00'
end = '2017-06-13 16:40:00'
timerange = Data.loc[getMask(start, end, Data)]

pspike = timerange.loc[timerange['c1'] <= 5.0]


pspike output:

the row : for e.g. -> After printing
pspike
, the printed row has
time
value of
16:38:15
and the next printed row has
time
value of
16:38:17
, that means it skipped a row where the time value is
16:38:16


[ e.g. below ]

13/06/2017 16:38:12
13/06/2017 16:38:13
13/06/2017 16:38:14
13/06/2017 16:38:15
13/06/2017 16:38:17
13/06/2017 16:38:18


After running the code below, it prints out the row( only the Time_Stamp value ) that was skipped, which has
time
value of
16:38:16
,
16:38:22
and
16:38:32
where the rows with that value were skipped based on the output of
pspike


for i in range(timerange.shape[0] - 1):
row1 = timerange.iloc[i]
row2 = timerange.iloc[i+1]
if (row2[-1] - row1[-1]).seconds > 1:
print (row1[-1] + pd.Timedelta('1s'))


Output:

2017-06-13 16:38:16
2017-06-13 16:38:22
2017-06-13 16:38:32


What I'm trying to get is to print the whole row where the
Time_Stamp
value is
2017-06-13 16:38:16
, where its the only row with
c1
value that is more than 5.0, in this case (based on the sample code), its
415.7


Instead of this:

13/06/2017 16:38:16


I want to print it like this:

415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32


After printing that row out, I have to replace the value of
c1
-
415.7
directly above, with
0.0
. How do I do this?

EDIT

What to replace:

The
c1
of the missing row in pspike output

Answer Source

I'm a bit confused here because you could simply do:

pspike = timerange[timerange['c1'].gt(5.0)] #gr=greater than, lt=lower than

Which returns a dataframe with:

16  415.7   12.5    30.2    154.6   4675.2  1   -1  5199.4  0   50  0   2017-06-13 16:38:16

Set values of column "c1" to 0.0

pspike["c1"] = 0.0

Create a string from first row (index=0):

','.join(pspike.astype(str).values.tolist()[0])

Prints:

'0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:16'

Update

string = """c1,c2,v1,v2,p1,p2,r1,a1,f1,f2,f3,Time_Stamp
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:00
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:01
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:02
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:03
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:04
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:05
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:06
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:07
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:08
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:09
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:10
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:11
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:14
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:17
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:18
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:19
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:20
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:21
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:23
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:24
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:25
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:26
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:27
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:28
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:29
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:30
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:31
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:33
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:34
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:35
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:36"""

df = pd.read_csv(io.StringIO(string)) # reads data from string above
#Use : df = pd.read_csv('filename.csv')  for csv file (handling tons of data)
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"]) # convert to Datetime
df_filter = df[df["c1"].le(0.5)] # new df with less or equal to 0.5
where = (df_filter[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1] ["Time_Stamp"] - pd.Timedelta("1s")).astype(str).tolist() # Find where diff > 1 second
df_filter2 = df[df["Time_Stamp"].isin(where)] # Create new df with those
df_filter2["c1"] = 0.0 # Set c1 to 0.0

for index, row in df_filter2.iterrows():
    values = row.astype(str).tolist()
    print(','.join(values))

Return

0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:16
0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:22
0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:32