RobHenst - 5 months ago
Python Question

Python: How to "NaN" a range of data when a given criterion is met for a given period of time

I have a pandas DataFrame which contains activity count data from a Philips Actiwatch. When there is no activity count for 60 minutes or more, the user was probably not wearing the device, and this range should be removed.

How do I detect periods of 60 minutes or more (each line is 1 minute) in my DataFrame and remove that complete period? If the activity count is 0 for 59 lines or fewer, nothing should happen, but if the activity count is 0 for 60 lines or more (let's say 80 lines), that data should be set to NaN.

The csv file with the data can be found here:
https://www.dropbox.com/s/6h43nrozohc9vd8/Actiwatch%20Data?dl=0

Pretty useless as it is, this is where I got stuck:

# remove all data where Activity = 0 for 60 or more consecutive minutes:

zero_count = 0
for n in range(len(data)):
    if data['Activity'].loc[n] == NaN:
        continue
    elif data['Activity'].loc[n] > 0:
        continue
    elif data['Activity'].loc[n] = 0:
        while data['Activity'].loc[n] = 0:
            zero_count = zero_count + 1
        if zero_count >60:
            # NaN last zero_count number of lines.
            zero_count = 0
            break
        else:
            zero_count = 0
            break
    else:
        print "Non-wear detection error"
        break


What I was trying to do is check each line: if it is 0, add 1 to "zero_count"; when a non-zero value is read, check whether zero_count is 60 or more. If it is, NaN the whole range and reset zero_count. If it is less than 60, just reset zero_count without NaN-ing any data.

I hope someone understands what I am trying to do and can either: 1) make the code above work, or 2) suggest a better way of doing what I am trying to do.

Thanks everyone who is even reading this post.

Best regards,

Rob

Answer

You were close, but your code has an infinite loop (the while loop never advances n), and some of the logic in your if statements is a bit off. Here is a corrected solution.

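streakCount = 0 # Length of the current run of zeros
streakIndex = 0 # Keeps track of where the current run started
for n in range(len(data)):
    if data['Activity'].loc[n] == 0:
        if streakCount == 0:
            streakIndex = n
        streakCount += 1
    else:
        # Any non-zero value (or an existing NaN) ends the run
        if streakCount >= 60:
            # NaN out the whole run of 0's; .loc label slices include the end label
            data.loc[streakIndex:streakIndex + streakCount - 1, 'Activity'] = float('nan')
        streakCount = 0 # reset streak

# A run of 0's that lasts until the end of the data never hits the else branch
if streakCount >= 60:
    data.loc[streakIndex:streakIndex + streakCount - 1, 'Activity'] = float('nan')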
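
As an alternative to the explicit loop, the same run detection can be sketched with vectorized pandas operations. This is only a minimal sketch, not part of the original answer: the helper name mask_long_zero_runs and its min_run parameter are made up for illustration, and the small DataFrame below is synthetic rather than the Actiwatch file from the question.

```python
import pandas as pd


def mask_long_zero_runs(activity, min_run=60):
    """Return a copy of `activity` where every run of at least
    `min_run` consecutive zeros is replaced by NaN.
    (Hypothetical helper, named here for illustration only.)"""
    is_zero = activity.eq(0)
    # A new run starts wherever is_zero differs from the previous row;
    # the cumulative sum of those change points gives each run an id.
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    # Length of the run that each row belongs to.
    run_length = run_id.map(run_id.value_counts())
    # mask() puts NaN wherever the condition is True.
    return activity.mask(is_zero & (run_length >= min_run))


# Synthetic data: a run of 59 zeros (kept) and a run of 80 zeros (masked).
counts = [5] + [0] * 59 + [7] + [0] * 80 + [3]
data = pd.DataFrame({"Activity": counts})
data["Activity"] = mask_long_zero_runs(data["Activity"])
```

Because mask() returns a new Series instead of assigning row by row, this version does not depend on the DataFrame having a default 0..n-1 index, and a run of zeros that reaches the end of the data is handled without a separate check.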