Afflatus - 5 months ago

Python: Pandas: Rolling Windows - mean() works but variance() doesn't?

I have the following data which is recorded in seconds: http://pastebin.com/wBSJWYn2

I want to capture various summary statistics, such as the mean and variance, over 1-minute intervals. So I'm running these functions on sensor_data.rolling(window=1,freq="1MIN"). For the most part it works fine, but there are two types of irregularities I can't overcome for certain functions. Specifically, either:


  1. No output for incomplete minutes -- It doesn't give an output for minutes that don't have all 60 seconds. This is the case for mean(), quantile(), sum().

  2. No output at all. For certain functions like var(), std(), kurt(), skew(), I don't get any values at all. I really can't understand why this would be the case, given that it was able to calculate the mean...



Other functions seem to work without a problem: max(), median(), min()


I really care about the 2nd problem, but it would be a bonus to get a workaround for the 1st as well...




sensor_data.head()

                     x_acceleration  y_acceleration  z_acceleration  heart_rate  electrodermal_activity  temperature
index
2016-05-16 06:58:44       -33.25000       -43.03125        33.09375         NaN                0.297099        33.33
2016-05-16 06:58:45       -28.15625       -52.90625        24.12500         NaN                0.219612        33.33
2016-05-16 06:58:46       -25.87500       -55.96875        21.18750         NaN                0.222648        33.33
2016-05-16 06:58:47       -24.00000       -57.46875        19.40625         NaN                0.217335        33.33
2016-05-16 06:58:48       -22.84375       -56.25000        23.40625         NaN                0.214300        33.33


Example output of the 1st case -- no output for incomplete minute:

sensor_data.rolling(window=1,freq="1MIN").mean().head()
                     x_acceleration  y_acceleration  z_acceleration  heart_rate  electrodermal_activity  temperature
index
2016-05-16 06:58:00             NaN             NaN             NaN         NaN                     NaN          NaN
2016-05-16 06:59:00       -24.84375       -59.46875         9.03125       68.57                0.208988        33.75
2016-05-16 07:00:00         6.31250       -62.78125         6.46875       79.40                0.224924        33.84
2016-05-16 07:01:00       -21.18750       -57.00000        22.50000       92.00                0.224165        34.13
2016-05-16 07:02:00       -17.46875       -58.87500        21.84375       81.10                0.224165        34.25


Example output of the 2nd case -- no output:

sensor_data.rolling(window=1,freq="1MIN").var().head()

                     x_acceleration  y_acceleration  z_acceleration  heart_rate  electrodermal_activity  temperature
index
2016-05-16 06:58:00             NaN             NaN             NaN         NaN                     NaN          NaN
2016-05-16 06:59:00             NaN             NaN             NaN         NaN                     NaN          NaN
2016-05-16 07:00:00             NaN             NaN             NaN         NaN                     NaN          NaN
2016-05-16 07:01:00             NaN             NaN             NaN         NaN                     NaN          NaN
2016-05-16 07:02:00             NaN             NaN             NaN         NaN                     NaN          NaN

Answer

For starters, this will get you going. It groups the rows into one-minute bins instead of using a rolling window:

sensor_data.groupby(pd.Grouper(level=0, freq='Min')).describe()

You can also build a custom function that adds the variance, skewness, and kurtosis to the describe() output:

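import pandas as pd

def stats(df):
    # Each reduction returns a Series indexed by column name;
    # wrap it in a DataFrame and transpose to get a single labelled row.
    kurt = pd.DataFrame(df.kurt(), columns=['kurt']).T
    skew = pd.DataFrame(df.skew(), columns=['skew']).T
    var = pd.DataFrame(df.var(), columns=['var']).T
    # Stack the extra rows underneath the describe() summary.
    return pd.concat([df.describe(), var, skew, kurt])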

Then apply it to each one-minute group:

sensor_data.groupby(pd.Grouper(level=0, freq='Min')).apply(stats)
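As a rough, self-contained sketch of why grouping helps: the sample variance of a single observation is undefined, which would be consistent with var() returning NaN when each window holds only one row. Grouping into one-minute bins aggregates every row in the bin, so incomplete minutes get a value too. The frame `demo` and its random values are made up for illustration (they are not the pastebin data), and the choice of `mean`, `var`, and `count` is just an example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sensor_data: one reading per second, starting
# mid-minute so the first bin is incomplete (16 rows), like the 06:58 bin.
index = pd.date_range("2016-05-16 06:58:44", periods=136,
                      freq=pd.Timedelta(seconds=1))
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "x_acceleration": rng.normal(size=len(index)),
        "temperature": 33.0 + rng.normal(scale=0.1, size=len(index)),
    },
    index=index,
)

# Sample variance (ddof=1) of a single observation is NaN.
single_var = demo["x_acceleration"].iloc[:1].var()

# Group rows into calendar minutes and aggregate every row in each bin.
per_minute = demo.groupby(pd.Grouper(freq="1min")).agg(["mean", "var", "count"])
print(per_minute)
```

The `count` column shows how many seconds each bin actually contains, which makes it easy to spot or filter out incomplete minutes afterwards.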
