fccoelho - 1 year ago 241

Python Question

Hi, I have a time series and would like to count how many events I have per day (i.e., rows in the table within a day). The command I'd like to use is:

`ts.resample('D', how='count')`

but "count" is not a valid aggregation function for time series, I suppose.

Just to clarify, here is a sample of the data (the `Data_Hora` column):

```
0   2008-02-22 03:43:00
1   2008-02-22 03:43:00
2   2010-08-05 06:48:00
3   2006-02-07 06:40:00
4   2005-06-06 05:04:00
5   2008-04-17 02:11:00
6   2012-05-12 06:46:00
7   2004-05-17 08:42:00
8   2004-08-02 05:02:00
9   2008-03-26 03:53:00
Name: Data_Hora, dtype: datetime64[ns]
```
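To make the question reproducible, the sample can be rebuilt as a pandas Series. This is a sketch that assumes the ten printed timestamps stand in for the real data. Note that the timestamps are the *values* and the index is just the default 0..9 integer range, which is what `resample` will look at.

```python
import pandas as pd

# Rebuild the printed sample as a Series named "Data_Hora".
# The timestamps are the values; the index is the default RangeIndex 0..9.
ts = pd.Series(
    pd.to_datetime([
        "2008-02-22 03:43:00", "2008-02-22 03:43:00", "2010-08-05 06:48:00",
        "2006-02-07 06:40:00", "2005-06-06 05:04:00", "2008-04-17 02:11:00",
        "2012-05-12 06:46:00", "2004-05-17 08:42:00", "2004-08-02 05:02:00",
        "2008-03-26 03:53:00",
    ]),
    name="Data_Hora",
)
print(ts)
```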

and this is the error I am getting:

`ts.resample('D').count()`

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-86643e21ce18> in <module>()
----> 1 ts.resample('D').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255     def resample(self, rule, how=None, axis=0, fill_method=None,
    256                  closed=None, label=None, convention='start',
--> 257                  kind=None, loffset=None, limit=None, base=0):
    258         """
    259         Convenience method for frequency conversion and resampling of regular

/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.pyc in resample(self, obj)
     98             return obj
     99         else: # pragma: no cover
--> 100             raise TypeError('Only valid with DatetimeIndex or PeriodIndex')
    101 
    102         rs_axis = rs._get_axis(self.axis)

TypeError: Only valid with DatetimeIndex or PeriodIndex
```
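The traceback comes from `resample` checking the *index*, not the values. A minimal sketch, assuming a recent pandas and reusing two timestamps from the sample, reproduces the same check on a Series shaped like the one above:

```python
import pandas as pd

# Timestamps stored as values with the default integer index, as in the question.
ts = pd.Series(pd.to_datetime(["2008-02-22 03:43", "2010-08-05 06:48"]), name="Data_Hora")
print(type(ts.index).__name__)  # RangeIndex: this is what resample() inspects

raised = False
try:
    ts.resample("D")
except TypeError as exc:
    # The exact wording of the message differs between pandas versions.
    raised = True
    print("TypeError:", exc)
```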

That can be fixed by turning the datetime column into the index with `set_index`. However, after I do that, I still get the following error:

`DataError: No numeric types to aggregate`

because my DataFrame has no numeric columns.

But I just want to count rows! The equivalent of a simple `select count(*) group by ...` in SQL.
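Assuming the timestamps have been moved into the index and a recent pandas version, here is a sketch of two ways to count rows per day without any numeric column. The `events` Series and its text values are hypothetical stand-ins for the real data. `groupby` on the timestamps floored to midnight mirrors SQL's `GROUP BY` (only days that occur appear), while `resample('D').size()` creates a bin for every calendar day and reports empty days as 0.

```python
import pandas as pd

# Hypothetical event log: timestamps in the index, only a text column, nothing numeric.
events = pd.Series(
    ["login", "login", "logout"],
    index=pd.to_datetime(["2008-02-22 03:43", "2008-02-22 09:10", "2008-02-24 06:48"]),
    name="event",
)

# Closest to SELECT day, COUNT(*) ... GROUP BY day: only days that occur appear.
grouped = events.groupby(events.index.floor("D")).size()

# resample() makes a bin for every calendar day, so 2008-02-23 shows up as 0.
# size() counts rows regardless of dtype (count() would skip missing values).
resampled = events.resample("D").size()

print(grouped.tolist())    # [2, 1]
print(resampled.tolist())  # [2, 0, 1]
```

Which one fits depends on whether days without any events should appear in the result.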

Answer

To get this to work, I first removed the rows whose index was NaT:

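```
# `!= pd.NaT` is True for every row (NaT included); notna() actually drops the NaT rows
df2 = df[df.index.notna()].copy()  # copy() avoids a SettingWithCopyWarning when adding a column
```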
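The NaT filter deserves a closer look: NaT behaves like NaN in comparisons, so `df.index != pd.NaT` is True for every row, including the rows it was meant to drop. A small sketch with a hypothetical frame (the `value` column is made up) compares that mask with `Index.notna()`:

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex contains one missing timestamp (NaT).
idx = pd.DatetimeIndex(["2008-02-22 03:43", None, "2010-08-05 06:48"])
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=idx)

# NaT compares unequal to everything (itself included), so this mask keeps every row.
mask_ne = df.index != pd.NaT
print(mask_ne.tolist())  # [True, True, True]

# notna() flags the missing timestamp correctly.
clean = df[df.index.notna()]
print(len(clean))  # 2
```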

I had to add a column of ones:

```
df2['n'] = 1  # a 1 on every row, so a per-day sum is a per-day row count
```

and then sum only that column, which gives the number of rows per day:

```
# Older pandas: df2.n.resample('D', how="sum"); newer versions no longer accept how=
df2['n'].resample('D').sum()
```
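Putting the column-of-ones idea together in current pandas syntax, where `resample(...).sum()` replaces the older `how="sum"` keyword, gives the sketch below. The frame and its `event` column are hypothetical; the point is that summing a column of ones per daily bin equals the number of rows in that bin, and a day with no rows sums to 0.

```python
import pandas as pd

# Hypothetical data: timestamps in the index and a single text column.
df = pd.DataFrame(
    {"event": ["login", "login", "logout"]},
    index=pd.to_datetime(["2008-02-22 03:43", "2008-02-22 09:10", "2008-02-24 06:48"]),
)

df2 = df.copy()
df2["n"] = 1                          # a 1 on every row
daily = df2["n"].resample("D").sum()  # sum of ones per day = rows per day; empty days give 0

print(daily.tolist())  # [2, 0, 1]
```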

Then I could visualize the daily counts with:

```
import matplotlib.pyplot as plt  # a bare plot() assumes IPython's pylab mode
plt.plot(df2['n'].resample('D').sum())
```

Source (Stackoverflow)