pavemann - 24 days ago
R Question

R - time series hourly

I have the following dataset of hourly incoming call counts, covering only the hours from 3 p.m. to 10 p.m. each day. It looks like this:

Date hour Count Year Month Day
01.01.2001 15 69 2001 1 1
01.01.2001 16 12 2001 1 1
01.01.2001 17 56 2001 1 1
01.01.2001 18 34 2001 1 1
01.01.2001 19 44 2001 1 1
01.01.2001 20 91 2001 1 1
01.01.2001 21 82 2001 1 1
01.01.2001 22 49 2001 1 1
...
17.08.2003 22 103 2003 8 17


What needs to be done is a time series analysis including forecasts, exponential smoothing, moving averages and so forth.

The problem I'm facing now is how to set up the call to the ts function. I only have the peak hours from 3 p.m. to 10 p.m. available, so I can't declare the frequency as 24.

Can anybody help me out?

many thanks
cheers,

Answer

1) Assuming that the series starts at 3pm, that the days are consecutive and that all hours from 3pm to 10pm are present, each day contributes 8 observations, so use a frequency of 8:

# DF[-1] drops the Date column; each day has 8 observations (15:00 to 22:00)
tser <- ts(DF[-1], frequency = 8)

giving:

> tser
Time Series:
Start = c(1, 1) 
End = c(1, 8) 
Frequency = 8 
      hour Count Year Month Day
1.000   15    69 2001     1   1
1.125   16    12 2001     1   1
1.250   17    56 2001     1   1
1.375   18    34 2001     1   1
1.500   19    44 2001     1   1
1.625   20    91 2001     1   1
1.750   21    82 2001     1   1
1.875   22    49 2001     1   1

This represents the index for day 1 3pm as 1.0, day 1 4pm as 1+1/8, day 1 5pm as 1+2/8, ..., day 1 10pm as 1+7/8, day 2 3pm as 2, day 2 4pm as 2+1/8, etc.
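
As a rough sketch of where this leads (not part of the code above), the Count column of such a series can be passed to base R's smoothing functions. The three days of data below are made up for illustration (day 1 repeats the sample rows from the question), and the smoothing parameter alpha = 0.3 is an arbitrary assumption rather than a recommended value:

```r
# Made-up data: 3 consecutive days, 8 hourly counts each (15:00 to 22:00).
day1  <- c(69, 12, 56, 34, 44, 91, 82, 49)
count <- ts(c(day1, day1 + 5, day1 + 10), frequency = 8)

# Trailing moving average over one full day (8 observations);
# the first 7 values are NA because the window is not yet full.
ma <- stats::filter(count, rep(1 / 8, 8), sides = 1)

# Simple exponential smoothing with a fixed, arbitrarily chosen alpha
# (no trend and no seasonal component).
fit <- HoltWinters(count, alpha = 0.3, beta = FALSE, gamma = FALSE)

# Forecast the next day's 8 peak hours; the time index continues at day 4.
fc <- predict(fit, n.ahead = 8)
time(fc)
```

Because the frequency is 8, the forecast horizon n.ahead = 8 corresponds to one further day of peak hours, not to 8 clock hours after 10pm.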

2) This is the same, except that the day index is the number of days since 1970-01-01 instead of starting at 1:

tser <- ts(DF[-1], start = as.numeric(as.Date("2001-01-01")), frequency = 8)

giving:

> tser
Time Series:
Start = c(11323, 1) 
End = c(11323, 8) 
Frequency = 8 
         hour Count Year Month Day
11323.00   15    69 2001     1   1
11323.12   16    12 2001     1   1
11323.25   17    56 2001     1   1
11323.38   18    34 2001     1   1
11323.50   19    44 2001     1   1
11323.62   20    91 2001     1   1
11323.75   21    82 2001     1   1
11323.88   22    49 2001     1   1

That is, this would represent each day as the number of days since 1970-01-01 plus, as before, 0, 1/8, ..., 7/8 for the hours.

If you later need to regenerate the date/time then:

library(chron)
tt <- as.numeric(time(tser))
# integer part = day number; fractional part * 8 = hours after 15:00
as.chron(tt %/% 1) + (8 * tt %% 1 + 15)/24

giving:

[1] (01/01/01 15:00:00) (01/01/01 16:00:00) (01/01/01 17:00:00)
[4] (01/01/01 18:00:00) (01/01/01 19:00:00) (01/01/01 20:00:00)
[7] (01/01/01 21:00:00) (01/01/01 22:00:00)
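
If chron is not available, the same conversion can probably be done in base R alone. This is only a sketch under assumptions: the series values are placeholders, and the UTC time zone is an arbitrary choice that should be replaced by the time zone of the actual data.

```r
# Rebuild a small series indexed as in approach 2 (values are placeholders).
start_day <- as.numeric(as.Date("2001-01-01"))   # days since 1970-01-01
tser <- ts(1:16, start = start_day, frequency = 8)

tt   <- as.numeric(time(tser))
day  <- as.Date(tt %/% 1, origin = "1970-01-01")  # integer part: calendar day
hour <- round(8 * (tt %% 1)) + 15                 # fraction k/8: k hours after 15:00

# UTC is an assumption here; substitute the time zone of the data.
stamp <- as.POSIXct(paste(day, hour), format = "%Y-%m-%d %H", tz = "UTC")
format(stamp[c(1, 8, 9)], "%Y-%m-%d %H:%M")
```

The round() call guards against small floating-point errors in the fractional part of the index.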

3) zoo: If it's not important to keep the observations equally spaced, then you could try this:

library(zoo)
library(chron)
z <- zoo(DF[-1], as.chron(format(DF$Date), "%d.%m.%Y") + DF$hour/24)

giving:

> z
                    hour Count Year Month Day
(01/01/01 15:00:00)   15    69 2001     1   1
(01/01/01 16:00:00)   16    12 2001     1   1
(01/01/01 17:00:00)   17    56 2001     1   1
(01/01/01 18:00:00)   18    34 2001     1   1
(01/01/01 19:00:00)   19    44 2001     1   1
(01/01/01 20:00:00)   20    91 2001     1   1
(01/01/01 21:00:00)   21    82 2001     1   1
(01/01/01 22:00:00)   22    49 2001     1   1

The zoo approach requires neither that all hours be present nor that the days be consecutive.
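
To illustrate the point about gaps, here is a small sketch with made-up rows in which two hours are missing on one day and a whole day is skipped. The data frame is hypothetical; the index is built the same way as in the zoo call above.

```r
library(zoo)
library(chron)

# Made-up rows: 17:00 and 18:00 are missing on 1 Jan, and 2 Jan is absent.
DF <- data.frame(
  Date  = c("01.01.2001", "01.01.2001", "01.01.2001", "03.01.2001", "03.01.2001"),
  hour  = c(15, 16, 19, 15, 16),
  Count = c(69, 12, 44, 51, 38)
)

# Date plus hour of day as a chron date/time index.
idx   <- as.chron(format(DF$Date), "%d.%m.%Y") + DF$hour / 24
Count <- zoo(DF$Count, idx)

is.regular(Count, strict = TRUE)  # are all gaps between observations equal?
rollmeanr(Count, 3)               # mean over the last 3 available observations
```

Note that rollmeanr works on the last 3 observations that exist, not on the last 3 clock hours, so a window may span a gap or a day boundary.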

Note: I am not sure that you really need all the date and hour fields broken out separately, since they can easily be generated on the fly, so this might be enough:

Count <- z$Count

Year can be recovered via as.numeric(format(time(Count), "%Y")) and month, day and hour can be recovered by using %m, %d or %H in place of %Y.

A list of the month, day and year columns can also be generated using month.day.year(time(Count)).

years(time(Count)), months(time(Count)), days(time(Count)) and hours(time(Count)) will produce factors of the indicated quantities.
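
The recovery calls above can be tried on a couple of chron date/times directly. The two timestamps below are illustrative stand-ins for time(Count), built the same way as the zoo index:

```r
library(chron)

# Two illustrative date/times: 1 Jan 2001 15:00 and 17 Aug 2003 21:00.
tt <- as.chron(c("01.01.2001", "17.08.2003"), "%d.%m.%Y") + c(15, 21) / 24

as.numeric(format(tt, "%Y"))  # years
as.numeric(format(tt, "%H"))  # hours of the day
month.day.year(tt)            # list with components month, day and year
hours(tt)                     # hour of each observation
```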