Madhumitha - 6 months ago 35

R Question

I am trying to forecast the leakage rates through seals. The raw data consists of measurements recorded every second for 60 minutes.

I have 3600 data points (60 seconds * 60 minutes = 3600). I am trying to convert my data into a time series using `ts` in R. I wrote the following code, where I set the frequency to 60 because we collected 60 data points per minute. Frequency is the number of observations per cycle, and in my case the cycle is one minute (I assumed so, but I am not sure that is right).

`NEW <- ts(Set2.1, start= 0, end= 60, frequency=60)`

Is this the right way to do it? Instead of 3600 data points I get 3601, and I am not sure why. If I don't specify the frequency in my code, I get exactly 3600 data points.

`NEW <- ts(Set2.1)`

When I decompose this using `decompose`, I get the following error: "time series has no or less than 2 periods". Is it possible for data to have no seasonality? My raw data is very linear and trends upwards, and I am not sure whether it has any seasonality in it. I am new to time series, so please help me proceed with this. Thanks.

Answer

`ts()` works with two time units. The longer one contains a fixed number of samples of the shorter one, and that number is given by the `frequency` argument to `ts()`.

In your case, the longer time unit is the minute, which contains 60 samples of the shorter one, i.e. seconds. Any moment in the time series can be labeled by two numbers, one for each unit. In your case, the measurement at 10 minutes, 35 seconds would be labeled as `c(10, 35)`.
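To make the two-number labels concrete, here is a minimal sketch that uses `time()`, `cycle()` and `window()` from base R on a small dummy series. The series `x` and its values are made up for illustration and are not part of your data.

```r
# Dummy series: 3 "minutes" of 60 "seconds" each, values are just 1..180
x <- ts(seq_len(180), start = c(0, 1), frequency = 60)

# time() gives the position in the longer unit (minutes, as a decimal)
time(x)[1:3]

# cycle() gives the position within the longer unit (1..60)
cycle(x)[c(1, 60, 61)]

# window() accepts the same two-number labels as ts()
w <- window(x, start = c(1, 35), end = c(1, 37))
as.numeric(w)
## [1] 95 96 97
```

The label `c(1, 35)` is the 35th sample of the period numbered 1, which is the 95th observation overall because the period numbered 0 already holds 60 samples.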

When you indicate `start` and `end` inside `ts()`, you can also give each of them as two numbers. Your first measurement is at 0 minutes 1 second, that is `c(0, 1)`, and the last one is at 60 minutes 0 seconds, that is `c(60, 0)`. So, this gives the expected result (I create some dummy values for the example):

```
data <- rnorm(3600)
ts_data <- ts(data, start = c(0, 1), end = c(60, 0), frequency = 60)
length(ts_data)
## [1] 3600
```

In order to explain the issue with your approach, I need to introduce two other functions: `start()` and `end()` return the two-value labels for the first and last measurement, respectively. So, for the example above:

```
start(ts_data)
## [1] 0 1
end(ts_data)
## [1] 59 60
```

Here, you can already see a little detail: `ts()` starts numbering the samples within each period at 1, so 60 minutes and 0 seconds is understood as 59 minutes 60 seconds.
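As a quick check of that equivalence, the following sketch builds the series once with `end = c(60, 0)` and once with `end = c(59, 60)` and compares the results. The object names `a` and `b` are only illustrative, and `all.equal()` is used instead of `==` to avoid depending on exact floating point equality.

```r
# Same dummy data, two ways of writing the same end point
data <- rnorm(3600)
a <- ts(data, start = c(0, 1), end = c(60, 0), frequency = 60)
b <- ts(data, start = c(0, 1), end = c(59, 60), frequency = 60)

# Both series should have the same length and the same time attributes
length(a)
length(b)
all.equal(tsp(a), tsp(b))
all.equal(end(a), end(b))
```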

Now to your example:

```
ts_data2 <- ts(data, start = 0, end = 60, frequency = 60)
start(ts_data2)
## [1] 0 1
end(ts_data2)
## [1] 60 1
```

As you can see, if you pass only a single integer to `start` and `end`, it is interpreted as the first sample of that period in the shorter unit. So, you actually created a time series that runs from 0 minutes 1 second to 60 minutes 1 second, which is one second more than you wanted. The length of `ts_data2` is therefore `3601`.
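The sketch below illustrates where the extra observation comes from and shows one possible way to avoid the problem. As far as I can tell, `ts()` recycles the data when the requested span is longer than the data, so the 3601st value should simply be the first value again. Leaving out `end` and letting `ts()` derive it from the length of the data is an alternative I am suggesting here; the name `ts_auto` is just for illustration.

```r
set.seed(1)
data <- rnorm(3600)

# Single-number start/end: the span covers 3601 samples
ts_data2 <- ts(data, start = 0, end = 60, frequency = 60)
length(ts_data2)

# The extra sample is not a new measurement, it is data[1] recycled
identical(ts_data2[3601], data[1])

# Alternative: give only start and frequency, let ts() work out the end
ts_auto <- ts(data, start = c(0, 1), frequency = 60)
length(ts_auto)
end(ts_auto)
```

Omitting `end` has the advantage that the length of the series always matches the length of the data, so an off-by-one in the end label cannot add or drop observations.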

A little remark to finish: I mostly talked about seconds and minutes because this fits your example. But it is important to note that `ts()` just works with a longer and a shorter time unit, without knowing what they actually are. So, `c(4, 6)` could refer not only to 4 minutes and 6 seconds, but just as well to the 6th month in the 4th year, or day 6 in week 4.
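To illustrate that the units are only a matter of interpretation, here is a small sketch with made-up data in which the same label `c(4, 6)` is used once with `frequency = 12` (read as months in a year) and once with `frequency = 7` (read as days in a week). The series names and values are hypothetical.

```r
# 24 monthly values starting in the 6th month of year 4
monthly <- ts(1:24, start = c(4, 6), frequency = 12)
start(monthly)
end(monthly)

# 14 daily values starting on day 6 of week 4
daily <- ts(1:14, start = c(4, 6), frequency = 7)
start(daily)
end(daily)
```

In both cases `ts()` only stores the start, the end and the frequency; whether a period is a minute, a year or a week is up to you.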