andrey - 1 year ago 74

R Question

I am working in R and would like to forecast a time series of prevalences from the initial values and a set of transition parameters. The data have the following structure:

```
cohort <- c(1980, 1981, 1982)
A00 <- c(.15, .2, .4)
B00 <- c(.25, .3, .4)
C00 <- c(.6, .5, .2)
Tab <- c(.6, .5, .4)
Tac <- c(.2, .25, .35)
ds <- data.frame(cohort, A00, B00, C00, Tab, Tac)
print(ds)
```

```
  cohort  A00  B00 C00 Tab  Tac
1   1980 0.15 0.25 0.6 0.6 0.20
2   1981 0.20 0.30 0.5 0.5 0.25
3   1982 0.40 0.40 0.2 0.4 0.35
```

The initial values in columns A00, B00, and C00 represent the relative size of each group (A, B, C) at time t=00. They add up to 1 across each row (A00+B00+C00=1). The parameters Tab and Tac are used to predict the prevalence at time t+1 using some mathematical model, for example

`A01 = df$A00 - df$Tab + df$Tac`.

The function to compute predicted values at time t+1 is

```
forecast <- function( df ) {
  dsResult <- data.frame(
    cohort = df$cohort,
    A01 = df$A00 - df$Tab + df$Tac,
    B01 = df$B00 - df$Tab + df$Tac,
    C01 = df$C00 - df$Tab + df$Tac
  )
  dsResult <- merge(df, dsResult, by="cohort")
  return( dsResult )
}
new <- forecast(ds)
```

and produces the following result:

```
  cohort  A00  B00 C00 Tab  Tac   A01   B01  C01
1   1980 0.15 0.25 0.6 0.6 0.20 -0.25 -0.15 0.20
2   1981 0.20 0.30 0.5 0.5 0.25 -0.05  0.05 0.25
3   1982 0.40 0.40 0.2 0.4 0.35  0.35  0.35 0.15
```

I would very much appreciate your help in learning how to write a loop that cycles through a desired number of forecast years (for t in 1:7, for instance). Thanks in advance!


Answer Source

First, I'd like to make two suggestions that might make the problem easier to code. One, revise the data schema so that each year is a unique row and each group is a unique column. Two, since the cohorts are treated as mathematically independent of each other, keep them separate for now, at least until the code's kernel is built; a loop that cycles through them can be put around it later. The first block of code has two matrices: one with the observed data, and one that will collect the predicted data.

```
yearCount <- 7 #Declare the number of time points.
groupCount <- 3 #Declare the number of groups.
#Create fake data that sum to 1 across rows/times.
ob <- matrix(runif(yearCount*groupCount), ncol=groupCount)
ob <- ob / apply(ob, 1, function( x ){ return( sum(x) )})
#Establish a container to hold the predicted values.
pred <- matrix(NA_real_, ncol=groupCount, nrow=yearCount)
t12<-.5; t13<-.2; t11<-1-t12-t13 #Transition parameters from group 1
t21<-.2; t23<-.4; t22<-1-t21-t23 #Transition parameters from group 2
t31<-.3; t32<-.1; t33<-1-t31-t32 #Transition parameters from group 3
for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  pred[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  pred[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33
}
#Calculate the squared errors
ss <- (pred[-1, ] - ob[-1, ])^2 #Ignore the first year of data
```

Inside the loop, you probably notice the familiar structure of matrix multiplication. Each row can be slightly condensed using inner products (i.e., one row of the `ob` matrix is multiplied elementwise with one "column" of the `t`s, and the products are summed). I'm using `t12` slightly differently than the `Tab` in your post; here it is the probability of transitioning from group 1 to group 2 at a given time point.

```
#Create transition parameters that sum to 1 across rows/groups.
tt <- matrix(runif(groupCount*groupCount), ncol=groupCount)
tt <- tt / apply(tt, 1, function( x ){ return( sum(x) )})
```

Pretend the `tt` matrix was defined earlier, instead of the separate variables `t11`, ..., `t33`.
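To make that correspondence concrete, here is a small sketch of how the nine scalars from the first block could be arranged into a matrix. The layout (row = origin group, column = destination group) is my reading of the first loop, and `ttFixed`, `prev`, `explicit`, and `viaMatrix` are names made up for this illustration.

```r
# Transition parameters from the first block (one line per origin group).
t12 <- .5; t13 <- .2; t11 <- 1 - t12 - t13
t21 <- .2; t23 <- .4; t22 <- 1 - t21 - t23
t31 <- .3; t32 <- .1; t33 <- 1 - t31 - t32

# Arrange them so ttFixed[j, k] is the probability of moving from group j to group k.
ttFixed <- matrix(c(t11, t12, t13,
                    t21, t22, t23,
                    t31, t32, t33), nrow = 3, byrow = TRUE)

# Each row should sum to 1.
rowSums(ttFixed)

# One year of prevalences (the 1980 cohort's initial values from the question).
prev <- c(.15, .25, .60)

# Explicit products and sums, as in the first loop ...
explicit <- c(prev[1]*t11 + prev[2]*t21 + prev[3]*t31,
              prev[1]*t12 + prev[2]*t22 + prev[3]*t32,
              prev[1]*t13 + prev[2]*t23 + prev[3]*t33)

# ... and the same quantities as inner products with the columns of ttFixed.
viaMatrix <- as.vector(prev %*% ttFixed)
all.equal(explicit, viaMatrix)
```

If the two vectors agree, the matrix has been filled in the orientation the loops expect.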

```
for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, ] %*% tt[, 1]
  pred[i, 2] <- ob[i-1, ] %*% tt[, 2]
  pred[i, 3] <- ob[i-1, ] %*% tt[, 3]
}
```

The loop's contents are slightly cleaner than when each element pair was explicitly multiplied and summed. But we don't have to treat each row/column pair individually. A whole row of the `ob` matrix can be operated on by all three columns of the `tt` matrix simultaneously:

```
for( i in 2:yearCount ) {
  pred[i, ] <- ob[i-1, ] %*% tt
}
```
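The loops above predict each year from the previous year's observed row. The question asks for a forecast from the initial values alone, so one possible variation is to feed each predicted row back into the next step. This is only a sketch: the transition values are the illustrative ones from the first block, and `horizon` and `init` are names I'm introducing here.

```r
horizon <- 7                           # Number of years to forecast (t = 1..7).
init <- c(A = .15, B = .25, C = .60)   # 1980 cohort at t = 0, from the question.

# Illustrative transition matrix; rows are origin groups and each sums to 1.
tt <- matrix(c(.3, .5, .2,
               .2, .4, .4,
               .3, .1, .6), nrow = 3, byrow = TRUE,
             dimnames = list(names(init), names(init)))

# Row 1 holds t = 0; rows 2..(horizon + 1) hold the forecasts.
pred <- matrix(NA_real_, nrow = horizon + 1, ncol = length(init),
               dimnames = list(paste0("t", 0:horizon), names(init)))
pred[1, ] <- init
for (i in seq_len(horizon)) {
  pred[i + 1, ] <- pred[i, ] %*% tt    # Next year depends only on the current forecast.
}
pred
rowSums(pred)                          # Should stay at 1 for every year.
```

Because every row of `tt` sums to 1, the forecast prevalences keep summing to 1, which is not the case for the additive example model in the question.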

This should be much quicker than even the previous version, because R isn't recreating the matrix three times for each row, only once per row. To reduce this to once per matrix, use the `apply` function, and then transpose the result if that suits your purpose. Finally, notice that the rows of the result represent different years than the rows of `pred` (i.e., row i-1 here is the same as row i in `pred`).

```
predictionWithExtraYear <- t(apply(ob, 1, FUN=function(row){row %*% tt}))
```
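Since every row is multiplied by the same `tt`, I believe the whole computation also reduces to a single matrix product, which avoids `apply` entirely. The sketch below regenerates fake data so that it runs on its own; `predAll`, `viaApply`, and `predAligned` are made-up names.

```r
set.seed(1)                      # Only so the fake data are reproducible.
yearCount <- 7
groupCount <- 3
ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)           # Rows sum to 1.
tt <- matrix(runif(groupCount * groupCount), ncol = groupCount)
tt <- tt / rowSums(tt)           # Rows sum to 1.

# Row i of the product is the prediction for year i + 1.
predAll <- ob %*% tt

# Same values as the apply() version.
viaApply <- t(apply(ob, 1, FUN = function(row) { row %*% tt }))
all.equal(predAll, viaApply)

# Shift down one row to line up with the loop's `pred`:
# the first year has no prediction, and the extra final year is dropped.
predAligned <- rbind(NA_real_, predAll[-yearCount, , drop = FALSE])
```

After the shift, row i of `predAligned` should match what the loop stores in row i of `pred`.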

To accommodate cohorts, perhaps you could declare a list with three elements (for the 1980, 1981, and 1982 cohorts). Each element would be a unique `ob` matrix. Then create a second list in which each element is a unique `pred` matrix. Or maybe use three-dimensional arrays (but that may be more taxing when R recreates the memory with the replacement function).
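As a rough sketch of the list idea, each of the question's three cohorts could get its own forecast matrix built from its initial values. Reusing one illustrative transition matrix for every cohort is an assumption made to keep the example short; in practice each cohort would presumably have its own. `forecastCohort` and `predList` are hypothetical names, not existing functions or objects.

```r
ds <- data.frame(cohort = c(1980, 1981, 1982),
                 A00 = c(.15, .2, .4),
                 B00 = c(.25, .3, .4),
                 C00 = c(.6, .5, .2))

# Illustrative transition matrix (rows = origin group, each row sums to 1).
tt <- matrix(c(.3, .5, .2,
               .2, .4, .4,
               .3, .1, .6), nrow = 3, byrow = TRUE)

# Hypothetical helper: forecast `horizon` years ahead from one cohort's initial values.
forecastCohort <- function(init, tt, horizon) {
  pred <- matrix(NA_real_, nrow = horizon + 1, ncol = length(init))
  pred[1, ] <- init                      # Row 1 is t = 0.
  for (i in seq_len(horizon)) {
    pred[i + 1, ] <- pred[i, ] %*% tt
  }
  pred
}

# One list element per cohort, named by cohort year.
predList <- lapply(seq_len(nrow(ds)), function(r) {
  forecastCohort(unlist(ds[r, c("A00", "B00", "C00")]), tt, horizon = 7)
})
names(predList) <- ds$cohort
predList[["1981"]]
```

Keeping the per-cohort work inside one function means the cohort loop stays a one-liner, and swapping in cohort-specific transition matrices later only changes the `lapply` call.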
