andrey - 14 days ago 5

R Question

Working in R. I would like to forecast time series of prevalences using the initial values and a set of transition parameters. For the data of the following structure

```
cohort <- c(1980, 1981, 1982)
A00 <- c(.15, .2, .4)
B00 <- c(.25, .3, .4)
C00 <- c(.6, .5, .2)
Tab <- c(.6, .5, .4)
Tac <- c(.2, .25, .35)
ds <- data.frame(cohort, A00, B00, C00, Tab, Tac)
print(ds)
```

```
  cohort  A00  B00 C00 Tab  Tac
1   1980 0.15 0.25 0.6 0.6 0.20
2   1981 0.20 0.30 0.5 0.5 0.25
3   1982 0.40 0.40 0.2 0.4 0.35
```

Initial values in columns A00, B00, and C00 represent the relative size of each group (A, B, C) at time t=00. They add up to 1 across each row (A00+B00+C00=1). Parameters Tab and Tac are used to predict the prevalence at time t+1 using some mathematical model, for example

`A01 = df$A00 - df$Tab + df$Tac`

The function to compute predicted values at time t+1 is

```
forecast <- function(df) {
  dsResult <- data.frame(
    cohort = df$cohort,
    A01 = df$A00 - df$Tab + df$Tac,
    B01 = df$B00 - df$Tab + df$Tac,
    C01 = df$C00 - df$Tab + df$Tac
  )
  dsResult <- merge(df, dsResult, by = "cohort")
  return(dsResult)
}
new <- forecast(ds)
```

and produces the following result

```
  cohort  A00  B00 C00 Tab  Tac   A01   B01  C01
1   1980 0.15 0.25 0.6 0.6 0.20 -0.25 -0.15 0.20
2   1981 0.20 0.30 0.5 0.5 0.25 -0.05  0.05 0.25
3   1982 0.40 0.40 0.2 0.4 0.35  0.35  0.35 0.15
```

I would very much appreciate your help in learning how to write a loop that cycles through a desired number of forecast years (for t in 1:7, for instance). Thanks in advance!

Answer

First, I'd like to make two suggestions that might make the problem easier to code. One, revise the data schema so that each year is a unique row and each group is a unique column. Two, since the cohorts are treated as mathematically independent of each other, keep them separate for now, at least until the code's kernel is built; put a loop around it later that cycles through them. In the first block of code, there are two matrices: one with observed data, and one that will collect the predicted data.

```
yearCount <- 7  #Declare the number of time points.
groupCount <- 3 #Declare the number of groups.
#Create fake data that sum to 1 across rows/times.
ob <- matrix(runif(yearCount*groupCount), ncol=groupCount)
ob <- ob / apply(ob, 1, function( x ){ return( sum(x) )})
#Establish a container to hold the predicted values.
pred <- matrix(NA_real_, ncol=groupCount, nrow=yearCount)
t12<-.5; t13<-.2; t11<-1-t12-t13 #Transition parameters from group 1
t21<-.2; t23<-.4; t22<-1-t21-t23 #Transition parameters from group 2
t31<-.3; t32<-.1; t33<-1-t31-t32 #Transition parameters from group 3
for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  pred[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  pred[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33
}
#Calculate the squared errors
ss <- (pred[-1, ] - ob[-1, ])^2 #Ignore the first year of data
```

Inside the loop, you probably notice the familiar structure of matrix multiplication. Each row can be slightly condensed using inner products (i.e., one row of the `ob` matrix is multiplied element by element with one "column" of the `t`s, then summed). I'm using `t12` slightly differently than the `Tab` in your post; here it is the probability of transitioning from group 1 to group 2 at a given time point.
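To make the inner-product reading concrete, here is a minimal sketch that packs the nine scalars from the first block into a matrix and computes one entry both ways. The names `ttFixed` and `obRow` are hypothetical and do not appear in the code above; `obRow` simply reuses the 1980 initial values from the question.

```r
# Transition parameters from the first block, arranged so that
# row = origin group and column = destination group.
# (ttFixed is a hypothetical name used only in this sketch.)
t12 <- .5; t13 <- .2; t11 <- 1 - t12 - t13
t21 <- .2; t23 <- .4; t22 <- 1 - t21 - t23
t31 <- .3; t32 <- .1; t33 <- 1 - t31 - t32
ttFixed <- matrix(c(t11, t12, t13,
                    t21, t22, t23,
                    t31, t32, t33), nrow = 3, byrow = TRUE)

# One year of prevalences (the 1980 cohort's initial values).
obRow <- c(.15, .25, .60)

# Explicit multiply-and-sum, as written inside the first loop for group 2.
explicit <- obRow[1]*t12 + obRow[2]*t22 + obRow[3]*t32

# The same quantity as an inner product with column 2 of the matrix.
inner <- as.numeric(obRow %*% ttFixed[, 2])

print(c(explicit = explicit, inner = inner))
```

Because each row of `ttFixed` holds the probabilities of leaving one group, every row sums to 1, which is the same constraint the random `tt` matrix below is rescaled to satisfy.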

```
#Create transition parameters that sum to 1 across rows/groups.
tt <- matrix(runif(groupCount*groupCount), ncol=groupCount)
tt <- tt / apply(tt, 1, function( x ){ return( sum(x) )})
```

Pretend the `tt` matrix was defined earlier, instead of the separate variables `t11`, ..., `t33`.

```
for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, ] %*% tt[, 1]
  pred[i, 2] <- ob[i-1, ] %*% tt[, 2]
  pred[i, 3] <- ob[i-1, ] %*% tt[, 3]
}
```

The loop's contents are slightly cleaner than when each element pair was explicitly multiplied and summed. But we don't have to treat each row/column pair individually. All three columns of the `ob` matrix can be operated on by all three columns of the `tt` matrix simultaneously:

```
for( i in 2:yearCount ) {
  pred[i, ] <- ob[i-1, ] %*% tt
}
```

This should be much quicker than even the previous version, because R's internal memory system isn't recreating the matrix three times for each row, only once per row. To reduce this to once per matrix, use the `apply` function, and then transpose the result if that suits your purpose. Finally, notice that the rows of the result represent different years than the rows of `pred` (i.e., row i-1 here corresponds to row i in `pred`).

```
predictionWithExtraYear <- t(apply(ob, 1, FUN=function(row){row %*% tt}))
```
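A possible further simplification, offered as a sketch rather than as part of the steps above: applying `row %*% tt` to every row of `ob` is the definition of the matrix product `ob %*% tt`, so the `apply` call and the transpose can be dropped. The block below rebuilds small `ob` and `tt` matrices (with a fixed seed, an assumption made only so it runs on its own) and shifts the result by one year so it lines up with `pred`. The names `viaApply` and `viaProduct` are hypothetical.

```r
set.seed(1)  # Fixed seed only so the sketch is reproducible.
yearCount <- 7
groupCount <- 3

# Fake observed data and transition matrix, each row rescaled to sum to 1.
ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)
tt <- matrix(runif(groupCount * groupCount), ncol = groupCount)
tt <- tt / rowSums(tt)

# Row-by-row version, as in the apply() call above.
viaApply <- t(apply(ob, 1, FUN = function(row) { row %*% tt }))

# Single matrix product; row i holds the prediction for year i + 1.
viaProduct <- ob %*% tt

# Align with the loop: drop the extra final row and shift down by one year,
# leaving the first year of pred as NA.
pred <- matrix(NA_real_, ncol = groupCount, nrow = yearCount)
pred[-1, ] <- viaProduct[-yearCount, ]

# TRUE when the two approaches agree.
print(isTRUE(all.equal(viaApply, viaProduct, check.attributes = FALSE)))
```

Since every row of `ob` and every row of `tt` sums to 1, each predicted row also sums to 1, which is a quick sanity check on the output.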

To accommodate cohorts, perhaps you could declare a list with three elements (for the 1980, 1981, and 1982 cohorts). Each element would be a unique `ob` matrix. Then create a second list whose elements are the corresponding `pred` matrices. Or maybe use three-dimensional arrays (but that may be more taxing when R recreates the memory with the replacement function).
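The list idea can be sketched as follows. This is a hedged illustration, not a definitive implementation: it starts each cohort from the initial values in the question and, because the question has no observed later years, feeds each predicted year back in as the input for the next one. The transition matrix `tt`, the helper `forecastCohort`, and the decision to share one `tt` across all cohorts are assumptions made for the example.

```r
yearCount <- 7  # Number of years to forecast beyond the initial year.

# Initial prevalences per cohort, taken from the question's data frame.
initial <- list(
  "1980" = c(A = .15, B = .25, C = .60),
  "1981" = c(A = .20, B = .30, C = .50),
  "1982" = c(A = .40, B = .40, C = .20)
)

# Hypothetical transition matrix (rows = origin group, columns = destination);
# the values reuse t11..t33 from the first code block.
tt <- matrix(c(.3, .5, .2,
               .2, .4, .4,
               .3, .1, .6), nrow = 3, byrow = TRUE)

# Hypothetical helper: returns a (years + 1) x 3 matrix; row 1 is the initial year.
forecastCohort <- function(start, tt, years) {
  pred <- matrix(NA_real_, nrow = years + 1, ncol = length(start),
                 dimnames = list(as.character(0:years), names(start)))
  pred[1, ] <- start
  for (i in seq_len(years)) {
    # The previous year's prediction is the input for the next year.
    pred[i + 1, ] <- pred[i, ] %*% tt
  }
  pred
}

# One prediction matrix per cohort, kept in a list keyed by cohort year.
predList <- lapply(initial, forecastCohort, tt = tt, years = yearCount)
print(round(predList[["1980"]], 3))
```

If each cohort needs its own transition parameters, one option would be a parallel list of `tt` matrices passed through `Map` instead of `lapply`; that variation is left out here to keep the sketch short.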