Lawrence303 - 8 months ago
R Question

Error when predicting with DirichletReg package in R

I am trying to make predictions on a test set using the DirichReg function from the DirichletReg package. When I fit the model with only a few predictors, prediction works fine, but with more than about 5 predictors predict() fails with an error I can't figure out. The code below is a minimal working example (MWE) that reproduces the error.

library(DirichletReg)
set.seed(1)
# create dataset
predictor1 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor2 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor3 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor4 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor5 <- rnorm(n = 1000, mean = 5, sd = 1)
predictor6 <- rnorm(n = 1000, mean = 5, sd = 1)
prob_A <- runif(n = 1000, min = 0, max = 0.5)
prob_B <- runif(n = 1000, min = 0, max = 0.5)
prob_C <- 1 - prob_A - prob_B
dat <- data.frame(predictor1, predictor2, predictor3, predictor4, predictor5,
                  predictor6, prob_A, prob_B, prob_C)

# split data into training and test sets
train_vec <- sample(c(0, 1), size = nrow(dat), replace = TRUE, prob = c(0.2, 0.8))
train_dat <- dat[train_vec == 1, ]
test_dat <- dat[train_vec == 0, ]

# run model
train_dat$prob <- DR_data(train_dat[, c('prob_A', 'prob_B', 'prob_C')])
mod <- DirichReg(prob ~ predictor1 + predictor2 + predictor3 + predictor4 +
                   predictor5 + predictor6,
                 data = train_dat, model = 'common')

# run predictions
test_dat$prob <- DR_data(test_dat[, c('prob_A', 'prob_B', 'prob_C')])
preds <- predict(object = mod, newdata = test_dat)


Here's the error that I'm getting:

Error in parse(text = x, keep.source = FALSE) :
<text>:1:74: unexpected '|'
1: prob ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + |
^


I would appreciate any help. I haven't been able to find anything about this error via Google or in the package documentation.

Answer

This seems to be a bug in the package. I recommend that you contact the package maintainer to report it.
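The error message hints at where things break: the parsed text stops right after "predictor5 +" and a "|" follows immediately. A likely explanation (an assumption, not confirmed against the DirichletReg source) is that when the formula has a single right-hand side, the package rebuilds a multi-part formula as text using deparse(). deparse() splits a long expression into several strings once a line passes the default width.cutoff of 60 characters, and if only the first piece ends up in the rebuilt text you get exactly this kind of truncation. That would also explain why formulas with only a few predictors work. The base-R sketch below only demonstrates the deparse() behaviour; f_short, f_long and one_line are illustrative names.

```r
# Assumption: DirichletReg rebuilds its internal formula from deparse() output.
# This sketch only shows the base-R deparse() behaviour that would cause the bug.
f_short <- prob ~ predictor1 + predictor2
f_long  <- prob ~ predictor1 + predictor2 + predictor3 + predictor4 +
  predictor5 + predictor6

length(deparse(f_short))  # a single string
length(deparse(f_long))   # more than one string: the line is broken after ~60 chars
deparse(f_long)[1]        # first piece stops after "predictor5 +", like the error text

# Collapsing the pieces (or deparse1() in R >= 4.0.0) yields a single string again
one_line <- paste(deparse(f_long), collapse = " ")
one_line
```

This is also why the cut-off in the error happens between predictor5 and predictor6: with five predictors the whole formula still fits on the first deparsed line.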

A possible workaround is to list each part of the regression specification explicitly, one part per response component (three here) separated by |, instead of relying on the package to replicate the regressors internally for all parts.

mod2 <- DirichReg(prob ~
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6 |
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6 |
  predictor1 + predictor2 + predictor3 + predictor4 + predictor5 + predictor6,
  data = train_dat, model = "common")
all.equal(coef(mod), coef(mod2))
## [1] TRUE
predict(mod2, newdata = test_dat)
##             [,1]      [,2]      [,3]
##   [1,] 0.2436493 0.2715895 0.4847612
##   [2,] 0.2541715 0.2252292 0.5205993
##   [3,] 0.2618741 0.2345063 0.5036196
##   ...
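With many predictors, typing the right-hand side once per response component gets tedious. A minimal sketch that builds the same three-part formula as a string and converts it with as.formula(); the names predictors, rhs, n_parts and f_common are illustrative, and n_parts = 3 is assumed to match the number of columns in prob:

```r
# Build "prob ~ <rhs> | <rhs> | <rhs>" programmatically instead of typing it out.
predictors <- paste0("predictor", 1:6)      # predictor1 ... predictor6
rhs <- paste(predictors, collapse = " + ")  # "predictor1 + ... + predictor6"
n_parts <- 3                                # assumed: one part per component (A, B, C)
f_common <- as.formula(paste("prob ~", paste(rep(rhs, n_parts), collapse = " | ")))
f_common

# Usage (needs DirichletReg plus train_dat/test_dat from the question):
# mod3 <- DirichReg(f_common, data = train_dat, model = "common")
# predict(mod3, newdata = test_dat)
```

The resulting object is the same formula as the one typed in mod2, so it should behave the same way, but I have not checked this against every DirichletReg version.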