SASpencer - 4 months ago 36

R Question

I'm a bit confused on how to set priors for multiple predictors for the following model:

```
require(rstanarm)

wi_prior <- normal(0, sd(train$attendance))
SEED <- 101

fmla <- attendance ~ (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 | franchID)

baylm <- stan_glmer(fmla,
                    data = train,
                    family = "gaussian",
                    algorithm = "sampling",
                    adapt_delta = .95,
                    prior_intercept = wi_prior, seed = SEED)
```

Here is the first observation in train, per request.

```
train <- structure(list(franchID = structure(25L, .Label = c("ANA", "ARI",
"ATL", "BAL", "BOS", "CHC", "CHW", "CIN", "CLE", "COL", "DET",
"FLA", "HOU", "KCR", "LAD", "MIL", "MIN", "NYM", "NYY", "OAK",
"PHI", "PIT", "SDP", "SEA", "SFG", "STL", "TBD", "TEX", "TOR",
"WSN"), class = "factor"), yearID = 1999L, name = "San Francisco Giants",
park = "3Com Park", attendance = 2078399L, W = 86L, W1 = 89L,
W2 = 90L, W3 = 68L, WCWin1 = FALSE, WCWin2 = FALSE, WCWin3 = FALSE,
DivWin1 = FALSE, DivWin2 = TRUE, DivWin3 = FALSE, LgWin1 = FALSE,
LgWin2 = FALSE, LgWin3 = FALSE, WSWin1 = FALSE, WSWin2 = FALSE,
WSWin3 = FALSE), .Names = c("franchID", "yearID", "name",
"park", "attendance", "W", "W1", "W2", "W3", "WCWin1", "WCWin2",
"WCWin3", "DivWin1", "DivWin2", "DivWin3", "LgWin1", "LgWin2",
"LgWin3", "WSWin1", "WSWin2", "WSWin3"), row.names = c(NA, -1L
), class = "data.frame")
```

Answer

You can specify a prior for coefficients on K predictors by passing a vector of length K to one of the supported distributions for priors. For example, if K = 4 you could do

```
wi_prior2 <- normal(location = c(0, 1, -2, 5))
```

You could also pass a vector of scales and/or use a different family than `normal`. Then you would call `stan_glmer` with `prior = wi_prior2`. If you do

```
wi_prior2 <- normal(location = 0)
```

then the same prior would be used for all K common coefficients.
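To make the "vector of scales and/or a different family" option concrete, here is a minimal sketch for K = 4. The object names `wi_prior3` and `wi_prior4` and all numeric values are illustrative assumptions, not recommendations for this data set; `student_t` is one of the other prior families that rstanarm provides.

```r
library(rstanarm)

# One location and one scale per coefficient (K = 4); values are illustrative only
wi_prior3 <- normal(location = c(0, 1, -2, 5),
                    scale    = c(10, 10, 5, 2.5))

# A heavier-tailed family; the scalar location and scale are shared by all coefficients
wi_prior4 <- student_t(df = 7, location = 0, scale = 2.5)

# Each prior is a named list describing the distribution
str(wi_prior3)
```

Either object would then be passed as `prior = wi_prior3` (or `prior = wi_prior4`) in the `stan_glmer` call. Whether rstanarm rescales the scales you supply depends on the `autoscale` argument, so it is worth checking its default in your installed version.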

However, in your case I suspect that `fmla` is mistaken. You typically also want to include most, if not all, of those predictors outside the lme4-style parenthetical expression to allow common effects across all levels of `franchID`. Thus, `fmla` would become

```
fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
WSWin1 | franchID)
```

If you only include the part in parentheses, then you are assuming the coefficients on these variables are exactly zero in the population and only deviate from zero in subpopulations defined by the levels of `franchID`. So, there would not be an opportunity to put prior distributions on their coefficients.
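Once the common effects are written outside the parentheses, the number of coefficients K that `prior` applies to can be read off the formula. The following is a rough sketch, assuming the lme4 package is installed; `fixed_part`, `K`, and `coef_prior` are hypothetical names, and the scale of 2.5 is only a placeholder.

```r
library(lme4)      # for nobars()
library(rstanarm)  # for normal()

fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 | franchID)

# Drop the group-specific (1 + ... | franchID) term, keeping the common part
fixed_part <- nobars(fmla)

# Count the common predictors; the intercept is covered by prior_intercept instead
K <- length(attr(terms(fixed_part), "term.labels"))

# One location and one scale per common coefficient, in the order of the formula terms
coef_prior <- normal(location = rep(0, K), scale = rep(2.5, K))
```

Here `K` comes out as 8, so `coef_prior` could be passed as `prior = coef_prior` alongside `prior_intercept = wi_prior`.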

The prior on the group-wise deviations from the common coefficients is conditionally multivariate normal with mean vector zero and a somewhat complicated but unknown covariance structure. This is explained in more detail in `help(priors, package = "rstanarm")`.
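Written out, the distinction between common coefficients and group-wise deviations looks roughly as follows. The notation is an assumption introduced here for illustration: $i$ indexes observations, $j[i]$ is the `franchID` level of observation $i$, $x_i$ is the vector of K predictors, and $\Sigma$ is the unknown covariance matrix of the deviations.

$$
\begin{aligned}
\text{attendance}_i &= \alpha + x_i^\top \beta + b_{0,j[i]} + x_i^\top b_{j[i]} + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \\
(b_{0,j},\, b_j) \mid \Sigma &\sim \mathcal{N}(0, \Sigma).
\end{aligned}
$$

In this notation, `prior_intercept` applies to $\alpha$ and `prior` applies to $\beta$. A formula containing only the parenthetical part corresponds to fixing $\beta = 0$, which leaves only $\alpha$ and the deviations to be estimated.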