SASpencer - 2 months ago
R Question

Set priors for multiple predictors in rstanarm?

I'm a bit confused about how to set priors for multiple predictors in the following model:


library(rstanarm)

wi_prior <- normal(0, sd(train$attendance))
SEED <- 101

fmla <- attendance ~ (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
                      WSWin1 | franchID)

baylm <- stan_glmer(fmla,
                    data = train,
                    family = "gaussian",
                    algorithm = "sampling",
                    adapt_delta = 0.95,
                    prior_intercept = wi_prior, seed = SEED)

Here is the first observation in train, per request.

train <- structure(list(franchID = structure(25L, .Label = c("ANA", "ARI",
"ATL", "BAL", "BOS", "CHC", "CHW", "CIN", "CLE", "COL", "DET",
"FLA", "HOU", "KCR", "LAD", "MIL", "MIN", "NYM", "NYY", "OAK",
"PHI", "PIT", "SDP", "SEA", "SFG", "STL", "TBD", "TEX", "TOR",
"WSN"), class = "factor"), yearID = 1999L, name = "San Francisco Giants",
park = "3Com Park", attendance = 2078399L, W = 86L, W1 = 89L,
W2 = 90L, W3 = 68L, WCWin1 = FALSE, WCWin2 = FALSE, WCWin3 = FALSE,
DivWin1 = FALSE, DivWin2 = TRUE, DivWin3 = FALSE, LgWin1 = FALSE,
LgWin2 = FALSE, LgWin3 = FALSE, WSWin1 = FALSE, WSWin2 = FALSE,
WSWin3 = FALSE), .Names = c("franchID", "yearID", "name",
"park", "attendance", "W", "W1", "W2", "W3", "WCWin1", "WCWin2",
"WCWin3", "DivWin1", "DivWin2", "DivWin3", "LgWin1", "LgWin2",
"LgWin3", "WSWin1", "WSWin2", "WSWin3"), row.names = c(NA, -1L
), class = "data.frame")


You can specify a prior for the coefficients on K predictors by passing a vector of length K to one of the supported prior distribution functions. For example, if K = 4 you could do

wi_prior2 <- normal(location = c(0, 1, -2, 5))

You could also pass a vector of scales and/or use a different family than normal. You would then call stan_glmer with prior = wi_prior2. If you instead do

wi_prior2 <- normal(location = 0)

then the same prior would be used for all K common coefficients.
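As a minimal sketch of the options just described, the following builds one prior with per-coefficient locations and scales and another from a different family. The object names and all of the numbers are made up for illustration, K = 4 is assumed as in the example above, and autoscale = FALSE is set explicitly so that the scales are used as written.

```r
library(rstanarm)

# One location and one scale per coefficient (K = 4 assumed).
wi_prior_vec <- normal(location = c(0, 1, -2, 5),
                       scale = c(10, 10, 5, 5),
                       autoscale = FALSE)

# A heavier-tailed family; scalar arguments are shared by all coefficients.
wi_prior_t <- student_t(df = 4, location = 0, scale = 2.5, autoscale = FALSE)

# The prior objects are plain lists, so their settings can be inspected.
str(wi_prior_vec)
str(wi_prior_t)
```

Either object would then be passed as the prior argument of stan_glmer.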

However, in your case I suspect that fmla is mistaken. You typically also want to include most, if not all, of those predictors outside the lme4-style parenthetical expression to allow common effects across all levels of franchID. Thus, fmla would become

fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 + 
        WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 + 
                  WSWin1 | franchID)

If you only include the part in parentheses, you are assuming that the coefficients on these variables are exactly zero in the population and deviate from zero only in the subpopulations defined by the levels of franchID. In that case there are no common coefficients on which to place prior distributions.
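Putting the two pieces together might look roughly like the sketch below. It assumes the revised formula, counts the K common coefficients from its non-parenthesized part, and builds a prior of matching length. The names fixed_part, coef_prior, and fit_attendance are hypothetical, the location and scale values are placeholders rather than recommendations, and the wrapper is only defined here, not called, because fitting needs the full training data rather than the single row shown in the question.

```r
library(rstanarm)

# Common part of the revised formula, used here only to count K.
fixed_part <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1
K <- length(attr(terms(fixed_part), "term.labels"))

# Placeholder prior: one location and one scale per common coefficient.
coef_prior <- normal(location = rep(0, K), scale = rep(2.5, K),
                     autoscale = TRUE)

fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
              WSWin1 | franchID)

# Hypothetical wrapper; call it with the full training data frame.
fit_attendance <- function(train, seed = 101) {
  stan_glmer(fmla,
             data = train,
             family = gaussian(),
             prior = coef_prior,
             prior_intercept = normal(0, sd(train$attendance)),
             adapt_delta = 0.95,
             seed = seed)
}
```

Counting term labels gives K only when every term expands to a single column, which holds for numeric predictors and for logical predictors that take both values in the data.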

The prior on the group-wise deviations from the common coefficients is conditionally multivariate normal with mean vector zero and an unknown covariance matrix, which in turn has a somewhat complicated prior of its own. This is explained in more detail in help(priors, package = "rstanarm").
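For reference, rstanarm exposes the prior on that covariance structure through the prior_covariance argument of stan_glmer, whose default is decov(). The sketch below simply writes out the default hyperparameters of decov(); cov_prior is an illustrative name, and the values are not a recommendation for this model.

```r
library(rstanarm)

# decov() with its default hyperparameters written out explicitly.
cov_prior <- decov(regularization = 1, concentration = 1,
                   shape = 1, scale = 1)
str(cov_prior)

# It would be supplied alongside the coefficient prior, e.g.
#   stan_glmer(fmla, data = train, prior = ..., prior_covariance = cov_prior)
# and prior_summary() on the fitted model reports the priors actually used.
```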