timothy.s.lau timothy.s.lau - 1 month ago 8
R Question

Finding model predictor values that maximize the outcome

How do you find the set of values for model predictors (a mixture of linear and non-linear) that yield the highest value for the response.

Example Model:

library(lme4); library(splines)

summary(lmer(formula = Solar.R ~ 1 + bs(Ozone) + Wind + Temp + (1 | Month), data = airquality, REML = F))


Here I am interested in what conditions (predictors) produce the highest solar radation (outcome).

This question seems simple, but I've failed to find a good answer using Google.

If the model was simple, I could take the derivatives to find the maximum or minimum. Someone has suggested that if the model function can be extracted, the
stats::optim()
function might be used. As a last resort, I could simulate all the reasonable variations of input values and plug it into the
predict()
function and look for the maximum value.

The last approach mentioned doesn't seem very efficient and I imagine that this is a common enough task (e.g., finding optimal customers for advertising) that someone has built some tools for handling it. Any help is appreciated.

Answer

There are some conceptual issues here.

  • for the simple terms (Wind and Temp), the response is a linear (and hence both monotonic and unbounded) function of the predictors. Thus if these terms have positive parameter estimates, increasing their values to infinity (Inf) will give you an infinite response (Solar.R); values should be as small as possible (negative infinite) if the coefficients are negative. Practically speaking, then, you want to set these predictors to the minimum or maximum reasonable value if the parameter estimates are respectively negative or positive.

  • for the bs term, I'm not sure what the properties of the B-spline are beyond the boundary knots, but I'm pretty sure that the curves go off to positive or negative infinity, so you've got the same issue. However, for the case of bs, it's also possible that there are one or more interior maxima. For this case I would probably try to extract the basis terms and evaluate the spline over the range of the data ...

Alternatively, your mentioning optim makes me think that this is a possibility:

data(airquality)
library(lme4)
library(splines)
m1 <- lmer(formula = Solar.R ~ 1 + bs(Ozone) + Wind + Temp + (1 | Month),
           data = airquality, REML = FALSE)
predval <- function(x) {
    newdata <- data.frame(Ozone=x[1],Wind=x[2],Temp=x[3])
    ## return population-averaged prediction (no Month effect)
    return(predict(m1, newdata=newdata, re.form=~0))
}
aq <- na.omit(airquality)
sval <- with(aq,c(mean(Ozone),mean(Wind),mean(Temp)))
predval(sval)
opt1 <- optim(fn=predval,
      par=sval,
      lower=with(aq,c(min(Ozone),min(Wind),min(Temp))),
      upper=with(aq,c(max(Ozone),max(Wind),max(Temp))),
      method="L-BFGS-B",  ## for constrained opt.
      control=list(fnscale=-1))  ## for maximization
## opt1
## $par
## [1] 70.33851 20.70000 97.00000
## 
## $value
## [1] 282.9784

As expected, this is intermediate in the range of Ozone(1-168), and min/max for Wind (2.3-20.7) and Temp (57-97).

Comments