Mud Warrior - 1 year ago 144
R Question

# How can I compute the median absolute deviation (MAD) for generalized linear mixed-effects models

I know my question is linked to stats but I'm looking for a solution in

`R`
, so I believe it's suited for SO.

I built a generalized linear mixed-effects model (GLMM) using the
`glmer`
function from the
`lme4`
package in
`R`
to model species richness around aquaculture sites based on significant explanatory variables using Zuur et al. (2009) Mixed Effects Models and Extensions in Ecology with R. The model is:

``````Mod1 <- glmer(Richness ~ Distance + Depth + Substrate + Beggiatoa +
Distance*Beggiatoa + (1|Site/transect), family = poisson, data = mydata)
``````

Now I have a full data set collected at different sites and I want to assess how this model performs on the new data set.

Following a question on CV, someone suggested to look for the median absolute deviation (mad) on the new data set. I tried the
`mad`
function from the
`stats`
package in
`R`
but I get the following error message:

``````Error in x[!is.na(x)] : object of type 'S4' is not subsettable
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
``````

Does anybody knows what's going wrong here? Is it that
`mad`
in
`stats`
can't be calculated for GLMMs? If so, is there another
`R`
package to calculate mad from GLMMs?

Edit:

To give you an idea of my data, here's the output from
`dput(head(mydata))`
, also note that there's no "Substrate" category in the new data set and "S" refers to "Richness":

``````structure(list(S = c(0, 1, 2, 3, 3, 2), Site = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("BC", "BH", "GC", "IS", "Ref"
), class = "factor"), Transect = structure(c(4L, 4L, 4L, 4L,
4L, 4L), .Label = c("10GC", "10IS", "10N", "10S", "11IS", "12IS",
"13E", "1GC", "1N", "1W", "2E", "2GC", "2IS", "2N", "2W", "2WA",
"3E", "3GC", "3IS", "3N", "3S", "4E", "4GC", "4IS", "4S", "4W",
"5GC", "5IS", "5S", "6GC", "6IS", "6N", "6S", "6W", "7E", "7GC",
"7IS", "8GC", "8IS", "8W", "9E", "9GC", "9IS", "9N", "RefBC1",
"RefBC10", "RefBC11", "RefBC12", "RefBC2", "RefBC3", "RefBC4",
"RefBC5", "RefBC6", "RefBC7", "RefBC8", "RefBC9", "X1", "X2"), class = "factor"),
Distance = c(2, 20, 40, 80, 120, 160), Depth = c(40L, 40L,
50L, 40L, 40L, 40L), Beggiatoa = c(2, 1, 1, 0, 0, 0)), .Names = c("S",
"Site", "Transect", "Distance", "Depth", "Beggiatoa"), row.names = c(NA,
6L), class = "data.frame")
``````

For within-sample error, the mean absolute deviation computation would just be

``````mad(residuals(fitted_model))
``````

... you might want `residuals(fitted_model,type="response")`, since `residuals` will give you deviance residuals by default (see `?residuals.merMod`)

If you want to look at out-of-sample error, you could do something like this:

``````pred <- predict(fitted_model,newdata=newdf,type="response",
re.form=~0)
(`re.form=~0` specifies that you want to omit random effects from the prediction, which is your only choice unless you're predicting at sites/transects where you've also got training data)