Salmo salar Salmo salar - 1 month ago 11
R Question

lmer returning error: grouping factor must be < number of observations

I am attempting to run a mixed effect model (using R) on some data but struggling with one of the fixed effects, I think primarily due to it a factor?!

sample data:

data4<-structure(list(code = structure(1:10, .Label = c("10888", "10889",
"10890", "10891", "10892", "10893", "10894", "10896", "10897",
"10898", "10899", "10900", "10901", "10902", "10903", "10904",
"10905", "10906", "10907", "10908", "10909", "10910", "10914",
"10916", "10917", "10919", "10920", "10922", "10923", "10924",
"10925", "10927"), class = "factor"), speed = c(0.0296315046039244,
0.0366986630049636, 0.0294297725505692, 0.048316183511095, 0.0294275666501456,
0.199924957584131, 0.0798850288176711, 0.0445886457047146, 0.0285993712316451,
0.0715158276875623), meanflow = c(0.657410742496051, 0.608271363339857,
0.663241108786611, 0.538259450171821, 0.666299529534762, 0.507156583629893,
0.762448863636364, 37.6559178370787, 50.8557196935557, 31.6601587837838
), length = c(136, 157, 132, 140, 135, 134, 144, 149, 139, 165
), river = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L
), .Label = c("c", "f"), class = "factor")), .Names = c("code",
"speed", "meanflow", "length", "river"), row.names = c(2L, 4L,
6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L), class = "data.frame")


my model is as such:

model1<-lmer(speed ~ river + length +(1|meanflow)+(1|code), data4)


and when run returns error message:

Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations


having trawled the internet I have found one response:

http://stats.stackexchange.com/questions/57912/random-effects-must-be-less-than-the-number-of-observations-error-in-lmer-pac

but for the life of me do not understand the responses to the question!

a point in the right direction will be more than welcome

Answer

You have two problems here:

  • It looks like you have one observation for every value of code. That means that you can't estimate both a residual variance (which is built in to lmer, and linear mixed models more generally) and an among-code variance -- both of these parameters will be trying to estimate the same variance component, and any combination of var(residual) and var(code) that adds up to the same value will represent an equally good fit to the data.

  • You also have one observation for every value of meanflow; this is because meanflow is a continuous variable, which is not usually something you want to use as a grouping variable in the model. I'm not sure what you're trying to capture with this term.

You can actually fit these models if you insist by using lmerControl to bypass the checks, but you won't necessarily get a sensible result!

model2 <- lmer(speed ~ river + length +(1|meanflow)+(1|code), data4,
    control=lmerControl(check.nobs.vs.nlev = "ignore",
     check.nobs.vs.rankZ = "ignore",
     check.nobs.vs.nRE="ignore"))

Here the variance has been divided approximately in equal thirds:

 VarCorr(model2)
 ##  Groups   Name        Std.Dev.
 ##  meanflow (Intercept) 0.035354
 ##  code     (Intercept) 0.032898
 ##  Residual             0.033590

If we use only one (still inappropriate) random effect,

model0 <- lmer(speed ~ river + length +(1|meanflow), data4,
    control=lmerControl(check.nobs.vs.nlev = "ignore",
     check.nobs.vs.rankZ = "ignore",
     check.nobs.vs.nRE="ignore"))

Now the variance is divided exactly in halves:

VarCorr(model0)
##  Groups   Name        Std.Dev.
##  meanflow (Intercept) 0.041596
##  Residual             0.041596