Raphael Raphael - 2 months ago 11x
R Question

How to estimate multilevel Poisson models in lme4 with large count values?

The case:
I am trying to estimate gravity models of migration (

) from 55 origin districts (
) to 54 destination districts (
). Two important predictors are the distance (
) between each origin and destination district and an indicator variable of the contiguity (
) of any origin-destination pair. The migration outcome variable is a measure of the count of people migrating between pairs of districts (ranging from zero to a couple of thousands). Below is a test data set that approximates my data situation. In the real data, the migration outcome variable is slightly less ideally poisson distributed (
min=0, max=9450, mean=85, median=10


#*** Generate test data set
td=data.frame(IDo=rep(1:55,each=55),IDd=rep(1:55,times=55),dist=runif(3025,0.186,12.7),contig=rbinom(3025,1,p=0.08), stringsAsFactors=F)
td=td[td$IDo!=td$IDd,] # remove cases for which origin and destination are the same
td=rbind(td,td) # dublicate data for two years
td$year=rep(c(1,0),each=2970) # generate year dummy variable
td$mig=rpois(5940, lambda = 1)*1000 # generate migrant count variable

# Cross-classified random effects Poisson models
fm0=glmer(m1,data=td,family="poisson",control=glmerControl(optimizer="bobyqa"), nAGQ=0) # Adaptive Gauss-Hermite Quadrature
fm1=glmer(m1,data=td,family="poisson",control=glmerControl(optimizer="bobyqa"), nAGQ=1) # Laplace Approximation

# Regular Poisson models

The problem:
I am running cross-classified multilevel Poisson models using
. When using the default nAGQ=1 setting (
), I always get the following warning message, and in the real data case, the model fails to converge (
In checkConv(attr(opt, "derivs"), opt$par, control$checkConv Model failed to converge with max|grad| = 0.00248588 (tol = 0.001, component 1)

Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?

However, this problem does not emerge with setting of nAGQ=0 (
) or when running it as a regular fixed effects model (

Does anyone have a suggestion what to do to avoid the problem that underlies this warning message (and leads to convergence issues in the real data)? The predictor variables have all a reasonably small scale, so no need to rescale these variables. I could potentially rescale my outcome variable to have smaller values (
) but this would change the distribution and should not be done as indicated in this post.


I really think this is a false positive; it's unusual to have count data with such large counts ... have you tried fitting with different optimizers (see ?lme4::convergence) ? (I just tried your example with control=glmerControl(optim="nloptwrap") and got near-identical results.)

I'm a little worried about your distributional assumptions. I know it's just a reproducible example, but drawing a Poisson sample and multiplying it by 1000 does not give you a Poisson-distributed value ... here's a little posterior predictive simulation that simulates the distribution of the 90th quantile of the response under the fitted model -- which is way different from the observed value ...

ss <- simulate(fm1,1000,seed=101)
qq <- sapply(ss,quantile,0.9)
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1106    1153    1167    1168    1182    1255 
quantile(td$mig,0.9) ## 2000