Rewarp Rewarp - 3 months ago 35
R Question

Interaction term failing lmerTest with In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

I am trying to carry out an lmerTest on two separate datasets, and for some reason I am getting the following error for one of the datasets.


In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced


This dataset gives me the p-value of the interaction term between
habitat
and
soil
without issue.


anova(lmer(sqrt(abs) ~ habitat*soil + (1|species), data=frl_light,
REML=T))


Analysis of Variance Table of type III with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
habitat 0.057617 0.028809 2 8.8434 1.0880 0.37805
soil 0.232708 0.232708 1 2.6732 8.7888 0.06848 .
habitat:soil 0.308003 0.154001 2 2.7134 5.8163 0.10443
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


This dataset which has a similar structure however throws the error, and fails to give the p-value for the interaction between
habitat
and
light
. The density degree of freedom measurement is also 0, which is probably the problem.


anova(lmer(sqrt(abs) ~ habitat*light + (1|species), data=frl_soil,
REML=T))


Analysis of Variance Table of type III with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
habitat 0.00845 0.004223 2 7.9751 0.3494 0.7154
light 0.01634 0.016336 1 1.9241 1.3517 0.3689
habitat:light 0.42813 0.214067 2 0.0000 17.7124
Warning message:
In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced


I have no idea why lmerTest works for one dataset but not the other as both datasets appear to me at least, to be virtually indistinguishable from one another. If there is anyone who can shed light on the matter, please help.

Answer

I can't tell you exactly why the Satterthwaite approximation gives you a zero-df estimate here (which is indeed why you get the error and no $p$-value for the interaction); you'd have to work through the code in detail to see (type lmerTest:::calcSatterthMultDF and start digging ...) I did a little bit of digging; the key lines therein are

E <- sum((nu.m/(nu.m - 2)) * as.numeric(nu.m > 2))
nu.F <- 2 * E * as.numeric(E > q)/(E - q)

where (I think) nu.m ($\nu_m$) is the number of degrees of freedom as estimated by the Welch-Satterthwaite approximation. I don't know why (1) lmerTest sets E and hence nu.F to zero when nu.m <= 2; (2) the particular combination of within-group variances in one data set gives nu.m < 2 and in the other data set doesn't ...

In the meantime, though, you can use the Kenward-Roger approximation if you like (it's computationally more expensive, but in general slightly more accurate ...) Using the data set from the second link:

frl_soil <- read.csv("frl_soil.csv")
head(frl_soil,2)
##   X species habitat light        abs
## 1 1  ANI2GR     gen  G.cs 2.67477395
## 2 2  DIPTAC     gen  G.cs 0.09549154
anova(lmer(sqrt(abs) ~ habitat*light + (1|species), 
            data=frl_soil, REML=TRUE), 
     ddf="Kenward-Roger")
## Analysis of Variance Table of type III  with  Kenward-Roger 
## approximation for degrees of freedom
##                Sum Sq  Mean Sq NumDF  DenDF F.value  Pr(>F)  
## habitat       0.00842 0.004208     2 8.1220  0.3482 0.71602  
## light         0.01568 0.015679     1 2.0712  1.2973 0.36928  
## habitat:light 0.40886 0.204432     2 2.0713 16.9152 0.05212 .