Rewarp - 4 months ago 43

R Question

I am trying to carry out an lmerTest on two separate datasets, and for some reason I am getting the following error for one of the datasets.

In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

This dataset gives me the p-value of the interaction term between

`habitat`

`soil`

anova(lmer(sqrt(abs) ~ habitat*soil + (1|species), data=frl_light,

REML=T))

`Analysis of Variance Table of type III with Satterthwaite`

approximation for degrees of freedom

Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)

habitat 0.057617 0.028809 2 8.8434 1.0880 0.37805

soil 0.232708 0.232708 1 2.6732 8.7888 0.06848 .

habitat:soil 0.308003 0.154001 2 2.7134 5.8163 0.10443

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This dataset which has a similar structure however throws the error, and fails to give the p-value for the interaction between

`habitat`

`light`

anova(lmer(sqrt(abs) ~ habitat*light + (1|species), data=frl_soil,

REML=T))

`Analysis of Variance Table of type III with Satterthwaite`

approximation for degrees of freedom

Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)

habitat 0.00845 0.004223 2 7.9751 0.3494 0.7154

light 0.01634 0.016336 1 1.9241 1.3517 0.3689

habitat:light 0.42813 0.214067 2 0.0000 17.7124

Warning message:

In pf(F.stat, qr(Lc)$rank, nu.F) : NaNs produced

I have no idea why lmerTest works for one dataset but not the other as both datasets appear to me at least, to be virtually indistinguishable from one another. If there is anyone who can shed light on the matter, please help.

Answer

I can't tell you exactly why the Satterthwaite approximation gives you a zero-df estimate here (which is indeed why you get the error and no $p$-value for the interaction); you'd have to work through the code in detail to see (type `lmerTest:::calcSatterthMultDF`

and start digging ...) I did a *little* bit of digging; the key lines therein are

```
E <- sum((nu.m/(nu.m - 2)) * as.numeric(nu.m > 2))
nu.F <- 2 * E * as.numeric(E > q)/(E - q)
```

where (I think) `nu.m`

($\nu_m$) is the number of degrees of freedom as estimated by the Welch-Satterthwaite approximation. I don't know why (1) `lmerTest`

sets `E`

and hence `nu.F`

to zero when `nu.m <= 2`

; (2) the particular combination of within-group variances in one data set gives `nu.m < 2`

and in the other data set doesn't ...

In the meantime, though, you can use the Kenward-Roger approximation if you like (it's computationally more expensive, but in general slightly more accurate ...) Using the data set from the second link:

```
frl_soil <- read.csv("frl_soil.csv")
head(frl_soil,2)
## X species habitat light abs
## 1 1 ANI2GR gen G.cs 2.67477395
## 2 2 DIPTAC gen G.cs 0.09549154
anova(lmer(sqrt(abs) ~ habitat*light + (1|species),
data=frl_soil, REML=TRUE),
ddf="Kenward-Roger")
## Analysis of Variance Table of type III with Kenward-Roger
## approximation for degrees of freedom
## Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
## habitat 0.00842 0.004208 2 8.1220 0.3482 0.71602
## light 0.01568 0.015679 1 2.0712 1.2973 0.36928
## habitat:light 0.40886 0.204432 2 2.0713 16.9152 0.05212 .
```