Felix25 - 9 months ago 363

R Question

I have a dataset with 142 data entries: 121 individuals measured on two occasions (two years, before and after treatment; Year = 0 or 1). In the second year, 46 individuals were in treated plots and the rest were in control plots (treatment = 0 or 1). Here's some example data:

```
ID <- c("480", "480", "620", "620", "712", "712")
Year <- c("0", "1", "0", "1", "0", "1")
Plot <- c("14", "14", "13", "13", "20", "20")
Treat <- c("0", "0", "0", "1", "0", "1")
## responses are numeric; quoted values would be turned into factors below
Exp <- c(31, 43, 44, 36, 29, 71)
ExpSqrt <- c(5.567764, 6.557439, 6.633250, 6.000000, 5.385165, 8.426150)
Winter <- data.frame(ID, Year, Plot, Treat, Exp, ExpSqrt,
                     stringsAsFactors = TRUE)
```

Plots and individuals are random factors and I'm trying to fit a mixed model to determine the effect of Year, Treatment and the interaction between them:

`model_Exp <- lmer(ExpSqrt~Year+Treat+Year*Treat+(1|ID)+(1|Plot),data=Winter)`

but I keep getting the warning message:

`"fixed-effect model matrix is rank deficient so dropping 1 column / coefficient"`

This removes the interaction.

I have no NA values in my dataset, and Exp is always positive; I square-root transformed it because the distribution was non-normal. It's not a particularly small dataset. I have tried the function `findLinearCombos` from the caret package, but it returns no result.

My understanding is that there is some problem because treatment 1 only occurs under condition year=1 (but not in all instances: Year=1 also contains 75 control individuals).

I am not sure a) how or if this can be resolved?

or b) if it can't be resolved how to interpret this?

I have read some responses about this warning and have tried everything I found suggested to resolve it. I've also read a bit about the Hauck-Donner effect, but I'm not sure if this is my problem, and being relatively new to stats I can't say I entirely understand it.

Answer

This isn't specifically a linear mixed model problem.

It comes down to the fact that you can't estimate an interaction if you don't have any treatment happening in the 'before' period (year 0).
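A quick way to check for this in your own data is to cross-tabulate the two factors: an empty cell means the interaction has nothing to be estimated from. A minimal sketch, using a made-up data frame `toy` (hypothetical rows that only mirror the structure described in the question, not the real data):

```r
# Hypothetical toy data: three individuals, each measured in year 0 and year 1;
# treatment only ever occurs in year 1, as in the question.
toy <- data.frame(
  Year  = factor(c(0, 1, 0, 1, 0, 1)),
  Treat = factor(c(0, 0, 0, 1, 0, 1))
)

# Count observations in each Year x Treat combination.
cells <- with(toy, table(Year, Treat))
print(cells)

# TRUE if at least one combination never occurs in the data.
any(cells == 0)
```

Here the Year = 0, Treat = 1 cell is empty, which is the situation that triggers the rank-deficiency message.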

Simplest possible example:

```
(dd <- data.frame(y=1:3,treat=c(0,0,1),year=c(0,1,1)))
##   y treat year
## 1 1     0    0
## 2 2     0    1
## 3 3     1    1
```

Fit the model:

```
lm(y~treat*year,dd) ## == year+treat+year:treat
## Call:
## lm(formula = y ~ treat * year, data = dd)
##
## Coefficients:
## (Intercept)        treat         year   treat:year
##           1            1            1           NA
```

`lm` doesn't warn you, but it effectively does the same thing as `lmer`: it drops the extra, collinear column and reports its parameter as `NA`. If you try `caret::findLinearCombos(dd[c("year","treat")])` you won't get anything back (`year` and `treat` are not perfectly collinear), but if you run it on the model matrix that R constructs, which includes the interaction column, you will get something:

```
X <- model.matrix(~year*treat,dd)
caret::findLinearCombos(X)
## $linearCombos
## $linearCombos[[1]]
## [1] 4 3
## $remove
## [1] 4
```
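The same check can be done in base R without `caret`. This is only a sketch on the toy `dd` data from above: it compares the numerical rank of the model matrix with its number of columns, and confirms that the interaction column is identical to the `treat` column.

```r
# Recreate the toy data from the example above.
dd <- data.frame(y = 1:3, treat = c(0, 0, 1), year = c(0, 1, 1))
X <- model.matrix(~ year * treat, dd)

ncol(X)      # number of columns in the model matrix
qr(X)$rank   # numerical rank; smaller than ncol(X) means rank deficiency

# year:treat is nonzero only when year == 1 and treat == 1, which in this
# design is exactly when treat == 1, so the two columns coincide.
all(X[, "year:treat"] == X[, "treat"])
```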

This experimental design simply doesn't allow you to estimate the interaction. If you remove it from the formula (use `year+treat` instead of `year*treat`) you'll get the same answer, but without the message. Alternatively, in a typical "before-after-control-impact" design (in environmental impact assessment), you would label the individuals who *would be getting the treatment* as "impact" or "treated" individuals even in year 0; then the interaction would be your actual estimated effect of treatment.
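A rough sketch of that relabelling, using the six example rows from the question. The column name `Group` is made up for illustration, and plain `lm` stands in for `lmer` so the example needs no extra packages. Every individual that is ever treated is marked as belonging to the impact group in both years, and the model is then fitted with `Year * Group`:

```r
# Example rows from the question, with numeric responses.
Winter <- data.frame(
  ID      = factor(c("480", "480", "620", "620", "712", "712")),
  Year    = c(0, 1, 0, 1, 0, 1),
  Treat   = c(0, 0, 0, 1, 0, 1),
  ExpSqrt = c(5.567764, 6.557439, 6.633250, 6.000000, 5.385165, 8.426150)
)

# Group = 1 in both years for individuals that are treated in any year.
Winter$Group <- ave(Winter$Treat, Winter$ID, FUN = max)

# All four Year x Group cells are now occupied, so the design is full rank.
X <- model.matrix(~ Year * Group, Winter)
qr(X)$rank == ncol(X)

# The interaction coefficient is the before-after change in the impact group
# minus the before-after change in the control group.
fit <- lm(ExpSqrt ~ Year * Group, data = Winter)
coef(fit)["Year:Group"]
```

On the full dataset, the same fixed-effect formula would presumably go into `lmer` together with the `(1|ID)` and `(1|Plot)` terms from the question.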