Lisa - 2 months ago 16

R Question

I am trying to run a mixed-effects model that predicts

`F2_difference`

fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients.

From this link, Fixed-effects model is rank deficient, I think I should use

`findLinearCombos`

`caret`

`findLinearCombos(data.df)`

Error in qr.default(object) : NA/NaN/Inf in foreign function call (arg 1)

In addition: Warning message:

In qr.default(object) : NAs introduced by coercion

My data does not have any NAs - What could be causing this? (Sorry if the answer is various obvious - I am new to R).

All of my data are factors except the numerical value that I am trying to predict. Here is a small sample of my data.

`sex <- c("f", "m", "f", "m")`

nasal <- c("TRUE", "TRUE", "FALSE", "FALSE")

vowelLabel <- c("a", "e", "i", "o")

speaker <- c("Jim", "John", "Ben", "Sally")

word_1 <- c("going", "back", "bag", "back")

type <- c("coronal", "coronal", "labial", "velar")

F2_difference <- c(345.6, -765.8, 800, 900.5)

data.df <- data.frame(sex, nasal, vowelLabel, speaker,

word_1, type, F2_difference

stringsAsFactors = TRUE)

Edit:

Here is some more code, if it helps.

`formula <- F2_difference ~ sex + nasal + type + vowelLabel +`

type * vowelLabel + nasal * type +

(1|speaker) + (1|word_1)

lmer(formula, REML = FALSE, data = data.df)

The OP did not provide sufficient number of test data to allow an actual run of the model in

`lmer`

Answer

I think, you are slightly over-concerned with the warning message:

fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients.

It is just a warning message, not an error message. Since there is neither misuse of `lmer`

function, nor ill-specification of your model formula, you should be able to obtain an estimated model.

But anyway, since you asked, I shall try my best to explain the story behind this.

During execution of `lmer`

, your original model formula is broken into to parts: **fixed effect formula** and **random effect formula**. For each part, a **model matrix** is constructed. The construction for the fixed part is passed to the standard model matrix constructor `model.matrix (formula, data.frame)`

, while the construction for the random part is slightly more complicated, but is nothing related to your question, so we skip it.

For your model, you can check what the fixed effect model matrix looks like by:

```
fix.formula <- F2_difference ~ sex + nasal + type + vowelLabel +
type * vowelLabel + nasal * type
X <- model.matrix (fix.formula, data.df)
```

Since all your variables like `sex`

, etc are factors, `X`

will be a 0-1 binary matrix. Though `model.matrix`

will apply `contrast`

for each factor and the interaction between factor, like `type * vowelLable`

, it is still possible that `X`

does not have full column rank. This is because those variables may be linearly correlated. Therefore, any well-written model fitting routine, like `lm`

, `glm`

, etc., will apply QR decomposition for `X`

to use only the non-deficient subspace of `X`

for model fitting, shrinking other coefficients to 0 or `NA`

. The warning message you get is a signal that such shrinkage happens. There are originally `ncol(X)`

number of coefficients to estimate, however due to rank-deficiency, only `ncol(X) - 7`

will be estimated, with the remaining 7 being 0 or `NA`

.

Such shrinkage is actually a **good sign**, rather than a bad one, because it ensures **numerical stability** in solving linear systems of equations during model fitting. So you should not bother with it. **Correlation between variables are too often in real-world statistical modelling**; modeller can not have control on this. Therefore, rank detection and stability should always be passed to fitting routine to deal with. **Unless you are writing a model fitting routine, like lm, etc, in all cases, you should not deal with rank problem manually.** That is just an ugly piece of work for non-expert.

To help you know better, let's exploit the `lm`

function to check what will happen, if we fit a linear model with `fix.formula`

:

```
fix.fit <- lm (fix.formula, data.df, method = "qr", singular.ok = TRUE)
```

Note that, the `method = "qr"`

and `singular.ok = TRUE`

are **default**, so actually we don't need to set it here. But writing it out helps to see that `lm`

uses QR decomposition for `X`

to allow rank-deficient model to fit. On the other hand, if set `singular.ok = FALSE`

:

```
oo <- lm (fix.formula, data.df, method = "qr", singular.ok = FALSE)
```

`lm`

will stop and complain about rank-deficiency.

Now let's check about the returned values in `fix.fit`

.

```
coef <- fix.fit$coef
p <- length(coef)
```

stores all coefficients for model matrix `X`

. It is guaranteed that the number of coefficients `p = ncol(X)`

. However, if you check `coef`

by

```
no.NA <- sum(is.na(coef))
```

you should see 7 `NA`

. Surprising? Now let's check rank

```
rank <- fix.fit$rank
```

This is the numerical rank for `X`

. You should verify that `rank + no.NA = p`

.

Though I am talking about `lm`

, but the same thing happens inside `lmer`

as well. **The only difference, is that lm will not report rank-deficiency, while lmer will report this as a warning**. This is actually a good thing. Too often, I see people asking why

`lm`

returning `NA`

for some coefficients. Since `lm`

does not report, this can be confusing for non-expert.Hope my explanation is clear and helpful.

**Update:**

Don't worry about the use of `anova`

and `BIC`

either, the package function will take care of them properly.

I have also edited your original post, by adding tags `lme4`

, `linear regression`

, `regression`

, and renaming the title to be more informative. Hopefully, people with the same doubt can easily find your post via Google, etc.