tanya singh - 2 months ago 6

R Question

I am trying to fit a linear mixed model for my study in R, and I would like to know whether my code is correct.

My design: I have 5 sites, 2 subsites within each site, and 2 permanent quadrats within each subsite.

So I have 5 sites, 10 subsites and 20 quadrats. I have measured colony size (of corals) at all the quadrats.

My question is: does the size structure vary between sites?

In my data quadrats are nested within subsite and subsites are nested within site.

I will use site as my fixed factor and subsites and quadrats as my random effects.

I can think of two possible ways of doing this:

`library(lme4)`

`lmer(size ~ site + (1|subsite) + (1|quadrat))`

`lmer(size ~ site + (1|site:subsite) + (1|subsite:quadrat))`

Which one of these would be correct to use?

Thanks

Answer

It depends a bit on how your subsites and quadrats are coded. Let's consider two schemes.

**explicit nesting**: this means that the subsites within sites and quadrats within subsites don't have unique names, e.g.

```
site subsite quadrat
A a 1
A a 2
A b 1
A b 2
B a 1
B a 2
... etc.
```

In this case, you *must* use interaction/nesting syntax to let R know that quadrat 1 in site A, subsite a has nothing in common with all of the other quadrats labeled "1" ...

```
size ~ site + (1|site:subsite) + (1|site:subsite:quadrat)
```

(`size ~ site + (1|site:(subsite/quadrat))` *might* work, but I haven't tested it)
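To see why the bare labels are not enough, here is a minimal base-R sketch that counts grouping levels under explicit nesting. The data frame `dd` is a hypothetical layout matching the table above (5 sites, 2 subsites each, 2 quadrats each), not the asker's real data.

```r
# Hypothetical design with non-unique (explicitly nested) labels
dd <- expand.grid(quadrat = factor(1:2),
                  subsite = factor(c("a", "b")),
                  site    = factor(LETTERS[1:5]))

# Used on its own, 'quadrat' has only 2 levels, so (1|quadrat) would
# treat every quadrat labelled "1" as the same group
n_labels <- nlevels(dd$quadrat)   # 2

# Crossing the labels with site and subsite recovers the 20 real quadrats
n_true <- nlevels(interaction(dd$site, dd$subsite, dd$quadrat, drop = TRUE))   # 20
```

The `site:subsite:quadrat` term in the formula builds the same kind of combined factor internally.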

**implicit nesting**: in this case, everything is uniquely named.

```
site subsite quadrat
A Aa Aa1
A Aa Aa2
A Ab Ab1
A Ab Ab2
B Ba Ba1
B Ba Ba2
... etc.
```

In this case, you can use *either* the syntax above (R automatically drops the redundant levels) or

```
size ~ site + (1|subsite) + (1|quadrat)
```

and you should get identical results. (You can always test this experimentally!)
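As a rough sketch of such an experimental check, the code below simulates a balanced data set, builds unique labels, and fits both formulations. Everything about the data is an assumption made for illustration: the names `dd`, `subsite_u` and `quadrat_u`, the 10 colonies per quadrat, and the simulated variances.

```r
library(lme4)

set.seed(101)
# Hypothetical data: 5 sites x 2 subsites x 2 quadrats, 10 colonies per quadrat
dd <- expand.grid(colony  = 1:10,
                  quadrat = factor(1:2),
                  subsite = factor(c("a", "b")),
                  site    = factor(LETTERS[1:5]))

# Unique (implicitly nested) labels built from the explicit ones
dd$subsite_u <- interaction(dd$site, dd$subsite, drop = TRUE)
dd$quadrat_u <- interaction(dd$site, dd$subsite, dd$quadrat, drop = TRUE)

# Simulated response: subsite effect + quadrat effect + residual noise
dd$size <- 10 +
  rnorm(nlevels(dd$subsite_u), sd = 1)[as.integer(dd$subsite_u)] +
  rnorm(nlevels(dd$quadrat_u), sd = 0.5)[as.integer(dd$quadrat_u)] +
  rnorm(nrow(dd))

# Interaction syntax on the non-unique labels
m_explicit <- lmer(size ~ site + (1 | site:subsite) + (1 | site:subsite:quadrat),
                   data = dd)
# Plain syntax on the unique labels
m_implicit <- lmer(size ~ site + (1 | subsite_u) + (1 | quadrat_u), data = dd)

# Compare fixed effects and log-likelihoods up to numerical tolerance
fe_ok <- isTRUE(all.equal(fixef(m_explicit), fixef(m_implicit), tolerance = 1e-4))
ll_ok <- isTRUE(all.equal(as.numeric(logLik(m_explicit)),
                          as.numeric(logLik(m_implicit)), tolerance = 1e-6))
```

Both `fe_ok` and `ll_ok` should come out `TRUE`; tiny numerical differences between the two fits are possible, which is why the comparison uses a tolerance rather than `identical()`.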

A couple of other points:

- in general I recommend unique labels/implicit nesting (explicit nesting may be more convenient for humans gathering data on field notes, but you should convert to implicit nesting early in your data cleaning process), because it slightly reduces the chances of error
- I always recommend using the `data` argument with `lme4`

- if you don't care about quantifying within-site variation, and if your design is balanced, and your data are Normal (i.e. you're using `lmer` and not `glmer`) you can **greatly** simplify your life by simply aggregating to the mean value per subsite (the replicate units within each site) and running a 1-way ANOVA on site (see Murtaugh 2007, *Ecology*, "Simplicity and complexity in ecological data analysis").
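The aggregation shortcut in the last point could look roughly like the base-R sketch below. It averages to one value per subsite, so each site keeps two replicate means; a single mean per site would leave no residual degrees of freedom for the test. The data frame `dd` and its simulated `size` values are hypothetical stand-ins for the real measurements.

```r
set.seed(1)
# Hypothetical balanced data: 5 sites x 2 subsites x 2 quadrats x 10 colonies
dd <- expand.grid(colony  = 1:10,
                  quadrat = factor(1:2),
                  subsite = factor(c("a", "b")),
                  site    = factor(LETTERS[1:5]))
dd$size <- rnorm(nrow(dd), mean = 10)

# One mean per site-by-subsite combination (10 rows)
agg <- aggregate(size ~ site + subsite, data = dd, FUN = mean)

# 1-way ANOVA on the subsite means, with site as the only factor
fit <- aov(size ~ site, data = agg)
summary(fit)
```

With 10 subsite means and 5 sites this leaves 4 degrees of freedom for site and 5 for the residual. It only stands in for the mixed model under the conditions listed above (balanced design, Normal response, no interest in the within-site variance components).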