Alice Hobbs - 3 months ago 40

R Question

I have three columns in the dataframe **dune (below - bottom of the page)** describing the % cover of marram grass recorded for three different sand dune ecosystems:

(1) Restored;

(2) Degraded; and

(3) Natural;

I have performed two different One Way Anova tests (below) - test 1 and test 2 - to establish significant differences between ecosystems. Test 1 clearly shows significant differences between ecosystems; however, test 2 shows no significant differences. The box plot's (below) show stark differences in variance between ecosystems.

Afterwards, I melted the dataframe to produce a factorial column (i.e, headed Ecosystem.Type) which is also the response variable. The idea is to apply a glm model (test 3 - below)to test with a One Way Anova; however, this method was unsuccessful (please find the error message below).

**Problem**

I am confused whether my code to perform each One Way Anova test is correct and the correct procedure to perform post hoc tests (Turkey HSD, Scheffe or others) to distinguish pairs of ecosystems that are significantly different. If anyone can help, I would be deeply appreciative for your advice. Many thanks....

`data(dune)`

`dune.type.1<-aov(Natural~Restored+Degraded, data=dune)`

summary.aov(dune.type.1, intercept=T)

Df Sum Sq Mean Sq F value Pr(>F)

(Intercept) 1 34694 34694 138.679 1.34e-09 ***

Restored 1 94 94 0.375 0.548

Degraded 1 486 486 1.942 0.181

Residuals 17 4253 250

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

`posthoc<-TukeyHSD(dune.type.1, conf.level=0.95)`

Error in TukeyHSD.aov(dune.type.1, conf.level = 0.95) :

no factors in the fitted model

In addition: Warning messages:

1: In replications(paste("~", xx), data = mf) :

non-factors ignored: Restored

2: In replications(paste("~", xx), data = mf) :

non-factors ignored: Degraded

`dune1<-aov(Restored~Natural, data=dune)`

dune2<-aov(Restored~Degraded, data=dune)

dune3<-aov(Degraded~Natural, data=dune)

summary(dune1)

Df Sum Sq Mean Sq F value Pr(>F)

Natural 1 86 85.58 0.356 0.558

Residuals 18 4325 240.26

summary(dune2)

Df Sum Sq Mean Sq F value Pr(>F)

Degraded 1 160 159.7 0.676 0.422

Residuals 18 4250 236.1

summary(dune3)

Df Sum Sq Mean Sq F value Pr(>F)

Natural 1 168.5 168.49 2.318 0.145

Residuals 18 1308.5 72.69

`melt.dune<-melt(dune, measure.vars=c("Degraded", "Restored", "Natural"))`

colnames(melt.dune)=c("Ecosystem.Type", "Percentage.cover")

melt.dune$Percentage.cover<-as.numeric(melt.dune$Percentage.cover)

glm.dune<-glm(Ecosystem.Type~Percentage.cover, data=melt.dune)

summary(glm.dune)

Error

glm.dune<-glm(Ecosystem.Type~Percentage.cover, data=melt.dune)

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :

NA/NaN/Inf in 'y'

In addition: Warning messages:

1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors

3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

`structure(list(Ecosystem.Type = structure(c(1L, 1L, 1L, 1L, 1L,`

1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,

2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,

3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Degraded", "Restored",

"Natural"), class = "factor"), Percentage.cover = c(12, 17, 21,

11, 22, 16, 7, 9, 14, 2, 3, 15, 23, 4, 19, 36, 26, 4, 15, 23,

38, 46, 65, 35, 54, 29, 48, 13, 19, 33, 37, 55, 11, 53, 13, 24,

28, 44, 42, 39, 18, 61, 31, 46, 51, 51, 41, 44, 55, 47, 73, 43,

25, 42, 21, 13, 65, 30, 47, 29)), row.names = c(NA, -60L), .Names = c("Ecosystem.Type",

"Percentage.cover"), class = "data.frame")

`structure(list(Degraded = c(12L, 17L, 21L, 11L, 22L, 16L, 7L,`

9L, 14L, 2L, 3L, 15L, 23L, 4L, 19L, 36L, 26L, 4L, 15L, 23L),

Restored = c(38L, 46L, 65L, 35L, 54L, 29L, 48L, 13L, 19L,

33L, 37L, 55L, 11L, 53L, 13L, 24L, 28L, 44L, 42L, 39L), Natural = c(18L,

61L, 31L, 46L, 51L, 51L, 41L, 44L, 55L, 47L, 73L, 43L, 25L,

42L, 21L, 13L, 65L, 30L, 47L, 29L)), .Names = c("Degraded",

"Restored", "Natural"), class = "data.frame", row.names = c(NA,

-20L))

Answer

there are several things I would like to point to you.

First, the test 1 and test 2 produce similar results. The only difference is that you selected an intercept on test 1 and thus the outcome tells you that if you fit a linear model (I will come to that in a few minutes) intercept is required. Hence the significance you see is about whether the line you force to fit needs an intercept or not. Try using "intercept=T" to the other outcomes and I am pretty sure you will get similar results.

The second thing you should be careful is about the linear model you try to fit. The dune.type.1 model is a model where you actually see how correlated the different sand dune ecosystems are. In other words, you assume that there is a linear association between natural and restored and with every unit increase of the restored you have some increase on the natural. If I understood correctly what you want is to examine the mean differences and not their correlation. Thus you can do two things:

The data is prepared to perform t.tests (a test that compares the mean between several categories). It is very easy to do in R and valid since all the variables are reasonably normal. However you will have multiple testing issues (you will perform probably 3 t-tests to get all the mean differences), and thus need to use a Bonferroni correction.

But I think what you really want is the following:

First reform the data

```
data <- data.frame(v = c(dune$Degraded, dune$Restored, dune$Natural),
labels = c(rep("Degraded", times = 20), rep("Restored", times = 20),
rep("Natural", times = 20)))
```

Then fit a linear model

```
mod.1 <- lm(v ~ labels, data = data)
summary(mod.1)
lm(formula = v ~ labels, data = data)
Residuals:
Min 1Q Median 3Q Max
-28.650 -10.725 0.875 8.050 31.350
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.950 3.066 4.875 9.07e-06 ***
labelsNatural 26.700 4.337 6.157 7.95e-08 ***
labelsRestored 21.350 4.337 4.923 7.64e-06 ***
```

where you can actually see that the mean of the baseline category (i.e the degraded) is significantly smaller with the mean of the natural category and etc. You can also check the model assumptions, to see if your model is a good fit

```
qqnorm(residuals(mod.1))
qqline(residuals(mod.1))
```

They residuals are reasonably normal so the model is fine. You can also follow your anova approach and have:

```
anova.model <- aov(v ~ labels, data = data))
summary(anova.model)
Df Sum Sq Mean Sq F value Pr(>F)
labels 2 7982 3991 21.22 1.29e-07 ***
Residuals 57 10720 188
```

which indicates that there is at least one significant difference between the means of the sand dune ecosystems, and follow up with Tukey for the pointwise intervals:

```
post <- TukeyHSD(aov(v ~ labels, data = data))
plot(post, ylim = c(0, 4))
```

already adjusted for multiple testing :)