rvbarreto - 10 months ago 52

R Question

I have a vector of 30 samples, and I want to test the hypothesis that the sample comes from a normally distributed population.

`> N.concentration`

[1] 0.164 0.045 0.069 0.100 0.050 0.080 0.043 0.036 0.057 0.154 0.133 0.193

[13] 0.129 0.121 0.081 0.178 0.041 0.040 0.116 0.078 0.104 0.095 0.116 0.038

[25] 0.141 0.100 0.104 0.078 0.121 0.104

I made a frequency vector using `hist`:

`> N.hist <- hist(N.concentration, breaks=10)`

> N.freq <- N.hist$count

[1] 3 5 4 4 5 4 2 2 1

I'm using `chisq.test` with `N.freq`:

`> d <- length(N.freq) %/% 2`

> p <- dnorm(c(-d:d))

> p

[1] 0.0001338302 0.0044318484 0.0539909665 0.2419707245 0.3989422804

[6] 0.2419707245 0.0539909665 0.0044318484 0.0001338302

> chisq.test(N.freq, p = p)

Error in chisq.test(p1$count, p = p) :

probabilities must sum to 1.

I thought about using `rescale.p=TRUE`.

EDIT: If I use `rescale.p`, I get a warning message:

`> chisq.test(N.freq, p=p, rescale.p=TRUE)`

Chi-squared test for given probabilities

data: N.freq

X-squared = 2697.7, df = 8, p-value < 2.2e-16

Warning message:

In chisq.test(N.freq, p = p, rescale.p = TRUE) :

Chi-squared approximation may be incorrect

Answer

As I said, to test normality we need the mean and standard deviation of the normal distribution under the null hypothesis. Since no values are given, we have to estimate them from your 30 data points.

```
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
0.078, 0.121, 0.104)
mu <- mean(x)
sig <- sd(x)
```

Now, as you have done, we need to bin the data:

```
h <- hist(x, breaks = 10)
str(h)
#List of 6
# $ breaks : num [1:10] 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
# $ counts : int [1:9] 3 5 4 4 5 4 2 2 1
# $ density : num [1:9] 5 8.33 6.67 6.67 8.33 ...
# $ mids : num [1:9] 0.03 0.05 0.07 0.09 0.11 0.13 0.15 0.17 0.19
# $ xname : chr "x"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
```

To get the expected probabilities under the null hypothesis, we need the probability of each bin, i.e., the normal probability mass between consecutive breaks.

```
p <- diff(pnorm(h$breaks, mu, sig))
#[1] 0.05675523 0.10254734 0.15053351 0.17953337 0.17396679 0.13696059 0.08760419
#[8] 0.04552387 0.01921839
```
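These probabilities sum to less than 1 because the normal mass below the first break and above the last one is left out. As a minimal sketch of one alternative (the names `breaks_open` and `p_open`, and the choice to make the outer bins open-ended, are assumptions for illustration and not part of the approach above), the outer breaks can be replaced by `-Inf` and `Inf` so that the bin probabilities sum to 1 without rescaling:

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
mu <- mean(x)
sig <- sd(x)
h <- hist(x, breaks = 10, plot = FALSE)  # plot = FALSE: only compute the bins

# Probabilities between the finite breaks: the two tails are missing
p <- diff(pnorm(h$breaks, mu, sig))
sum(p)

# Hypothetical variant: treat the first and last bins as open-ended
breaks_open <- h$breaks
breaks_open[1] <- -Inf
breaks_open[length(breaks_open)] <- Inf
p_open <- diff(pnorm(breaks_open, mu, sig))
sum(p_open)

# The counts are unchanged because every observation lies inside the
# original breaks; suppressWarnings() only hides the small-count warning
res_open <- suppressWarnings(chisq.test(h$counts, p = p_open))
res_open$statistic
```

With open-ended bins the tail mass is assigned to the two outer bins rather than spread proportionally over all bins, so the statistic will generally differ somewhat from the rescaled version.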

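I tend not to trust a chi-square test with only 30 data points. But here is how we can use `chisq.test`. Because the bins do not cover the whole real line, `p` sums to less than 1, so we need `rescale.p = TRUE`: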

```
chisq.test(h$counts, p = p, rescale.p = TRUE)
#
# Chi-squared test for given probabilities
#
#data: h$counts
#X-squared = 3.1476, df = 8, p-value = 0.9248
#
#Warning message:
#In chisq.test(h$counts, p, rescale.p = TRUE) :
# Chi-squared approximation may be incorrect
```
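Two details of this output can be inspected directly. The sketch below is an illustration, not a definitive recipe: it looks at the expected counts stored in the result (`chisq.test` issues this warning when any expected count is below 5), and it recomputes the p-value with the degrees of freedom reduced by the two estimated parameters. The names `res`, `df_adj` and `p_adj` are hypothetical, and the `bins - 1 - 2` rule is the common textbook adjustment, which is only approximate when the mean and standard deviation are estimated from the raw data rather than from the binned counts.

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
mu <- mean(x)
sig <- sd(x)
h <- hist(x, breaks = 10, plot = FALSE)
p <- diff(pnorm(h$breaks, mu, sig))

# suppressWarnings() only silences the small-expected-count warning
res <- suppressWarnings(chisq.test(h$counts, p = p, rescale.p = TRUE))

# Expected counts per bin under the fitted normal distribution
res$expected
any(res$expected < 5)

# Assumed adjustment: bins - 1 - (number of estimated parameters)
k <- length(h$counts)
df_adj <- k - 1 - 2
p_adj <- pchisq(unname(res$statistic), df = df_adj, lower.tail = FALSE)
p_adj
```

For the same statistic, fewer degrees of freedom give a smaller p-value, so `p_adj` is below the value reported by `chisq.test` with `df = 8`.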

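Often you need not worry about the warning message. It appears because some expected bin counts are below 5, which makes the chi-squared approximation less reliable. If you want to get rid of it, set `simulate.p.value = TRUE`, which computes the p-value by Monte Carlo simulation instead: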

```
chisq.test(h$counts, p = p, rescale.p = TRUE, simulate.p.value = TRUE)
#
# Chi-squared test for given probabilities with simulated p-value (based
# on 2000 replicates)
#
#data: h$counts
#X-squared = 3.1476, df = NA, p-value = 0.942
```
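Because the simulated p-value comes from random replicates, it changes slightly from run to run. A minimal sketch for making it reproducible, assuming an arbitrary seed of 1 and a larger replicate count via the `B` argument of `chisq.test` (both values are illustrative choices, and `r1` and `r2` are hypothetical names):

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
mu <- mean(x)
sig <- sd(x)
h <- hist(x, breaks = 10, plot = FALSE)
p <- diff(pnorm(h$breaks, mu, sig))

# Fix the random seed so the Monte Carlo p-value can be reproduced
set.seed(1)  # arbitrary seed, chosen only for illustration
r1 <- chisq.test(h$counts, p = p, rescale.p = TRUE,
                 simulate.p.value = TRUE, B = 10000)

# Resetting the same seed gives the same simulated p-value
set.seed(1)
r2 <- chisq.test(h$counts, p = p, rescale.p = TRUE,
                 simulate.p.value = TRUE, B = 10000)

identical(r1$p.value, r2$p.value)
r1$p.value
```

A larger `B` reduces the Monte Carlo error of the p-value at the cost of more computation.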