rvbarreto - 1 year ago 128
R Question

# R: How to generate a vector of normal-distribution probabilities for use in chisq.test

I have a vector of 30 samples, and I want to test the hypothesis that the sample comes from a normally distributed population.

``````
> N.concentration
[1] 0.164 0.045 0.069 0.100 0.050 0.080 0.043 0.036 0.057 0.154 0.133 0.193
[13] 0.129 0.121 0.081 0.178 0.041 0.040 0.116 0.078 0.104 0.095 0.116 0.038
[25] 0.141 0.100 0.104 0.078 0.121 0.104
``````

I made a frequency vector using `hist`:

``````
> N.hist <- hist(N.concentration, breaks=10)
> N.freq <- N.hist$counts
> N.freq
[1] 3 5 4 4 5 4 2 2 1
``````

I'm using `chisq.test` to check the goodness of fit of `N.freq` to a normal distribution. However, the function requires an argument `p`, a vector of probabilities of the same length as `x`, as described in the `chisq.test` documentation. I'm trying to generate such a vector but, honestly, I don't know exactly what to generate. This is what I tried:

``````
> d <- length(N.freq)%/%2
> p <- dnorm(c(-d:d))
> p
[1] 0.0001338302 0.0044318484 0.0539909665 0.2419707245 0.3989422804
[6] 0.2419707245 0.0539909665 0.0044318484 0.0001338302
> chisq.test(N.freq, p = p)
Error in chisq.test(N.freq, p = p) :
probabilities must sum to 1.
``````

I could use `rescale.p=TRUE`, but I'm not sure whether this would produce a valid test.

EDIT: If I use `rescale.p`, I get a warning message:

``````
> chisq.test(N.freq, p=p, rescale.p=TRUE)

Chi-squared test for given probabilities

data:  N.freq
X-squared = 2697.7, df = 8, p-value < 2.2e-16

Warning message:
In chisq.test(N.freq, p = p, rescale.p = TRUE) :
Chi-squared approximation may be incorrect
``````

To test normality, we need the mean and standard deviation of the normal distribution under the null hypothesis. Since no values are given, we have to estimate them from your 30 observations.

``````
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
0.078, 0.121, 0.104)

mu <- mean(x)
sig <- sd(x)
``````

Now, as you have done, we need to bin the data:

``````
h <- hist(x, breaks = 10)
str(h)
#List of 6
# $ breaks  : num [1:10] 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
# $ counts  : int [1:9] 3 5 4 4 5 4 2 2 1
# $ density : num [1:9] 5 8.33 6.67 6.67 8.33 ...
# $ mids    : num [1:9] 0.03 0.05 0.07 0.09 0.11 0.13 0.15 0.17 0.19
# $ xname   : chr "x"
# $ equidist: logi TRUE
# - attr(*, "class")= chr "histogram"
``````

To get the probabilities under the null hypothesis, we need the probability of each bin, i.e., of falling between consecutive breaks:

``````
p <- diff(pnorm(h$breaks, mu, sig))
#[1] 0.05675523 0.10254734 0.15053351 0.17953337 0.17396679 0.13696059 0.08760419
#[8] 0.04552387 0.01921839
``````
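Because `h$breaks` spans only 0.02 to 0.20, `sum(p)` falls a little short of 1. A minimal sketch of one way to make the bins cover the whole real line, which is not part of the original answer, is to replace the outermost breaks with `-Inf` and `Inf` so that the end bins absorb the tails. The names `edges` and `p_full` are illustrative assumptions.

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
mu  <- mean(x)
sig <- sd(x)
h   <- hist(x, breaks = 10, plot = FALSE)  # same binning, without drawing

# Open up the first and last bins so they include the tails
edges <- h$breaks
edges[1] <- -Inf
edges[length(edges)] <- Inf

# Bin probabilities under N(mu, sig^2); these sum to 1 up to rounding error
p_full <- diff(pnorm(edges, mu, sig))
sum(p_full)

# No rescaling needed now; the small-expected-count warning can still appear
chisq.test(h$counts, p = p_full)
```

With this variant the end bins are interpreted as "below 0.04" and "above 0.18", so the result can differ slightly from the rescaled version.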

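I would not put much trust in a chi-squared test with only 30 observations, but here is how we can use `chisq.test`, with `rescale.p = TRUE` so that the bin probabilities are normalized to sum to 1: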

``````
chisq.test(h$counts, p = p, rescale.p = TRUE)
#
#   Chi-squared test for given probabilities
#
#data:  h$counts
#X-squared = 3.1476, df = 8, p-value = 0.9248
#
#Warning message:
#In chisq.test(h$counts, p, rescale.p = TRUE) :
#  Chi-squared approximation may be incorrect
``````
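The warning `Chi-squared approximation may be incorrect` is raised when some expected cell counts are small (below 5 in R's implementation). As a quick check, sketched here with the illustrative variable names `res` and `expected`, the expected counts can be inspected directly:

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
h <- hist(x, breaks = 10, plot = FALSE)
p <- diff(pnorm(h$breaks, mean(x), sd(x)))

# Expected count per bin: sample size times the rescaled bin probability
expected <- sum(h$counts) * p / sum(p)
round(expected, 2)

# chisq.test() stores the same values in its result
res <- suppressWarnings(chisq.test(h$counts, p = p, rescale.p = TRUE))
all.equal(as.numeric(res$expected), expected)

# Number of bins whose expected count is below 5
sum(expected < 5)
```

With 30 observations spread over 9 bins, most expected counts are necessarily small. Merging neighbouring bins is one common way to raise them.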

You can often ignore the warning message. If you want to get rid of it, set `simulate.p.value = TRUE`:

``````
chisq.test(h$counts, p = p, rescale.p = TRUE, simulate.p.value = TRUE)
#
#   Chi-squared test for given probabilities with simulated p-value (based
#   on 2000 replicates)
#
#data:  h$counts
#X-squared = 3.1476, df = NA, p-value = 0.942
``````
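Both runs above treat the bin probabilities as fully specified, which is why `chisq.test` reports `df = 8`, although `mu` and `sig` were estimated from the same data. A common textbook adjustment subtracts one degree of freedom per estimated parameter. The sketch below is an addition rather than part of the original answer, with illustrative names `df_adj` and `p_adj`. It recomputes the p-value on that basis. When the parameters are estimated from the raw rather than the binned data, the adjusted value is only approximate, so the two p-values are best read as rough bounds.

```r
x <- c(0.164, 0.045, 0.069, 0.1, 0.05, 0.08, 0.043, 0.036, 0.057,
       0.154, 0.133, 0.193, 0.129, 0.121, 0.081, 0.178, 0.041, 0.04,
       0.116, 0.078, 0.104, 0.095, 0.116, 0.038, 0.141, 0.1, 0.104,
       0.078, 0.121, 0.104)
h <- hist(x, breaks = 10, plot = FALSE)
p <- diff(pnorm(h$breaks, mean(x), sd(x)))

res <- suppressWarnings(chisq.test(h$counts, p = p, rescale.p = TRUE))

# Degrees of freedom: bins - 1 - number of estimated parameters (mean and sd)
df_adj <- length(h$counts) - 1 - 2

# Upper-tail p-value of the same statistic under the adjusted df
p_adj <- pchisq(unname(res$statistic), df = df_adj, lower.tail = FALSE)

c(unadjusted = res$p.value, adjusted = p_adj)
```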