Maciej B. - 3 years ago
R Question

Automate Chi-square across columns

I would like to run a Chi-square test on a set of data. How can I do it with a for loop or sapply?

This is a set of sample data:

n <- 100  # sample size; not given in the original post, so this value is an assumption
data <- data.frame(v1.1=sample(c('0','1'),n,replace=TRUE),v1.2=sample(c('0','1'),n,replace=TRUE),v1.3=sample(c('0','1'),n,replace=TRUE),v1.4=sample(c('0','1'),n,replace=TRUE),v1.5=sample(c('0','1'),n,replace=TRUE),m1=sample(c('1','2'),n,replace=TRUE))

I would like to test each variable named v1.x against the variable m1. That's all.

I want to avoid a situation like this:


I found this topic, but for now it's too difficult for me.

lmo
Answer Source

You can just use lapply to loop through the variables.

myTests <- lapply(data[-length(data)], function(x) chisq.test(table(x, data$m1)))

This returns a named list in which each item is named after the variable that was tested. names(myTests) gives:

[1] "v1.1" "v1.2" "v1.3" "v1.4" "v1.5"

Then access each test with myTests[[1]] or myTests[["v1.1"]]. Either call prints something like:

    Pearson's Chi-squared test with Yates' continuity correction

data:  table(x, data$m1)
X-squared = 0, df = 1, p-value = 1
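
The lapply call above relies on m1 being the last column of the data frame. If that can't be guaranteed, one possible variant is to pick the v1.x columns by name instead. This is only a sketch: the sample size n = 100, the seed, and the names vars and testsByName are my own choices for illustration, not part of the question.

```r
# Sample data in the shape used in the question.
# n = 100 and the seed are assumptions made for this sketch.
set.seed(1)
n <- 100
data <- data.frame(
  v1.1 = sample(c("0", "1"), n, replace = TRUE),
  v1.2 = sample(c("0", "1"), n, replace = TRUE),
  v1.3 = sample(c("0", "1"), n, replace = TRUE),
  m1   = sample(c("1", "2"), n, replace = TRUE)
)

# Select the columns to test by name pattern rather than by position.
vars <- grep("^v1\\.", names(data), value = TRUE)

# Run one chi-square test per selected column against m1.
testsByName <- lapply(data[vars], function(x) chisq.test(table(x, data$m1)))

# The list items carry the names of the tested columns.
names(testsByName)
```

Selecting by name keeps working if columns are reordered or if other unrelated columns are added later.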

Then, to pull out components from the individual tests, use names(myTests[[1]]) and str(myTests[[1]]) to inspect the contents. myTests[[1]]$p.value, for example, will pull out the p-value from the first test, and unlist(sapply(myTests, "[", "p.value")) will return a named vector with the p-values from all of the tests.
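
If you want more than the p-values, a rough sketch of collecting the main components of every test into one data frame could look like this. The sample data (n = 100, fixed seed) and the name results are assumptions for illustration; statistic, parameter, and p.value are the component names that chisq.test returns.

```r
# Sample data in the shape used in the question.
# n = 100 and the seed are assumptions made for this sketch.
set.seed(1)
n <- 100
data <- data.frame(
  v1.1 = sample(c("0", "1"), n, replace = TRUE),
  v1.2 = sample(c("0", "1"), n, replace = TRUE),
  v1.3 = sample(c("0", "1"), n, replace = TRUE),
  m1   = sample(c("1", "2"), n, replace = TRUE)
)

# Same approach as in the answer: test every column except the last (m1).
myTests <- lapply(data[-length(data)], function(x) chisq.test(table(x, data$m1)))

# One row per test: test statistic, degrees of freedom, and p-value.
results <- data.frame(
  variable  = names(myTests),
  statistic = sapply(myTests, function(tst) unname(tst$statistic)),
  df        = sapply(myTests, function(tst) unname(tst$parameter)),
  p.value   = sapply(myTests, function(tst) tst$p.value),
  row.names = NULL
)

results
```

A flat table like this is easier to sort or filter than the list of test objects, for example with results[order(results$p.value), ].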
