MYaseen208 - 5 months ago 50

R Question

I'm trying to do a chi-square analysis for all combinations of variables in the data, and my code is:

```
Data <- esoph[ , 1:3]

OldStatistic <- NA
for(i in 1:(ncol(Data)-1)){
  for(j in (i+1):ncol(Data)){
    Statistic <- data.frame("Row"=colnames(Data)[i], "Column"=colnames(Data)[j],
                            "Chi.Square"=round(chisq.test(Data[ ,i], Data[ ,j])$statistic, 3),
                            "df"=chisq.test(Data[ ,i], Data[ ,j])$parameter,
                            "p.value"=round(chisq.test(Data[ ,i], Data[ ,j])$p.value, 3),
                            row.names=NULL)
    temp <- rbind(OldStatistic, Statistic)
    OldStatistic <- Statistic
    Statistic <- temp
  }
}
```

```
> str(Data)
'data.frame':	88 obs. of  3 variables:
 $ agegp: Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alcgp: Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ tobgp: Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...

> Statistic
    Row Column Chi.Square df p.value
1 agegp  tobgp      2.400 15       1
2 alcgp  tobgp      0.619  9       1
```

My code gives me the chi-square output for variable 1 vs. variable 3 and for variable 2 vs. variable 3, but the result for variable 1 vs. variable 2 is missing. I tried hard but could not fix the code. Any comments and suggestions will be highly appreciated. I'd like to do a cross tabulation for all possible combinations. Thanks in advance.

I used to do this kind of analysis in SPSS, but now I want to switch to R.
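The row for variable 1 vs. variable 2 goes missing because of the bookkeeping at the end of the inner loop: `OldStatistic <- Statistic` keeps only the newest single-row result, so each `rbind()` joins just the previous row and the current one, and the final table holds only the last two pairs. Below is a minimal sketch of the same loop that instead keeps appending to one accumulating table (the name `results` is illustrative, not from the post), assuming the built-in `esoph` data as in the question.

```r
# Sketch: the question's loop with the accumulation fixed.
# results only ever grows, so every (i, j) pair survives.
Data <- esoph[, 1:3]   # esoph ships with R's datasets package

results <- NULL        # rbind(NULL, df) simply returns df
for (i in 1:(ncol(Data) - 1)) {
  for (j in (i + 1):ncol(Data)) {
    # run the test once per pair and reuse it for all three columns
    test <- chisq.test(Data[, i], Data[, j])
    Statistic <- data.frame(Row = colnames(Data)[i],
                            Column = colnames(Data)[j],
                            Chi.Square = round(unname(test$statistic), 3),
                            df = unname(test$parameter),
                            p.value = round(test$p.value, 3),
                            row.names = NULL)
    results <- rbind(results, Statistic)   # append, never overwrite
  }
}
results
```

With three columns this yields three rows (agegp/alcgp, agegp/tobgp, alcgp/tobgp). R may warn that the chi-squared approximation may be incorrect, since several expected cell counts in these tables are small.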

Answer

A sample of your data would be appreciated, but I think this will work for you. First, create all combinations of columns with `combn`. Then write a function to use with an `apply`-style function to iterate through the combos. I like to use `plyr` since it makes it easy to specify the data structure you want back (`adply` returns a data frame). Also note that you only need to compute the chi-square test once for each combination of columns; your loop calls `chisq.test` three times per pair (for the statistic, the df, and the p-value), so reusing one result cuts that work to a third.

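```
library(plyr)

Data <- esoph[, 1:3]

# all pairs of column indices, one pair per column of the matrix
combos <- combn(ncol(Data), 2)

# adply() walks over the columns (margin 2) of combos and row-binds the results
adply(combos, 2, function(x) {
  # run the test once per pair and reuse the result
  test <- chisq.test(Data[, x[1]], Data[, x[2]])
  data.frame("Row" = colnames(Data)[x[1]],
             "Column" = colnames(Data)[x[2]],
             "Chi.Square" = round(test$statistic, 3),
             "df" = test$parameter,
             "p.value" = round(test$p.value, 3))
})
```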
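If you would rather not depend on `plyr`, the same `combn()` idea works in base R. This sketch assumes the question's three `esoph` columns, and also keeps the cross-tabulation for each pair, since the question asks for those too. The names `pairs`, `crosstabs`, and `chisq_table` are illustrative.

```r
# Base-R sketch: one cross-tabulation and one chi-square test per pair of columns.
Data <- droplevels(esoph[, 1:3])   # drop unused levels so tables have no empty rows/cols

# combn() over the column names gives every unordered pair, e.g. c("agegp", "alcgp")
pairs <- combn(names(Data), 2, simplify = FALSE)

# Cross-tabulate each pair; dnn labels the table dimensions with the variable names
crosstabs <- lapply(pairs, function(p) table(Data[[p[1]]], Data[[p[2]]], dnn = p))
names(crosstabs) <- vapply(pairs, paste, character(1), collapse = " x ")

# Run chisq.test() once per table and collect one summary row per pair
chisq_table <- do.call(rbind, Map(function(p, tab) {
  test <- chisq.test(tab)   # a 2-D table is treated as a contingency table
  data.frame(Row = p[1],
             Column = p[2],
             Chi.Square = round(unname(test$statistic), 3),
             df = unname(test$parameter),
             p.value = round(test$p.value, 3))
}, pairs, crosstabs))
rownames(chisq_table) <- NULL

crosstabs[["agegp x alcgp"]]
chisq_table
```

Keeping the tables in a named list lets you print any single cross-tabulation later, while `chisq_table` gives the one-row-per-pair summary the original loop was aiming for.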