Lost Geographer - 5 months ago 21

R Question

**The initial quest**

I wanted to perform a chi-squared test on all columns of a `data.frame` and print only the results with `p.value <= 0.05`. Here `x` and `y` are the indices of the two columns being compared:

```
y <- 2
for(y in y:ncol(data)){
  chisq_result <- chisq.test(x = data[,1], y = data[,y]);
  if(chisq_result$p.value <= 0.05){
    print(chisq_result);
  }
}
```

For a significant pair, this prints:

```
Pearson's Chi-squared test
data: data[, 1] and data[, y]
X-squared = 11.166, df = 2, p-value = 0.003761
```

As you can see, the 2nd line shows `data[, y]` instead of `data[, 4]`: the variable `y` is not evaluated in the output of `chisq.test`.
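A minimal sketch of why this happens: `chisq.test` builds that label from the unevaluated argument expressions with `deparse(substitute(...))`, so it only ever sees the text `data[, y]`, not the value of `y`. The helper `show_label()` and the toy `data` below are hypothetical and only mimic that mechanism.

```r
# Hypothetical helper: captures the argument expressions as text,
# without evaluating them (the same idea chisq.test uses for its label).
show_label <- function(x, y) {
  paste(deparse(substitute(x)), "and", deparse(substitute(y)))
}

data <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # toy data, for illustration only
y <- 3
label <- show_label(data[, 1], data[, y])
print(label)  # [1] "data[, 1] and data[, y]"
```

The value of `y` never reaches the label, which is why changing the loop variable has no effect on the printed `data:` line.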

Why would I want this? Because I have several columns in the `data.frame`, and I need to know which one each printed result refers to.

I tried different things with `get()`, `eval()`, `parse()` and `do.call()`, then fell back on printing the column names myself with `cat("X = ", colnames(data)[x], " Y = ", colnames(data)[y], "\n");`:

```
y <- 2
for(y in y:ncol(data)){
  chisq_result <- chisq.test(x = data[,1], y = data[,y]);
  if(chisq_result$p.value <= 0.05){
    cat("X = ", colnames(data)[x], " Y = ", colnames(data)[y], "\n");
    print(chisq_result);
  }
}
```

... which gives something more usable (see the 1st line), but not satisfying, because I still get the variable name `y` instead of its value `4`:

```
X = colname1 Y = colname4
Pearson's Chi-squared test
data: data[, 1] and data[, y]
X-squared = 11.166, df = 2, p-value = 0.003761
```

Thanks to Roman Luštrik, I used `sprintf()` to overwrite the `data.name` element of the result:

```
y <- 2
for(y in y:ncol(data)){
  chisq_result <- chisq.test(x = data[,1], y = data[,y]);
  if(chisq_result$p.value <= 0.05){
    chisq_result$data.name <- sprintf("col %s and col %s", x, y);
    print(chisq_result);
  }
}
```

Which gives:

```
Pearson's Chi-squared test
data: col 5 and col 8
X-squared = 11.166, df = 2, p-value = 0.003761
```
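The same trick can put the column names, rather than the indices, into the label. The sketch below is one possible variant, not the original code: the toy `data`, the `results` list and the `significant` filter are assumptions made for illustration, and `suppressWarnings()` is only there because the toy sample is tiny.

```r
# Toy data, for illustration only (not the data from the question)
data <- data.frame(
  colname1 = c("a", "a", "b", "b", "a", "b", "a", "b"),
  colname2 = c("u", "v", "u", "v", "u", "v", "u", "v"),
  colname3 = c("p", "p", "q", "q", "p", "q", "p", "q")
)

x <- 1            # index of the reference column
results <- list() # one test result per compared column

for (y in 2:ncol(data)) {
  # suppressWarnings() hides the small-sample warning on this toy data
  res <- suppressWarnings(chisq.test(x = data[, x], y = data[, y]))
  # Replace the default "data[, 1] and data[, y]" label with column names
  res$data.name <- sprintf("%s and %s", colnames(data)[x], colnames(data)[y])
  results[[colnames(data)[y]]] <- res
}

# Keep only the tests that pass the 0.05 threshold
significant <- Filter(function(r) r$p.value <= 0.05, results)
print(names(significant))
```

Collecting the results in a named list instead of printing inside the loop keeps the test objects available for later inspection.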

Answer

I don't see anything wrong with your workaround. Here's another one, by replacing the relevant data in the printed object of `chisq.test`.

```
xy <- data.frame(var1 = sample(50:100, size = 20),
                 var2 = sample(100, size = 20, replace = TRUE))
x <- chisq.test(x = xy[, 1], y = xy[, 2])
# Overwrite the label shown on the "data:" line of the printed result
x$data.name <- "something pretty from 1 and 2"
x
# Pearson's Chi-squared test
# data: something pretty from 1 and 2
# X-squared = 340, df = 323, p-value = 0.2471
```
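If you would rather have the label come out right without patching the result afterwards, one possible alternative is to build the call with `bquote()` so that the column names are spliced in before `chisq.test` captures its arguments. This is only a sketch under assumptions: the toy `data` and the names `xn`, `yn` and `call_obj` are made up for illustration.

```r
# Toy data, for illustration only
data <- data.frame(
  colname1 = c("a", "a", "b", "b", "a", "b", "a", "b"),
  colname4 = c("p", "p", "q", "q", "p", "q", "p", "q")
)

xn <- "colname1"
yn <- "colname4"

# bquote() substitutes the *values* of xn and yn into the call before it is
# evaluated, so chisq.test() deparses data[["colname1"]] instead of data[[xn]].
call_obj <- bquote(chisq.test(x = data[[.(xn)]], y = data[[.(yn)]]))

# suppressWarnings() hides the small-sample warning on this toy data
res <- suppressWarnings(eval(call_obj))
print(res$data.name)  # [1] "data[[\"colname1\"]] and data[[\"colname4\"]]"
```

The label is less pretty than a hand-written one, but it always names the columns that were actually tested.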