Lost Geographer - 1 month ago
R Question

Can't display the value of a y variable in the chisq.test output

The initial quest

I wanted to perform a chi-squared test on all columns of a data.frame and then show only the significant results (p.value <= 0.05). The first column is x and all the other columns are y. Here is the code:

for (y in 2:ncol(data)) {
  chisq_result <- chisq.test(x = data[, 1], y = data[, y])
  if (chisq_result$p.value <= 0.05) {
    print(chisq_result)
  }
}


The issue

Pearson's Chi-squared test

data: data[, 1] and data[, y]
X-squared = 11.166, df = 2, p-value = 0.003761


As you can see, the 2nd line shows data[, y] when it should show data[, 4] (or another column number). In other words, I am not able to display the value of the y variable in the chisq.test output.
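The label looks this way because chisq.test builds data.name from the argument expressions as they were written in the call (it deparses them) rather than from their values. A minimal sketch that reproduces this, assuming a made-up data frame named toy:

```r
# Hypothetical toy data: the names toy, a and b and the values are made up.
toy <- data.frame(
  a = rep(c("u", "v"), each = 20),
  b = rep(c("p", "q"), times = 20)
)

y <- 2
res <- chisq.test(x = toy[, 1], y = toy[, y])

# data.name stores the expressions as typed, so "y" is not replaced by 2.
print(res$data.name)
```

Since data.name is an ordinary character element of the returned object, it can be inspected or overwritten before printing.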

Why would I want this? Because the data.frame has several columns, the loop runs several chi-squared tests, and without a reference in each output it is hard to tell which column a given result belongs to.

The temporary workaround

I tried different things with get(), eval(), parse() or do.call(), but nothing seems to work. For now, I just added cat("X = ", colnames(data)[x], " Y = ", colnames(data)[y], "\n"); inside the if block in order to print the names of the variables:

x <- 1  # index of the first column, used only for labelling

for (y in 2:ncol(data)) {
  chisq_result <- chisq.test(x = data[, 1], y = data[, y])
  if (chisq_result$p.value <= 0.05) {
    cat("X = ", colnames(data)[x], " Y = ", colnames(data)[y], "\n")
    print(chisq_result)
  }
}


... which gives something more usable (see the 1st line), but not satisfying, because I still get the variable name y and not the value 4 (3rd line):

X = colname1 Y = colname4

Pearson's Chi-squared test

data: data[, 1] and data[, y]
X-squared = 11.166, df = 2, p-value = 0.003761


The solution

Thanks to Roman Luštrik, I used sprintf() to overwrite the data.name element of the result directly. Here is the new code:

x <- 1  # index of the reference column

for (y in 2:ncol(data)) {
  chisq_result <- chisq.test(x = data[, x], y = data[, y])
  if (chisq_result$p.value <= 0.05) {
    # Overwrite the label that print() shows on the "data:" line.
    chisq_result$data.name <- sprintf("col %s and col %s", x, y)
    print(chisq_result)
  }
}


Which gives output of this form:

Pearson's Chi-squared test

data: col 5 and col 8
X-squared = 11.166, df = 2, p-value = 0.003761
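The same relabelling can carry the column names as well as the indices. The following is only a sketch: the helper name print_significant, its arguments, and the toy data are hypothetical, and it assumes a plain data.frame (so that df[, y] returns a vector).

```r
# Hypothetical helper: the name and arguments are illustrative, not from the post.
print_significant <- function(df, x = 1, alpha = 0.05) {
  # Test the reference column x against every other column.
  for (y in setdiff(seq_len(ncol(df)), x)) {
    res <- chisq.test(x = df[, x], y = df[, y])
    if (res$p.value <= alpha) {
      # Label the test with column names and indices instead of the call text.
      res$data.name <- sprintf("%s (col %d) and %s (col %d)",
                               colnames(df)[x], x, colnames(df)[y], y)
      print(res)
    }
  }
  invisible(NULL)
}

# Made-up data: h copies g (strong association), k is balanced against g.
toy <- data.frame(
  g = rep(c("a", "b"), each = 20),
  h = rep(c("a", "b"), each = 20),
  k = rep(c("a", "b"), times = 20)
)

print_significant(toy)
```

Using setdiff(seq_len(ncol(df)), x) instead of 2:ncol(df) avoids the backwards sequence 2:1 when the data frame has a single column, and lets x be any column.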

Answer

I don't see anything wrong with your workaround. Here's another one: overwrite the data.name element of the object returned by chisq.test before printing it.

xy <- data.frame(var1 = sample(50:100, size = 20),
                 var2 = sample(100, size = 20, replace = TRUE))

x <- chisq.test(x = xy[, 1], y = xy[, 2])

x$data.name <- "something pretty from 1 and 2"
x

    Pearson's Chi-squared test

data:  something pretty from 1 and 2
X-squared = 340, df = 323, p-value = 0.2471
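The same replacement can be applied to many columns at once by keeping the test objects in a list and filtering afterwards. This is a rough sketch rather than a definitive approach; the toy data frame and all variable names in it are assumptions made for illustration.

```r
# Hypothetical toy data: column names and values are made up.
toy <- data.frame(
  g = rep(c("a", "b"), each = 20),
  h = rep(c("a", "b"), each = 20),   # identical to g, so strongly associated
  k = rep(c("a", "b"), times = 20)   # balanced against g, so no association
)

x_name  <- names(toy)[1]
y_names <- names(toy)[-1]

# Run one test per remaining column and relabel each result on the way.
tests <- lapply(y_names, function(nm) {
  res <- chisq.test(x = toy[[x_name]], y = toy[[nm]])
  res$data.name <- sprintf("%s and %s", x_name, nm)
  res
})
names(tests) <- y_names

# Extract the p-values, then keep only the significant tests.
p_values    <- vapply(tests, function(res) res$p.value, numeric(1))
significant <- tests[p_values <= 0.05]

print(significant)
```

Keeping the results in a named list means the p-values can be sorted or adjusted for multiple testing (for example with p.adjust) before anything is printed.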