Nick Nick - 1 month ago 22
R Question

r - multiple variables in prop.test loop, output data

I am having trouble creating a data frame of prop.test results inside a loop. I am looping over two vectors that specify the variables needed to run multiple prop.test calls. I am able to print the results, but I would like to collect them in a data frame.

Sample data:

set.seed(1234)
tc <- sample(c("test", "control"), 1000, replace = TRUE, prob = c(.8, .2))
target <- sample(LETTERS[1:3], 1000, replace = TRUE, prob = c(1/3, 1/3, 1/3))
pa <- sample(c(0, 1), 1000, replace = TRUE)
sc <- sample(c(0, 1), 1000, replace = TRUE)
ig <- sample(c(0, 1), 1000, replace = TRUE)
test <- data.frame(tc, target, pa, sc, ig)


Run the prop.test loop with variables:

#define loop variables
target_var <- c("A", "B") #targets
metric <- c("pa", "sc", "ig") #columns to loop through

#loop through combinations of targets and metrics and run prop.test

for (i in target_var) {
    for (j in metric) {
        d <- subset(test, target == i)
        X <- d[,"tc"]
        Y <- d[,j]

        print(prop.test(table(X,Y),c(1,0),alternative="two.sided",
                    conf.level=0.95, correct=FALSE))
    }
}


I am unsure how to write the results of every prop.test run to a data frame. Specifically, I need i, j, statistic, parameter, p.value, estimate, conf.int, null.value, alternative, method, and data.name for each test run.

Answer

To augment @alistaire's comment: you can call broom::tidy to turn the prop.test output into a data frame, then wrap the call in two nested do.call(rbind, lapply(...)) constructs:

library(broom)
out <- do.call(rbind, lapply(c("A", "B"), function(i) {
    do.call(rbind, lapply(c("pa", "sc", "ig"), function(j) {
        d <- subset(test, target == i)
        X <- d[,"tc"]
        Y <- d[,j]
        tidy(prop.test(table(X,Y),c(1,0),alternative="two.sided",
                    conf.level=0.95, correct=FALSE))
    }))
}))

The inner lapply creates a list of length 3 (for "pa", "sc", and "ig"), where each element is a data frame returned by tidy(prop.test(...)); we rbind these together. The outer lapply creates a list of length 2 (for "A" and "B"), where each element is the data frame produced by the inner loop; we rbind these together as well.
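The row order that this nesting produces can be checked on a toy version that runs no tests at all. The following is only a sketch: the names rows and grid are hypothetical, and each "result" is a one-row data frame holding nothing but its own labels. It illustrates that the inner variable (metric) changes fastest, which is the same order expand.grid gives when metric is passed as its first argument.

```r
# Toy version of the nested do.call(rbind, lapply(...)) pattern.
# Each "result" is a one-row data frame that only records its labels.
rows <- do.call(rbind, lapply(c("A", "B"), function(i) {
    do.call(rbind, lapply(c("pa", "sc", "ig"), function(j) {
        data.frame(target_var = i, metric = j, stringsAsFactors = FALSE)
    }))
}))

# expand.grid varies its first argument fastest, so metric goes first.
grid <- expand.grid(metric = c("pa", "sc", "ig"),
                    target_var = c("A", "B"),
                    stringsAsFactors = FALSE)

print(rows)
print(grid)
```

Both data frames should list the three metrics for "A" first and then the three metrics for "B".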

We can finish off by adding target_var and metric to the dataframe to identify the rows:

out <- cbind(
    setNames(expand.grid(c("pa", "sc", "ig"), c("A", "B")), c("metric", "target_var")),
    out)

Output:

out
#   metric target_var estimate1 estimate2  statistic    p.value parameter ...
# 1     pa          A 0.5142857 0.5169492 0.00153355 0.96876237         1 ...
# 2     sc          A 0.5142857 0.4872881 0.15742455 0.69153883         1 ...
# 3     ig          A 0.4285714 0.4915254 0.85764039 0.35439986         1 ...
# 4     pa          B 0.5000000 0.4629630 0.31977168 0.57174489         1 ...
# 5     sc          B 0.4324324 0.5592593 3.75231435 0.05273445         1 ...
# 6     ig          B 0.5540541 0.4851852 1.10190190 0.29384909         1 ...
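As an alternative sketch (not part of the answer above), the identifiers can be stored next to each result at the moment it is computed, so the labels never depend on row order. This version uses base R only. The names run_one, grid, and labelled are hypothetical, and the sample data is regenerated here with three target levels, which is an assumption made so the block runs on its own.

```r
# Assumed sample data, regenerated so the block is self-contained.
set.seed(1234)
test <- data.frame(
    tc     = sample(c("test", "control"), 1000, replace = TRUE, prob = c(.8, .2)),
    target = sample(LETTERS[1:3], 1000, replace = TRUE),
    pa     = sample(c(0, 1), 1000, replace = TRUE),
    sc     = sample(c(0, 1), 1000, replace = TRUE),
    ig     = sample(c(0, 1), 1000, replace = TRUE),
    stringsAsFactors = FALSE
)

# Hypothetical helper: run one test and return a labelled one-row data frame.
run_one <- function(i, j) {
    d <- test[test$target == i, ]
    # Rows of the table are the tc groups, columns are the 0/1 values of j.
    # prop.test reads the first column (value 0) as the success count.
    pt <- prop.test(table(d$tc, d[[j]]), alternative = "two.sided",
                    conf.level = 0.95, correct = FALSE)
    data.frame(target_var  = i,
               metric      = j,
               estimate1   = unname(pt$estimate[1]),
               estimate2   = unname(pt$estimate[2]),
               statistic   = unname(pt$statistic),
               parameter   = unname(pt$parameter),
               p.value     = unname(pt$p.value),
               conf.low    = pt$conf.int[1],
               conf.high   = pt$conf.int[2],
               alternative = pt$alternative,
               method      = pt$method,
               stringsAsFactors = FALSE)
}

# One row per (target_var, metric) combination.
grid <- expand.grid(metric = c("pa", "sc", "ig"),
                    target_var = c("A", "B"),
                    stringsAsFactors = FALSE)
labelled <- do.call(rbind, lapply(seq_len(nrow(grid)), function(k) {
    run_one(grid$target_var[k], grid$metric[k])
}))
rownames(labelled) <- NULL
print(labelled)
```

Because each row carries its own target_var and metric, the separate cbind with expand.grid is not needed in this variant.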

If the broom package is unavailable, we can write our own stripped-down version of the tidy method for htest objects (like the ones produced by prop.test()):

tidy.proptest <- function(x) {
    ret <- x[c("estimate", "statistic", "p.value", "parameter")]
    names(ret$estimate) <- paste0("estimate", seq_along(ret$estimate))
    ret <- c(ret$estimate, ret)
    ret$estimate <- NULL    
    ret <- c(ret, conf.low = x$conf.int[1], conf.high = x$conf.int[2],
        method = as.character(x$method),
        alternative = as.character(x$alternative))
    data.frame(ret)
}

Replace tidy with tidy.proptest in the above code snippet. Then a couple more steps to prettify the output:

rownames(out) <- NULL # reset row names to 1..n
out <- cbind(
    setNames(expand.grid(c("pa", "sc", "ig"), c("A", "B")), c("metric", "target_var")),
    out)
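To decide which further fields to pull into the data frame (the question also mentions null.value and data.name), it can help to inspect a single result first. The sketch below uses made-up counts (15 of 50 versus 25 of 50) purely for illustration; pt is a hypothetical name. An htest object is an ordinary list, so its components can be listed with names() and extracted with $.

```r
# Made-up counts, used only to obtain an htest object to inspect.
pt <- prop.test(x = c(15, 25), n = c(50, 50), correct = FALSE)

class(pt)          # the object class
names(pt)          # the components available for extraction
pt$estimate        # the two group proportions
pt$conf.int        # confidence interval for the difference in proportions
pt$data.name       # description of the data the test was run on
```

Any component that is a single value or a short vector can be added to the one-row data frame in the same way as the estimates and the confidence limits.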