Melissa - 26 days ago
R Question

R: different errors from using apply()

I've been trying to debug this for the past 2 days, applying all the possible fixes I found here on Stack Overflow, but I'm still getting various errors and I don't know what I can do anymore.

dat is a data frame with 3051 rows and 38 columns, taken from the golub dataset in the multtest package.
Sample of dat:

> dat[1:5, 1:5]
V1 V2 V3 V4 V5
g1 -1.45769 -1.39420 -1.42779 -1.40715 -1.42668
g2 -0.75161 -1.26278 -0.09052 -0.99596 -1.24245
g3 0.45695 -0.09654 0.90325 -0.07194 0.03232
g4 3.13533 0.21415 2.08754 2.23467 0.93811
g5 2.76569 -1.27045 1.60433 1.53182 1.63728


I have this function defined:

> wilcox.func <- function(x, s1, s2) {
+ x1 <- x[s1]
+ x2 <- x[s2]
+ x1 <- as.numeric(x1)
+ x2 <- as.numeric(x2)
+ w.out <- wilcox.test(x1, x2, exact=F, alternative="two.sided", correct=T)
+ out <- as.numeric(w.out$statistic)
+ return(out) }


and I try to apply it with:

> apply(dat, 1, wilcox.func, s1=c(1:27), s2=c(28:38))


where I want to run the wilcox.test() function with the first 27 columns as x and the remaining columns as y (based off golub.cl). However, I get this error:


Error in wilcox.test(x1, x2, exact = F, alternative = "two.sided", correct = T) :
unused arguments (exact = F, alternative = "two.sided", correct = T)



Removing exact = F, alternative = "two.sided", correct = T gives me a new error: Error in x[s1] : only 0's may be mixed with negative subscripts.

Funnily enough, at some point I also got the error Error in x[s1, ] : incorrect number of dimensions when running the same line of code (with the "unused arguments" still in the wilcox.test call), but that was 2 days ago and I haven't been able to reproduce it since.
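For reference, both subscript errors can be reproduced in isolation. The sketch below uses a made-up three-element vector x (not part of the original data) and only illustrates what each message means; which index values triggered them in the original session is not known.

```r
# Hypothetical named vector standing in for one row of dat
x <- c(a = 1.5, b = 2.5, c = 3.5)

# Mixing a negative and a positive index in one subscript is an error
err1 <- tryCatch(x[c(-1, 2)], error = function(e) conditionMessage(e))

# Two-dimensional indexing on a plain vector is an error as well
err2 <- tryCatch(x[1:2, ], error = function(e) conditionMessage(e))

print(err1)
print(err2)
```

Since apply() passes each row to the function as a plain vector, x[s1] with s1 = 1:27 is valid, whereas x[s1, ] would fail on any row.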

I've also tried lapply() and mapply(), but I get the same unused arguments error.

What I'm trying to achieve: wilcox.test(), if I understand the problem correctly, should be applied to each row, with the x vector made up of columns 1 to 27 and the y vector made up of columns 28 to 38.

I apologize if this is a stupid simple issue I'm missing. I just don't know what it is :(

Edit: this works now (as well as Parfait's code) after restarting R... sorry, that should've probably been something I tried first before posting this...
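A restart fixing the "unused arguments" error is consistent with another object named wilcox.test having been defined in the workspace and masking the one from the stats package, since stats::wilcox.test accepts exact, alternative and correct. A minimal sketch of how one might check for this without restarting (the small vectors in the last call are made-up values for illustration):

```r
# Where does the name wilcox.test resolve? ".GlobalEnv" in this list
# would point to a masking copy in the workspace.
print(find("wilcox.test"))

# The stats version is an S3 generic taking x and ...
print(names(formals(stats::wilcox.test)))

# Remove a masking object from the workspace if one exists (no-op otherwise)
if (exists("wilcox.test", envir = globalenv(), inherits = FALSE)) {
  rm("wilcox.test", envir = globalenv())
}

# Calling through the namespace sidesteps any mask
res <- stats::wilcox.test(c(1.1, 2.3, 3.8), c(2.9, 4.2),
                          exact = FALSE, correct = TRUE)
print(res$statistic)
```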

Answer

Consider sapply() or vapply() (the latter lets you predefine the output type), iterating across row numbers, since you need to slice by column ranges for each row. The code below uses sample data; adjust the column ranges for the full dat:

# READ IN SAMPLE dat
data ='
V0       V1       V2       V3       V4       V5
g1 -1.45769 -1.39420 -1.42779 -1.40715 -1.42668
g2 -0.75161 -1.26278 -0.09052 -0.99596 -1.24245
g3  0.45695 -0.09654  0.90325 -0.07194  0.03232
g4  3.13533  0.21415  2.08754  2.23467  0.93811
g5  2.76569 -1.27045  1.60433  1.53182  1.63728'

dat <- read.table(text=data, header=TRUE, stringsAsFactors=FALSE)

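# ADJUSTED FUNCTION
# s1, s2: the two groups of values for one row (one-row data frame slices or vectors)
wilcox.func <- function(s1, s2) {
  x1 <- as.numeric(s1)
  x2 <- as.numeric(s2)

  # two-sided rank-sum test with normal approximation and continuity correction
  w.out <- wilcox.test(x1, x2, exact=FALSE, alternative="two.sided", correct=TRUE)
  out <- as.numeric(w.out$statistic)   # keep only the W statistic
  return(out)
}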

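# sapply(): column 1 holds the gene label, so the values sit in columns 2 to 6
output <- sapply(seq_len(nrow(dat)), function(i)
    wilcox.func(dat[i, 2:4], dat[i, 5:6]))
output
# [1] 2 4 4 3 3

# vapply(): same call, but each result must be a single numeric value
output <- vapply(seq_len(nrow(dat)), function(i)
    wilcox.func(dat[i, 2:4], dat[i, 5:6]),
    numeric(1))
output
# [1] 2 4 4 3 3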
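
In a clean session, the row-wise apply() call from the question should also run as written. A minimal sketch, assuming a simulated 10 x 38 data frame stands in for the real data (random values, not the golub dataset) and calling stats::wilcox.test explicitly so that a masked copy cannot interfere:

```r
# Simulated stand-in with the same number of columns as the real data
# (random values for illustration only)
set.seed(1)
dat <- as.data.frame(matrix(rnorm(10 * 38), nrow = 10))

# x is one row as a numeric vector; s1 and s2 are the column positions of each group
wilcox.func <- function(x, s1, s2) {
  w.out <- stats::wilcox.test(as.numeric(x[s1]), as.numeric(x[s2]),
                              exact = FALSE, alternative = "two.sided",
                              correct = TRUE)
  unname(w.out$statistic)
}

# apply() converts dat to a matrix and passes each row to wilcox.func
w <- apply(dat, 1, wilcox.func, s1 = 1:27, s2 = 28:38)
print(length(w))  # one W statistic per row
```

Because every column here is numeric, apply() is safe; with a character label column in the data frame (as in the sample above), the matrix conversion would turn all values into strings, which is one reason to index by row number instead.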