Kirk Fogg - 8 months ago
R Question

Get 6x6 table for dataframe containing two variables

I am trying to partition the observations in a data frame into 36 groups, based on two variables. More specifically, I want to cut each of the two variables into six intervals and then assign each observation to one of the 36 possible combinations.

My attempt below works, but is there a faster way to do this that avoids the nested for loops?

Also, this isn't necessary, but how could I visualize the total number of observations in each group in a 6 by 6 grid? I know table() would produce a list of the 36 possible groups and their totals, but not in grid format.

set.seed(123)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
data <- data.frame(x1,x2)

labs1 <- levels(cut(x1, 6))
ints1 <- cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs1)),
               upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs1)))
labs2 <- levels(cut(x2, 6))
ints2 <- cbind(lower = as.numeric(sub("\\((.+),.*", "\\1", labs2)),
               upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs2)))

tmp <- expand.grid(labs1, labs2)
groups <- cbind(lower1 = as.numeric(sub("\\((.+),.*", "\\1", tmp[,1])),
                upper1 = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", tmp[,1])),
                lower2 = as.numeric(sub("\\((.+),.*", "\\1", tmp[,2])),
                upper2 = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", tmp[,2])))

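data$group <- NA_integer_  # initialize so unmatched rows stay NA
for (i in seq_len(nrow(data))) {
  for (j in seq_len(nrow(groups))) {
    # scalar comparisons, so use && rather than &
    if (x1[i] >= groups[j, 1] && x1[i] <= groups[j, 2] &&
        x2[i] >= groups[j, 3] && x2[i] <= groups[j, 4]) {
      data$group[i] <- j
    }
  }
}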

Answer

You can combine apply(), which iterates over the rows of your data.frame, with which(), which checks each row against every row of your groups matrix at once:

# For each row, find the index of the group whose bounds contain (x1, x2)
data$group <- apply(data[, c("x1", "x2")], 1, FUN = function(dataRow)
  which(
    dataRow[1] >= groups[, 1] &
    dataRow[1] <= groups[, 2] &
    dataRow[2] >= groups[, 3] &
    dataRow[2] <= groups[, 4]))
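
As an alternative, here is a minimal sketch that skips the parsed interval labels entirely and relies on cut() being vectorized. It is not the method above, just one possible approach. The names bin1, bin2 and grid are made up for illustration. It assumes the same numbering as expand.grid(labs1, labs2), where the x1 interval varies fastest. Note that cut() uses the exact breaks and right-closed intervals, whereas the loop compares against the rounded labels, so observations very close to a boundary may land in a different group. Passing two factors to table() gives the counts as a 6 by 6 grid:

```r
set.seed(123)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
data <- data.frame(x1, x2)

# Integer bin codes (1..6) for each variable; cut() works on the whole vector.
bin1 <- cut(data$x1, 6, labels = FALSE)
bin2 <- cut(data$x2, 6, labels = FALSE)

# Same numbering as expand.grid(labs1, labs2): the x1 bin varies fastest.
data$group <- (bin2 - 1L) * 6L + bin1

# 6 x 6 grid of counts: rows are x1 intervals, columns are x2 intervals.
grid <- table(x1 = cut(data$x1, 6), x2 = cut(data$x2, 6))
print(grid)
```

Reading the grid column by column gives the counts for groups 1 to 36 in order, so it lines up with table(data$group).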