Amstell - 3 months ago
R Question

Computing ranges within bins

I'm trying to count the elements that fall into each bin of a random vector, but my code only returns 100 as the first element of the result. How would I cycle through each element of the vector x, check whether it is in the range of bin j, and return the count for each bin?

I realize there are functions to do this in R, but I'm working on hard coding this specific example.

# Sample data
set.seed(1234)
x <- rnorm(100)


S <- range(x)
a <- range(x)[1]
b <- range(x)[2]
J <- 5 #bins
h <- (b - a)/J #interval

for (j in 1:J){
for (n in 1:length(x)){
ifelse(x[n] > a + (j-1)*h & (x[n] <= a + j*h), n[j] <- n[j] + 1, n[j] <- n[j] + 0)
}
}


Output:

> n
[1] 100 NA NA NA NA


Desired Output:

> n
[1] 7 43 29 13 8

Answer

Why not use cut and table?

set.seed(1234)
x <- rnorm(100)
bin <- cut(x, breaks = 5)    ## evenly cut `range(x)` into 5 bins
levels(bin)
# [1] "(-2.35,-1.37]"  "(-1.37,-0.388]" "(-0.388,0.591]" "(0.591,1.57]"  
# [5] "(1.57,2.55]" 

table(bin)
# (-2.35,-1.37] (-1.37,-0.388] (-0.388,0.591]   (0.591,1.57]    (1.57,2.55] 
#             7             43             29             13              8
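
To connect cut with the a, b and h from the question, here is a minimal sketch that builds the break points by hand instead of passing breaks = 5. It assumes you want include.lowest = TRUE so that min(x) lands in the first bin; the names breaks and counts_cut are only illustrative.

```r
set.seed(1234)
x <- rnorm(100)

J <- 5               # number of bins
a <- min(x)
b <- max(x)
h <- (b - a) / J     # bin width

# J + 1 bin edges; the last edge is set to b explicitly because
# a + J * h need not equal b exactly in floating-point arithmetic.
breaks <- a + (0:J) * h
breaks[J + 1] <- b

# Bins are (lo, hi]; include.lowest = TRUE also closes the first bin on the left.
bin <- cut(x, breaks = breaks, include.lowest = TRUE)
counts_cut <- as.vector(table(bin))

counts_cut
sum(counts_cut)      # equals length(x): every observation falls in exactly one bin
```

Unlike cut(x, breaks = 5), which widens the outer edges slightly, this keeps the outer edges at exactly min(x) and max(x), so include.lowest = TRUE is what stops the minimum from being dropped.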

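Still, I need to show why your loop fails. Note that you don't need ifelse; an ordinary if (...) ... is sufficient. The error is that you use n as the loop index and also use it to record the counts! The for loop resets n to the current index on every pass, so whatever you stored in n[j] is overwritten. On the very last pass n is 100, and assigning to n[5] pads the vector with NA, which is exactly the 100 NA NA NA NA you see. The following corrects this by recording the counts in a separate vector, counts, so that n is only the index: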

counts <- integer(J)  ## initialization
for (j in 1:J) {
  for (n in 1:length(x)) {
    if (x[n] > a + (j-1)*h && x[n] <= a + j*h) counts[j] <- counts[j] + 1L
  }
}

counts
# [1]  6 43 29 13  7

Perhaps you have noticed that the first value of counts is 6, not 7. This is because your loop condition x[n] > a + (j-1)*h && x[n] <= a + j*h excludes the lowest value from the first bin: for j = 1 the lower bound is a itself, and min(x) > a is FALSE. Since this is always the case, you need to manually add 1 to counts[1].
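
If you would rather not patch counts[1] afterwards, one possible alternative (not the double loop above, but a single pass over x) is to compute each element's bin index directly and clamp it to 1..J. This is only a sketch: the index variable i and the vectorised check counts_vec are illustrative names, not part of the original code.

```r
set.seed(1234)
x <- rnorm(100)

a <- min(x)
b <- max(x)
J <- 5               # number of bins
h <- (b - a) / J     # bin width

counts <- integer(J)
for (i in seq_along(x)) {
  # Offset of x[i] above a, measured in bin widths; ceiling() gives (lo, hi] bins.
  j <- ceiling((x[i] - a) / h)
  # Clamp to 1..J: min(x) would otherwise get index 0, and rounding
  # could push max(x) to J + 1.
  j <- min(max(j, 1), J)
  counts[j] <- counts[j] + 1L
}

counts

# The same computation without an explicit loop.
counts_vec <- tabulate(pmin(pmax(ceiling((x - a) / h), 1), J), nbins = J)
identical(counts, counts_vec)
```

Because every element is assigned to exactly one bin, the counts always add up to length(x), whichever bin the extreme values end up in.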