Amstell - 1 year ago 58
R Question

# Computing ranges within bins

I'm trying to count how many values of a random vector fall into each bin, but my code returns `100` as the first element and `NA` for the rest. How do I loop over each element of the vector `x`, check whether it falls within the range of bin `j`, and return the count for each bin?

I realize there are functions to do this in `R`, but I'm working on hard-coding this specific example.

``````
# Sample data
set.seed(1234)
x <- rnorm(100)

S <- range(x)
a <- range(x)[1]
b <- range(x)[2]
J <- 5    #bins
h <- (b - a)/J   #interval

for (j in 1:J){
  for (n in 1:length(x)){
    ifelse(x[n] > a + (j-1)*h & (x[n] <= a + j*h), n[j] <- n[j] + 1, n[j] <- n[j] + 0)
  }
}
``````

Output:

``````
> n
[1] 100  NA  NA  NA  NA
``````

Desired Output:

``````
> n
[1]  7 43 29 13  8
``````

Why not use `cut` and `table`?

``````
set.seed(1234)
x <- rnorm(100)
bin <- cut(x, breaks = 5)    ## evenly cut `range(x)` into 5 bins
levels(bin)
# [1] "(-2.35,-1.37]"  "(-1.37,-0.388]" "(-0.388,0.591]" "(0.591,1.57]"
# [5] "(1.57,2.55]"

table(bin)
# (-2.35,-1.37] (-1.37,-0.388] (-0.388,0.591]   (0.591,1.57]    (1.57,2.55]
#             7             43             29             13              8
``````
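If you want `cut` to use exactly the breakpoints from your loop, `a + j*h`, rather than the slightly widened range it picks on its own, you can pass the breaks explicitly. A minimal sketch, using a small made-up vector (not your data) so the counts are easy to check by hand:

```r
# Toy data, chosen for illustration only
x <- c(0, 1, 2, 3, 4, 5, 10)
J <- 5

# J + 1 equally spaced break points from min(x) to max(x)
brks <- seq(min(x), max(x), length.out = J + 1)

# include.lowest = TRUE closes the first interval on the left,
# so min(x) is counted in the first bin
bin <- cut(x, breaks = brks, include.lowest = TRUE)
counts <- as.vector(table(bin))
counts
# [1] 3 2 1 0 1
```

Here the bins are `[0,2]`, `(2,4]`, `(4,6]`, `(6,8]` and `(8,10]`; the empty fourth bin still shows up as `0` because `table` keeps all factor levels.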

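Still, it is worth showing why your loop fails. You don't need `ifelse` here; an ordinary `if (...) ...` is sufficient. The real error is that you use `n` as the loop index and also as the vector of counts. The `for` loop overwrites `n` with the next index on every iteration, so any count stored in it is lost, and `n[j]` for `j > 1` reads past the end of a length-one vector and gives `NA`. The following corrects this by using a separate vector, `counts`, to keep the counts apart from `n`: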

``````
counts <- integer(J)  ## initialize one counter per bin
for (j in 1:J) {
  for (n in 1:length(x)) {
    ## count x[n] if it lies in the half-open interval (a + (j-1)*h, a + j*h]
    if (x[n] > a + (j-1)*h && x[n] <= a + j*h) counts[j] <- counts[j] + 1L
  }
}

counts
# [1]  6 43 29 13  7
``````
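To see how the original loop ends up with `100 NA NA NA NA`, here is a stripped-down sketch of the same mistake. It uses 3 iterations instead of 100 and always updates position 2; those numbers are picked only for illustration:

```r
# `n` is meant to hold counts, but it is also the loop variable
n <- integer(2)
for (n in 1:3) {
  # at this point `n` is a single number (the current index),
  # so n[2] does not exist yet and reads as NA
  n[2] <- n[2] + 1
}
n
# [1]  3 NA
```

The first element is just the last index the loop visited, and the second is `NA + 1`, which is still `NA`. Your loop does the same thing with index 100 and positions 1 to 5.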

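In the output of the corrected loop, the first count is `6`, not `7`. This is because the condition `x[n] > a + (j-1)*h && x[n] <= a + j*h` uses a strict `>` on the lower bound, so the minimum of `x`, which equals `a`, never falls into the first bin. Since this is always the case, you need to manually add `1` to `counts[1]`. Similarly, because `a + J*h` is computed in floating point, it need not equal `b` exactly, so it is worth checking that the maximum was counted in the last bin.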
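Instead of patching `counts[1]` afterwards, you could also close the first bin on the left inside the loop. Below is a rough sketch of that idea; the function name `bin_counts` and the index name `i` are my own choices, not anything from base R, and it assumes `x` has no `NA` values. It also uses `b` itself as the upper edge of the last bin, so the maximum is counted regardless of rounding in `a + J*h`.

```r
# Hypothetical helper: count the values of x in J equal-width bins
bin_counts <- function(x, J) {
  a <- min(x)
  b <- max(x)
  h <- (b - a) / J          # bin width
  counts <- integer(J)      # one counter per bin
  for (j in seq_len(J)) {
    lower <- a + (j - 1) * h
    upper <- if (j == J) b else a + j * h   # last bin ends exactly at max(x)
    for (i in seq_along(x)) {
      # first bin is [lower, upper]; all later bins are (lower, upper]
      above_lower <- if (j == 1) x[i] >= lower else x[i] > lower
      if (above_lower && x[i] <= upper) counts[j] <- counts[j] + 1L
    }
  }
  counts
}

# Small hand-checkable example (made-up data)
bin_counts(c(0, 1, 2, 3, 4, 5, 10), 5)
# [1] 3 2 1 0 1
```

Because every value is at least `a` and at most `b`, and neighbouring bins share the same computed edge, each element lands in exactly one bin, so `sum(bin_counts(x, J))` should equal `length(x)`.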

