Pietro Pietro - 3 years ago 140
R Question

Get column mean every block on n rows based on condition

I have this dataframe

r2 distance
1 33.64 67866
2 8.50 77229
3 15.07 109119
4 24.35 142279
5 7.74 143393
6 8.21 177670
7 12.26 216440
8 12.66 253751
9 26.31 282556
10 39.08 320816


I need to calculate the mean of column
r2
for every block of rows where the distance between two values in the column
distance
is equal or less than
100000
.
For this example the desired output would be:

mean_r2 diff_of_distance
1 17.86 75527 ## mean of rows 1 to 5; distance 5 - distance 1
2 13.91 66164 ## mean of rows 2 to 5; distance 5 - distance 2
3 13.84 68551 ## mean of rows 3 to 6; distance 6 - distance 3
4 13.14 74161 ## mean of rows 4 to 7; distance 7 - distance 4
5 9.40 73047 ## mean of rows 5 to 7; distance 7 - distance 5
6 11.04 76081 ## mean of rows 6 to 8; distance 8 - distance 6


and so on.

Edit 1: I have more than 100,000 rows.

Thanks.

Answer Source

Loop through each value of distance, minus this from the values in the distance vector and test if the result is less than 100000. This creates a boolean vector which you sum to identify the index at which the distance is greater than 100000 (i.e. bool becomes FALSE). Use this index to identify your block then take the mean of r2 in each block.

To speed up the code define your vector type and length (to avoid "growing vectors" on each iteration.

means <- vector("numeric", length = nrow(df))
rows <- vector("numeric", length = nrow(df))
distance_diff <- vector("numeric", length = nrow(df))

for (i in seq_along(df$distance)) {

  dis_val <- df$distance[i] # the ith distance value
  bools <- (df$distance - dis_val) < 100000 # bool indicating if difference between i and every value in vector is less than 100000
  block_range <- sum(bools)# taking sum of bools identifies the value at which the distance becomes > 100000
  rows[i] <- paste(as.character(i), "-", as.character(block_range)) 
  means[i] <- mean(df$r2[i:block_range]) # take the mean of r2 in the range i to all rows where distance is < 100000
  distance_diff[i] <- df$distance[block_range] - dis_val # minus the distance from the value before distance is > 100000 from i

}

data.frame(mean_r2 = means, rows= rows, diff_of_distance=distance_diff)

    data.frame(mean_r2 = means, rows= rows, diff_of_distance=distance_diff)
     mean_r2    rows diff_of_distance
1  17.860000   1 - 5            75527
2  13.915000   2 - 5            66164
3  13.842500   3 - 6            68551
4  13.140000   4 - 7            74161
5   9.403333   5 - 7            73047
6  11.043333   6 - 8            76081
7  17.076667   7 - 9            66116
8  26.016667  8 - 10            67065
9  32.695000  9 - 10            38260
10 39.080000 10 - 10                0
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download