Pietro - 3 years ago 140
R Question

# Get column mean every block on n rows based on condition

I have this dataframe

``````       r2 distance
1   33.64    67866
2    8.50    77229
3   15.07   109119
4   24.35   142279
5    7.74   143393
6    8.21   177670
7   12.26   216440
8   12.66   253751
9   26.31   282556
10  39.08   320816
``````

I need to calculate the mean of column
`r2`
for every block of rows where the distance between two values in the column
`distance`
is equal or less than
`100000`
.
For this example the desired output would be:

``````  mean_r2 diff_of_distance
1   17.86            75527 ## mean of rows 1 to 5; distance 5 - distance 1
2   13.91            66164 ## mean of rows 2 to 5; distance 5 - distance 2
3   13.84            68551 ## mean of rows 3 to 6; distance 6 - distance 3
4   13.14            74161 ## mean of rows 4 to 7; distance 7 - distance 4
5    9.40            73047 ## mean of rows 5 to 7; distance 7 - distance 5
6   11.04            76081 ## mean of rows 6 to 8; distance 8 - distance 6
``````

and so on.

Edit 1: I have more than 100,000 rows.

Thanks.

Answer Source

Loop through each value of `distance`, minus this from the values in the `distance` vector and test if the result is less than 100000. This creates a boolean vector which you sum to identify the index at which the distance is greater than 100000 (i.e. bool becomes FALSE). Use this index to identify your block then take the mean of `r2` in each block.

To speed up the code define your vector type and length (to avoid "growing vectors" on each iteration.

``````means <- vector("numeric", length = nrow(df))
rows <- vector("numeric", length = nrow(df))
distance_diff <- vector("numeric", length = nrow(df))

for (i in seq_along(df\$distance)) {

dis_val <- df\$distance[i] # the ith distance value
bools <- (df\$distance - dis_val) < 100000 # bool indicating if difference between i and every value in vector is less than 100000
block_range <- sum(bools)# taking sum of bools identifies the value at which the distance becomes > 100000
rows[i] <- paste(as.character(i), "-", as.character(block_range))
means[i] <- mean(df\$r2[i:block_range]) # take the mean of r2 in the range i to all rows where distance is < 100000
distance_diff[i] <- df\$distance[block_range] - dis_val # minus the distance from the value before distance is > 100000 from i

}

data.frame(mean_r2 = means, rows= rows, diff_of_distance=distance_diff)

data.frame(mean_r2 = means, rows= rows, diff_of_distance=distance_diff)
mean_r2    rows diff_of_distance
1  17.860000   1 - 5            75527
2  13.915000   2 - 5            66164
3  13.842500   3 - 6            68551
4  13.140000   4 - 7            74161
5   9.403333   5 - 7            73047
6  11.043333   6 - 8            76081
7  17.076667   7 - 9            66116
8  26.016667  8 - 10            67065
9  32.695000  9 - 10            38260
10 39.080000 10 - 10                0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download