M. Rasyid Ridha - 1 year ago 77
R Question

# Calculate and summarize total distance in a table using dplyr in R

I have a table that consists of a user, a sequence number, and a geolocation given as x and y.

I would like to group it by user and calculate the total distance travelled, following the order given by the sequence.

For example:

``````
> df <- data.frame(user_id=rep(1,3), seq=1:3, x=c(1,5,3), y=c(2,3,9))
> df
  user_id seq x y
1       1   1 1 2
2       1   2 5 3
3       1   3 3 9
``````

Here is the function to calculate the (Euclidean) distance between two points:

``````
> d <- function(n1,n2){
+   d <- sqrt((df$y[n2]-df$y[n1])^2+(df$x[n2]-df$x[n1])^2)
+   return(d)
+ }
``````

I would like to get the total distance like this:

``````
> expected <- data.frame(user_id=1, dtot=d(1,2)+d(2,3))
> expected
  user_id  dtot
1       1 10.45
``````

How can I use dplyr's `group_by` to get the total distance, based on the sequence, for every user?

One way to accomplish what you want is to define a function for computing the total distance:

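``````
library(dplyr)

# Total path length through the points (x[i], y[i]), taken in row order
total.dist <- function(x, y) {
  # dplyr::lag() shifts a vector by one, so x - lag(x) is the step in x
  sum(sqrt((x - dplyr::lag(x))^2 + (y - dplyr::lag(y))^2), na.rm = TRUE)
}
``````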

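The inputs to this function are the column vectors `x` and `y`. We compute the distance between consecutive rows in vectorized fashion by subtracting the `lag` of each column from the column itself. The first row has no predecessor, so its distance is `NA`; the total distance is then the `sum` of all the distances with `NA`s removed. Note that this relies on the rows already being ordered by `seq` within each user.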
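To make the vectorized step concrete, here is a small sketch that applies it by hand to the three points from the question. The variable names `step` and `total` are introduced only for illustration.

```r
library(dplyr)

x <- c(1, 5, 3)
y <- c(2, 3, 9)

# lag() shifts a vector down by one position and pads the front with NA
x_prev <- dplyr::lag(x)
y_prev <- dplyr::lag(y)

# Distance from each point to the previous one; the first entry is NA
# because the first point has no predecessor
step <- sqrt((x - x_prev)^2 + (y - y_prev)^2)

# Dropping the NA and summing gives the length of the whole path
total <- sum(step, na.rm = TRUE)
```

Here `step` holds `NA`, `sqrt(17)` and `sqrt(40)`, so `total` is about 10.44766, which agrees with `d(1,2)+d(2,3)` in the question.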

Then `group_by` `user_id` and use this function inside `summarise`:

``````
res <- df %>% group_by(user_id) %>% summarise(dtot=total.dist(x,y))
res
## # A tibble: 1 x 2
##   user_id     dtot
##     <dbl>    <dbl>
## 1       1 10.44766
``````
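The example data has a single user whose rows are already in sequence order. As a hedged sketch of the general case, the block below uses a made-up data frame `df2` (two users, rows deliberately shuffled) and sorts by `seq` with `arrange` before grouping. The sorting step is an addition for illustration, not part of the answer above; it is only needed if the rows are not already ordered.

```r
library(dplyr)

# Same helper as above, repeated so this block runs on its own
total.dist <- function(x, y) {
  sum(sqrt((x - dplyr::lag(x))^2 + (y - dplyr::lag(y))^2), na.rm = TRUE)
}

# Hypothetical data: two users, rows not in sequence order
df2 <- data.frame(
  user_id = c(1, 2, 1, 2, 1),
  seq     = c(3, 2, 1, 1, 2),
  x       = c(3, 3, 1, 0, 5),
  y       = c(9, 4, 2, 0, 3)
)

res2 <- df2 %>%
  arrange(user_id, seq) %>%   # put each user's points in visiting order
  group_by(user_id) %>%       # lag() is then applied within each user
  summarise(dtot = total.dist(x, y))
```

Because `summarise` evaluates `total.dist` once per group, `lag` never carries a point over from one user to the next. In this sketch user 1 retraces the path from the question (about 10.44766) and user 2 moves from (0, 0) to (3, 4), a distance of 5.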