Nathan Church - 1 year ago 66

# In R, evaluate function between two data frames

I am trying to evaluate a function on pairs of values from two data frames and create a new data frame with the results. I am new to R and trying to avoid old coding habits: in other words, I want to avoid writing a loop, but I can't figure out how to use plyr or similar packages in this case.

In the sample below I have created airports, pilots, and a function that returns the distance in kilometers. My problem is determining which major airport each pilot is closest to, and each pilot's distance from every airport.

``````
# Build airports
code <- c("IAH", "DFW", "Denver", "STL")
lat <- c(29.97, 32.90, 39.75, 38.75)
long <- c(95.35, 97.03, 104.87, 90.37)
airports <- data.frame(code, lat, long)

# Build pilots
names <- c("James", "Fiona", "Seamus")
lat <- c(32.335131, 44.913223, 28.849631)
long <- c(-84.989067, -97.151334, -96.917240)
pilots <- data.frame(names, lat, long)

# Create distance function (haversine formula)
distInKm <- function(lat1, long1, lat2, long2) {
    # 0.01745329 = pi / 180, which converts degrees to radians
    dlat = (lat2 * 0.01745329) - (lat1 * 0.01745329)
    dlong = (long2 * 0.01745329) - (long1 * 0.01745329)
    # The haversine term uses cos(lat1) * cos(lat2), not cos(long2)
    step1 = (sin(dlat / 2)) ^ 2 + cos(lat1 * 0.01745329) * cos(lat2 * 0.01745329) * (sin(dlong / 2)) ^ 2
    step2 = 2 * atan2(sqrt(step1), sqrt(1 - step1))
    dist = 6372.798 * step2    # 6372.798 km is the radius of the earth (40041.47 / (2 * pi))
    dist
}
``````

Firstly, your airport longitudes are positive when they should be negative (all four airports are west of the prime meridian), which will throw off the results. Negate them so the results make sense:

``````
airports$long <- -airports$long
``````
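To see how much the sign matters, here is a rough check in base R. It is only a sketch: `haversine_km` is a hypothetical helper written along the lines of `distInKm` from the question (using `cos(lat2)`, as the standard haversine formula requires), and the coordinates are Seamus and IAH from the sample data.

```r
# Hypothetical helper: haversine great-circle distance in kilometres.
haversine_km <- function(lat1, long1, lat2, long2, radius_km = 6372.798) {
  to_rad <- pi / 180                      # degrees -> radians
  dlat   <- (lat2 - lat1) * to_rad
  dlong  <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius_km * atan2(sqrt(a), sqrt(1 - a))
}

# Seamus to IAH, with the airport longitude as originally entered (positive)
wrong_sign <- haversine_km(28.849631, -96.917240, 29.97, 95.35)

# Same pair with the airport longitude negated (west of the prime meridian)
right_sign <- haversine_km(28.849631, -96.917240, 29.97, -95.35)

c(wrong_sign = wrong_sign, right_sign = right_sign)
```

With the positive longitude the airport lands on the other side of the globe, so the distance comes out well over 10,000 km; with the sign fixed it is about 200 km.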

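Now you can use `apply` to compute, for each pilot, the distance to every airport. The `geosphere` package has several functions that calculate straight-line distance, including `distGeo` and `distHaversine`. These functions expect points as (longitude, latitude), which is why the columns are selected with `3:2`.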

``````
library(geosphere)

pilots$closest_airport <- apply(pilots[, 3:2], 1, function(x){
    airports[which.min(distGeo(x, airports[, 3:2])), 'code']
})

pilots$airport_distance <- apply(pilots[, 3:2], 1, function(x){
    min(distGeo(x, airports[, 3:2])) / 1000    # /1000 to convert m to km
})

pilots
##    names      lat      long closest_airport airport_distance
## 1  James 32.33513 -84.98907             STL         862.5394
## 2  Fiona 44.91322 -97.15133          Denver         855.8088
## 3 Seamus 28.84963 -96.91724             IAH         196.3559
``````
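As an alternative sketch, `geosphere` also provides `distm`, which returns the whole pilot-by-airport distance matrix in one call, so the distances are computed only once. The data below is rebuilt from the question (with the longitudes already negated) so the snippet runs on its own; `d_km` and `nearest` are names chosen here for illustration.

```r
library(geosphere)

airports <- data.frame(code = c("IAH", "DFW", "Denver", "STL"),
                       lat  = c(29.97, 32.90, 39.75, 38.75),
                       long = -c(95.35, 97.03, 104.87, 90.37),
                       stringsAsFactors = FALSE)
pilots <- data.frame(names = c("James", "Fiona", "Seamus"),
                     lat   = c(32.335131, 44.913223, 28.849631),
                     long  = c(-84.989067, -97.151334, -96.917240),
                     stringsAsFactors = FALSE)

# Rows are pilots, columns are airports; distGeo returns metres, so convert to km
d_km <- distm(as.matrix(pilots[, c("long", "lat")]),
              as.matrix(airports[, c("long", "lat")]),
              fun = distGeo) / 1000
dimnames(d_km) <- list(pilots$names, airports$code)

# Column index of the nearest airport for each pilot
nearest <- apply(d_km, 1, which.min)

pilots$closest_airport  <- airports$code[nearest]
pilots$airport_distance <- d_km[cbind(seq_along(nearest), nearest)]
pilots
```

Keeping the matrix around also covers the "all distances" case, since `d_km` already holds one column per airport.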

Or, if you want all the distances rather than just the minimum, `cbind` the matrix returned by `apply` (transposed so that there is one row per pilot):

``````
pilots <- cbind(pilots, t(apply(pilots[, 3:2], 1, function(x){
    setNames(distGeo(x, airports[, 3:2]) / 1000, airports$code)
})))

pilots
##    names      lat      long closest_airport       IAH       DFW    Denver       STL
## 1  James 32.33513 -84.98907             STL 1021.6523 1131.2129 1965.6586  862.5394
## 2  Fiona 44.91322 -97.15133          Denver 1666.0359 1333.6842  855.8088  885.8480
## 3 Seamus 28.84963 -96.91724             IAH  196.3559  449.1838 1412.0664 1253.4874
``````
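If you would rather stay in base R and keep a hand-written distance function like the one in the question, `outer` can evaluate it over every pilot/airport pair without an explicit loop. This is a sketch under a few assumptions: `haversine_km` is a hypothetical, vectorised haversine helper (spherical earth, so its values differ slightly from `distGeo`), and the data is rebuilt from the question with negated airport longitudes.

```r
airports <- data.frame(code = c("IAH", "DFW", "Denver", "STL"),
                       lat  = c(29.97, 32.90, 39.75, 38.75),
                       long = -c(95.35, 97.03, 104.87, 90.37),
                       stringsAsFactors = FALSE)
pilots <- data.frame(names = c("James", "Fiona", "Seamus"),
                     lat   = c(32.335131, 44.913223, 28.849631),
                     long  = c(-84.989067, -97.151334, -96.917240),
                     stringsAsFactors = FALSE)

# Hypothetical helper: vectorised haversine distance in kilometres
haversine_km <- function(lat1, long1, lat2, long2, radius_km = 6372.798) {
  to_rad <- pi / 180
  dlat   <- (lat2 - lat1) * to_rad
  dlong  <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius_km * atan2(sqrt(a), sqrt(1 - a))
}

# outer() expands the row indices into every (pilot, airport) combination and
# calls the function once on the expanded vectors
d_km <- outer(seq_len(nrow(pilots)), seq_len(nrow(airports)),
              function(i, j) haversine_km(pilots$lat[i], pilots$long[i],
                                          airports$lat[j], airports$long[j]))
dimnames(d_km) <- list(pilots$names, airports$code)

# Nearest airport per pilot, taken row by row from the matrix
closest <- airports$code[apply(d_km, 1, which.min)]

cbind(pilots, closest_airport = closest, round(d_km, 1))
```

This works because `haversine_km` is vectorised: `outer` requires a function that accepts two equal-length vectors and returns a vector of the same length.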

Translated into `dplyr`, the successor to `plyr`:

``````
library(dplyr)

pilots %>% rowwise() %>%
    mutate(closest_airport = airports[which.min(distGeo(c(long, lat), airports[, 3:2])), 'code'],
           airport_distance = min(distGeo(c(long, lat), airports[, 3:2])) / 1000)

## Source: local data frame [3 x 5]
## Groups: <by row>
##
## # A tibble: 3 × 5
##    names      lat      long closest_airport airport_distance
##   <fctr>    <dbl>     <dbl>          <fctr>            <dbl>
## 1  James 32.33513 -84.98907             STL         862.5394
## 2  Fiona 44.91322 -97.15133          Denver         855.8088
## 3 Seamus 28.84963 -96.91724             IAH         196.3559
``````
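Another way to express this in `dplyr`, sketched here as an alternative to `rowwise()`, is to build every pilot/airport pair with a cross join and keep the closest airport per pilot. This assumes dplyr 1.1.0 or later (for `cross_join`) and relies on `distGeo` being vectorised over paired rows of coordinates. The data is rebuilt from the question with negated airport longitudes, and the suffixes and `distance_km` are names chosen here for illustration.

```r
library(dplyr)
library(geosphere)

airports <- data.frame(code = c("IAH", "DFW", "Denver", "STL"),
                       lat  = c(29.97, 32.90, 39.75, 38.75),
                       long = -c(95.35, 97.03, 104.87, 90.37),
                       stringsAsFactors = FALSE)
pilots <- data.frame(names = c("James", "Fiona", "Seamus"),
                     lat   = c(32.335131, 44.913223, 28.849631),
                     long  = c(-84.989067, -97.151334, -96.917240),
                     stringsAsFactors = FALSE)

nearest <- pilots %>%
  # One row per (pilot, airport) pair; clashing lat/long columns get suffixes
  cross_join(airports, suffix = c("_pilot", "_airport")) %>%
  # distGeo takes (longitude, latitude) and returns metres
  mutate(distance_km = distGeo(cbind(long_pilot, lat_pilot),
                               cbind(long_airport, lat_airport)) / 1000) %>%
  group_by(names) %>%
  slice_min(distance_km, n = 1) %>%   # keep the closest airport per pilot
  ungroup()

nearest
```

Dropping the `slice_min()` step leaves the long table of all pairwise distances, which can then be reshaped to one column per airport if needed.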

or for all the distances, use `bind_cols` with the approach above, or `unnest` a list column and reshape:

``````
library(tidyverse)

pilots %>% rowwise() %>%
    mutate(closest_airport = airports[which.min(distGeo(c(long, lat), airports[, 3:2])), 'code'],
           data = list(data_frame(airport = airports$code,
                                  distance = distGeo(c(long, lat), airports[, 3:2]) / 1000))) %>%
    unnest() %>%
    spread(airport, distance)    # reshape from long to wide: one column per airport

## # A tibble: 3 × 8
##    names      lat      long closest_airport    Denver       DFW       IAH       STL
## * <fctr>    <dbl>     <dbl>          <fctr>     <dbl>     <dbl>     <dbl>     <dbl>
## 1  Fiona 44.91322 -97.15133          Denver  855.8088 1333.6842 1666.0359  885.8480
## 2  James 32.33513 -84.98907             STL 1965.6586 1131.2129 1021.6523  862.5394
## 3 Seamus 28.84963 -96.91724             IAH 1412.0664  449.1838  196.3559 1253.4874
``````

or more directly but less legibly,

``````
pilots %>% rowwise() %>%
    mutate(closest_airport = airports[which.min(distGeo(c(long, lat), airports[, 3:2])), 'code'],
           data = (distGeo(c(long, lat), airports[, 3:2]) / 1000) %>%
               setNames(airports$code) %>% t() %>% as_data_frame() %>% list()) %>%
    unnest()

## # A tibble: 3 × 8
##    names      lat      long closest_airport       IAH       DFW    Denver       STL
##   <fctr>    <dbl>     <dbl>          <fctr>     <dbl>     <dbl>     <dbl>     <dbl>
## 1  James 32.33513 -84.98907             STL 1021.6523 1131.2129 1965.6586  862.5394
## 2  Fiona 44.91322 -97.15133          Denver 1666.0359 1333.6842  855.8088  885.8480
## 3 Seamus 28.84963 -96.91724             IAH  196.3559  449.1838 1412.0664 1253.4874
``````