Allan Davids Allan Davids - 2 months ago 17
R Question

Checking whether coordinates fall within a given radius

I have a list of co-ordinates of certain bus stops in this format

Bus_Stop_ID lat long
A -34.04199 18.61747
B -33.92312 18.44649


I then have a list of certain shops

Shop_ID lat long
1 -34.039350 18.617964
2 -33.927820 18.410520


I would like to check whether the shops fall within a 500 metre radius from the bus stop. Ultimately, the final dataset would look something like this where the Bus_Stop column indicates T/F and Bus_Stop_ID shows the relevant BUS ID(s) for that shop if Bus_Stop == T -

Shop_ID lat long Bus_Stop Bus_ID
1 -34.039350 18.617964 TRUE A
2 -33.927820 18.410520 FALSE #NA


Does anyone have an idea about how I can go about this using R? I've seen the package
geosphere
but have struggled to understand it given my relative inexperience in the spatial domain. Any ideas or packages you could recommend? Thank you

Answer

You can do this using the package geosphere. Here, I'm assuming that your first data frame is named bus, and your second data frame is named shops:

library(geosphere)
g <- expand.grid(1:nrow(shops), 1:nrow(bus))
d <- matrix(distGeo(shops[g[,1],c("long","lat")], bus[g[,2],c("long","lat")]),
            nrow=nrow(shops))
shops$Bus_Stop <- apply(d, 1, function(x) any(x <= 500))
shops$Bus_ID <- bus[apply(d, 1, function(x) {
                                  c <-which(x <= 500)
                                  if(length(c)==0) NA else c[1]
                                }), "Bus_Stop_ID"]
print(shops)
##  Shop_ID       lat     long Bus_Stop Bus_ID
##1       1 -34.03935 18.61796     TRUE      A
##2       2 -33.92782 18.41052    FALSE   <NA>

Notes:

  1. We first use expand.grid to enumerate all pair combinations of shops and bus stops. These are ordered by shops first.
  2. We then compute the distance matrix d using geosphere::distGeo. Note here that the input expects (lon, lat) coordinates. distGeo returns distances in meters. The resulting d matrix is now(shops) by now(bus) so that each row gives the distance from a shop to each bus stop.
  3. We then see if there is a bus stop within 500 meters of each shop by applying the function any(x <= 500) for each row x in d using apply with MARGIN=1.
  4. Similarly, we can extract the column of d (corresponding to the row in bus) for the first shop that is within 500 meters using which instead of any in our applied function. Then use this result to select the Bus_Stop_ID from bus.

By the way, we don't have to apply the condition x <= 500 twice. The following will also work:

shops$Bus_ID <- bus[apply(d, 1, function(x) {
                                  c <-which(x <= 500)
                                  if(length(c)==0) NA else c[1]
                                }), "Bus_Stop_ID"]
shops$Bus_Stop <- !is.na(shops$Bus_ID)

and is more efficient.

Data:

bus <- structure(list(Bus_Stop_ID = structure(1:2, .Label = c("A", "B"
), class = "factor"), lat = c(-34.04199, -33.92312), long = c(18.61747, 
18.44649)), .Names = c("Bus_Stop_ID", "lat", "long"), class = "data.frame",  row.names = c(NA, 
-2L))

shops <- structure(list(Shop_ID = 1:2, lat = c(-34.03935, -33.92782), 
long = c(18.617964, 18.41052), Bus_ID = structure(c(1L, NA
), .Label = c("A", "B"), class = "factor"), Bus_Stop = c(TRUE, 
FALSE)), .Names = c("Shop_ID", "lat", "long", "Bus_ID", "Bus_Stop"
), row.names = c(NA, -2L), class = "data.frame")
Comments