Alex Alex - 3 months ago 19
R Question

find number of points within a radius in R using lon and lat coordinates

I have a set of points with longitude and latitude coordinates.
I want to find out the number of points within a specific radius from each point.
I have looked at the RANN and FNN packages and the only relevant function I found is the nn2() in the RANN package. though, I do not want to preset the maximum or minimum points that have to be identified (k variable in the nn2 function). in addition, even though I have tried several different values for k(number of points) and radius in nn2 I always get the same results. even when the radius is set to very small or zero.
here's an example of the code I've used

points<- nn2(mydata, k=100, radius = 0.02)


any ideas of how to do this in R?

Answer

To get what you want from nn2, you need to:

  1. Specify the searchtype to be "radius". By not specifying the searchtype, the searchtype is set to standard and the radius input is ignored.
  2. Specify k to be nrow(mydata) because k is the maximum number of nearest neighbors to compute.

The manner in which you are calling nn2 will return the 100 nearest neighbors in mydata for each point in mydata. Consequently, you will get the same result for any radius. For example:

library(RANN)
set.seed(123)

## simulate some data
lon = runif(100, -99.1, -99)
lat = runif(100, 33.9, 34)

## data is a 100 x 2 matrix (can also be data.frame)
mydata <- cbind(lon, lat)

radius <- 0.02   ## your radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 1224
res1 <- nn2(mydata, k=100, radius = radius) 
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000

radius <- 0.03   ## increase radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 2366
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000

Note that according to its documentation ?nn2, nn2 returns a list with two elements:

  1. nn.idx: a nrow(query) x k matrix where each row contains the row indices of the k nearest neighbors in mydata to the point at that row in the collection of query points in query. In both of our calls, query=mydata. When called with searchtype="radius", if there are m < k neighbors within the given radius, then k - m of those index values will be set to 0. Since the set of query points is the same as mydata, the index to the point itself will be included.

  2. nn.dist: a nrow(query) x k matrix where each element contains the Euclidean distances for the corresponding nearest neighbor in nn.idx. Here, if the corresponding element in nn.idx is 0, then the value in nn.dist is set to 1.340781e+154.

With your call, you get 100 nearest neighbors for each point in mydata, hence length(which(res1$nn.idx>0))==10000 no matter what the radius is in the example.

Finally, you should note that because nn2 returns two nrow(mydata) x nrow(mydata) in your case, it can very easily overwhelm your memory if you have a lot of points.


Updated to specifically produce the result of getting the count of neighbors within a given radius.

To compute the number of neighbors within a radius of each point in the data, call nn2 as such

res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)

Then, do this:

count <- rowSums(res$nn.idx > 0) - 1

Notes:

  1. Since query=mydata and k=nrow(mydata) the resulting res$nn.idx will be nrow(mydata) x nrow(mydata) as explained above.
  2. Each row i of res$nn.idx corresponds to row i in query, which is the i-th point in query=mydata. Call this i-th point p[i].
  3. Each row i of res$nn.idx contains the row indices of the neighbors of p[i] AND zeroes to fill that row in the matrix (because not all points in mydata will be within the radius of p[i]).
  4. Therefore, the number of neighbors of p[i] can be found by finding those values in the row that are greater than 0 and then counting them. This is what rowSums(res$nn.idx > 0) does for each row of res$nn.idx.
  5. Lastly, because query=mydata, the point being queried is in the count itself. That is, a point is the nearest neighbor to itself. Therefore subtract 1 from these counts to exclude that.

The resulting count will be a vector of the counts. The i-th element is the number of neighbors within the radius to the i-th point in query=mydata.

Hope this is clear.

Comments