Alex - 3 months ago 19
R Question

# find number of points within a radius in R using lon and lat coordinates

I have a set of points with longitude and latitude coordinates.
I want to find out the number of points within a specific radius from each point.
I have looked at the RANN and FNN packages and the only relevant function I found is the nn2() in the RANN package. though, I do not want to preset the maximum or minimum points that have to be identified (k variable in the nn2 function). in addition, even though I have tried several different values for k(number of points) and radius in nn2 I always get the same results. even when the radius is set to very small or zero.
here's an example of the code I've used

``````points<- nn2(mydata, k=100, radius = 0.02)
``````

any ideas of how to do this in R?

To get what you want from `nn2`, you need to:

1. Specify the `searchtype` to be `"radius"`. By not specifying the `searchtype`, the `searchtype` is set to `standard` and the `radius` input is ignored.
2. Specify `k` to be `nrow(mydata)` because `k` is the maximum number of nearest neighbors to compute.

The manner in which you are calling `nn2` will return the `100` nearest neighbors in `mydata` for each point in `mydata`. Consequently, you will get the same result for any `radius`. For example:

``````library(RANN)
set.seed(123)

## simulate some data
lon = runif(100, -99.1, -99)
lat = runif(100, 33.9, 34)

## data is a 100 x 2 matrix (can also be data.frame)
mydata <- cbind(lon, lat)

## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res\$nn.idx>0)))
##[1] 1224
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1\$nn.idx>0)))
##[1] 10000

## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res\$nn.idx>0)))
##[1] 2366
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1\$nn.idx>0)))
##[1] 10000
``````

Note that according to its documentation `?nn2`, `nn2` returns a list with two elements:

1. `nn.idx`: a `nrow(query) x k` matrix where each row contains the row indices of the `k` nearest neighbors in `mydata` to the point at that row in the collection of query points in `query`. In both of our calls, `query=mydata`. When called with `searchtype="radius"`, if there are `m < k` neighbors within the given radius, then `k - m` of those index values will be set to `0`. Since the set of query points is the same as `mydata`, the index to the point itself will be included.

2. `nn.dist`: a `nrow(query) x k` matrix where each element contains the Euclidean distances for the corresponding nearest neighbor in `nn.idx`. Here, if the corresponding element in `nn.idx` is `0`, then the value in `nn.dist` is set to 1.340781e+154.

With your call, you get `100` nearest neighbors for each point in `mydata`, hence `length(which(res1\$nn.idx>0))==10000` no matter what the `radius` is in the example.

Finally, you should note that because `nn2` returns two `nrow(mydata) x nrow(mydata)` in your case, it can very easily overwhelm your memory if you have a lot of points.

Updated to specifically produce the result of getting the count of neighbors within a given radius.

To compute the number of neighbors within a `radius` of each point in the data, call `nn2` as such

``````res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
``````

Then, do this:

``````count <- rowSums(res\$nn.idx > 0) - 1
``````

Notes:

1. Since `query=mydata` and `k=nrow(mydata)` the resulting `res\$nn.idx` will be `nrow(mydata) x nrow(mydata)` as explained above.
2. Each row `i` of `res\$nn.idx` corresponds to row `i` in `query`, which is the `i`-th point in `query=mydata`. Call this `i`-th point `p[i]`.
3. Each row `i` of `res\$nn.idx` contains the row indices of the neighbors of `p[i]` AND zeroes to fill that row in the matrix (because not all points in `mydata` will be within the `radius` of `p[i]`).
4. Therefore, the number of neighbors of `p[i]` can be found by finding those values in the row that are greater than `0` and then counting them. This is what `rowSums(res\$nn.idx > 0)` does for each row of `res\$nn.idx`.
5. Lastly, because `query=mydata`, the point being queried is in the count itself. That is, a point is the nearest neighbor to itself. Therefore subtract `1` from these counts to exclude that.

The resulting `count` will be a vector of the counts. The `i`-th element is the number of neighbors within the `radius` to the `i`-th point in `query=mydata`.

Hope this is clear.