Alex - 1 year ago 134

R Question

I have a set of points with longitude and latitude coordinates.

I want to find out the number of points within a specific radius from each point.

I have looked at the RANN and FNN packages and the only relevant function I found is the nn2() in the RANN package. though, I do not want to preset the maximum or minimum points that have to be identified (k variable in the nn2 function). in addition, even though I have tried several different values for k(number of points) and radius in nn2 I always get the same results. even when the radius is set to very small or zero.

here's an example of the code I've used

`points<- nn2(mydata, k=100, radius = 0.02)`

any ideas of how to do this in R?

Answer Source

To get what you want from `nn2`

, you need to:

- Specify the
`searchtype`

to be`"radius"`

. By not specifying the`searchtype`

, the`searchtype`

is set to`standard`

and the`radius`

input is ignored. - Specify
`k`

to be`nrow(mydata)`

because`k`

is the maximum number of nearest neighbors to compute.

The manner in which you are calling `nn2`

will return the `100`

nearest neighbors in `mydata`

for each point in `mydata`

. Consequently, you will get the same result for any `radius`

. For example:

```
library(RANN)
set.seed(123)
## simulate some data
lon = runif(100, -99.1, -99)
lat = runif(100, 33.9, 34)
## data is a 100 x 2 matrix (can also be data.frame)
mydata <- cbind(lon, lat)
radius <- 0.02 ## your radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 1224
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000
radius <- 0.03 ## increase radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 2366
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000
```

Note that according to its documentation `?nn2`

, `nn2`

returns a list with two elements:

`nn.idx`

: a`nrow(query) x k`

matrix where each row contains the row indices of the`k`

nearest neighbors in`mydata`

to the point at that row in the collection of query points in`query`

. In both of our calls,`query=mydata`

. When called with`searchtype="radius"`

, if there are`m < k`

neighbors within the given radius, then`k - m`

of those index values will be set to`0`

. Since the set of query points is the same as`mydata`

, the index to the point itself will be included.`nn.dist`

: a`nrow(query) x k`

matrix where each element contains the Euclidean distances for the corresponding nearest neighbor in`nn.idx`

. Here, if the corresponding element in`nn.idx`

is`0`

, then the value in`nn.dist`

is set to 1.340781e+154.

With your call, you get `100`

nearest neighbors for each point in `mydata`

, hence `length(which(res1$nn.idx>0))==10000`

no matter what the `radius`

is in the example.

Finally, you should note that because `nn2`

returns two `nrow(mydata) x nrow(mydata)`

in your case, it can very easily overwhelm your memory if you have a lot of points.

**Updated** to specifically produce the result of getting the count of neighbors within a given radius.

To compute the number of neighbors within a `radius`

of each point in the data, call `nn2`

as such

```
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
```

Then, do this:

```
count <- rowSums(res$nn.idx > 0) - 1
```

Notes:

- Since
`query=mydata`

and`k=nrow(mydata)`

the resulting`res$nn.idx`

will be`nrow(mydata) x nrow(mydata)`

as explained above. - Each row
`i`

of`res$nn.idx`

**corresponds**to row`i`

in`query`

, which is the`i`

-th point in`query=mydata`

. Call this`i`

-th point`p[i]`

. - Each row
`i`

of`res$nn.idx`

**contains**the row indices of the neighbors of`p[i]`

**AND**zeroes to fill that row in the matrix (because not all points in`mydata`

will be within the`radius`

of`p[i]`

). - Therefore, the number of neighbors of
`p[i]`

can be found by finding those values in the row that are greater than`0`

and then counting them. This is what`rowSums(res$nn.idx > 0)`

does for each row of`res$nn.idx`

. - Lastly, because
`query=mydata`

, the point being queried is in the count itself. That is, a point is the nearest neighbor to itself. Therefore subtract`1`

from these counts to exclude that.

The resulting `count`

will be a vector of the counts. The `i`

-th element is the number of neighbors within the `radius`

to the `i`

-th point in `query=mydata`

.

Hope this is clear.