Gaurav Chaturvedi - 6 months ago 31

R Question

I have 2 lists (

`list1`

`list2`

`list2`

`list1`

I want an approximate locality for every point in list1 as well. So I want to take a point in

`list1`

`list2`

`list1`

`list1`

`list1`

`near_dist`

`indx`

I am using the

`gdist`

Example input lists:

`list1 <- data.frame(longitude = c(80.15998, 72.89125, 77.65032, 77.60599,`

72.88120, 76.65460, 72.88232, 77.49186,

72.82228, 72.88871),

latitude = c(12.90524, 19.08120, 12.97238, 12.90927,

19.08225, 12.81447, 19.08241, 13.00984,

18.99347, 19.07990))

list2 <- data.frame(longitude = c(72.89537, 77.65094, 73.95325, 72.96746,

77.65058, 77.66715, 77.64214, 77.58415,

77.76180, 76.65460),

latitude = c(19.07726, 13.03902, 18.50330, 19.16764,

12.90871, 13.01693, 13.00954, 12.92079,

13.02212, 12.81447),

locality = c("A", "A", "B", "B", "C", "C", "C", "D", "D", "E"))

Answer

To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package `geosphere`

has the `distCosine`

, `distHaversine`

, `distVincentySphere`

and `distVincentyEllipsoid`

for calculating the distance. Of these, the `distVincentyEllipsoid`

is considered the most accurate one, but is computationally more intensive than the other ones.

With one of these functions, you can make a distance matrix. Based on that matrix you can then assign `locality`

names based on shortest distance with `which.min`

and the corresponding distance with `min`

(see for this the last part of the answer) like this:

```
library(geosphere)
# create distance matrix
mat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)
# assign the name to the point in list1 based on shortest distance in the matrix
list1$locality <- list2$locality[apply(mat, 1, which.min)]
```

this gives:

```
> list1
longitude latitude locality
1 80.15998 12.90524 D
2 72.89125 19.08120 A
3 77.65032 12.97238 C
4 77.60599 12.90927 D
5 72.88120 19.08225 A
6 76.65460 12.81447 E
7 72.88232 19.08241 A
8 77.49186 13.00984 D
9 72.82228 18.99347 A
10 72.88871 19.07990 A
```

Another possibility is to assign the `locality`

based on the average longitude and latitude values of the `locality`

s in `list2`

:

```
library(dplyr)
list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()
mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)
list1$locality2 <- list2a$locality[apply(mat2, 1, which.min)]
```

or with `data.table`

:

```
library(data.table)
list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]
mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)
list1$locality2 <- list2a$locality[apply(mat2, 1, which.min)]
```

this gives:

```
> list1
longitude latitude locality locality2
1 80.15998 12.90524 D D
2 72.89125 19.08120 A B
3 77.65032 12.97238 C C
4 77.60599 12.90927 D C
5 72.88120 19.08225 A B
6 76.65460 12.81447 E E
7 72.88232 19.08241 A B
8 77.49186 13.00984 D C
9 72.82228 18.99347 A B
10 72.88871 19.07990 A B
```

As you can see, this leads in most (7 out of 10) occasions to another assigned `locality`

.

You can add the distance with:

```
list1$near_dist <- apply(mat2, 1, min)
```

the result:

```
> list1
longitude latitude locality locality2 near_dist
1: 80.15998 12.90524 D D 269966.8970
2: 72.89125 19.08120 A B 65820.2047
3: 77.65032 12.97238 C C 739.1885
4: 77.60599 12.90927 D C 9209.8165
5: 72.88120 19.08225 A B 66832.7223
6: 76.65460 12.81447 E E 0.0000
7: 72.88232 19.08241 A B 66732.3127
8: 77.49186 13.00984 D C 17855.3083
9: 72.82228 18.99347 A B 69456.3382
10: 72.88871 19.07990 A B 66004.9900
```