Fran Villamil Fran Villamil - 12 days ago 7
R Question

Matching georeferenced data with shape file in R

I have a georeferenced event data set, in a data frame of the form:

LONGITUDE LATITUDE VAR1
33.4 4.4 5
33.4 4.4 3
33.4 4.4 1
30.4 4.2 2
28.4 5.1 2


It counts fatalities in events which are georeferenced. Apart from it, I have a shape file of the provinces in a country, like this:

> str(shapefile)
'data.frame': 216 obs. of 5 variables:
$ CONSTI_COD: num 1 2 3 4 5 6 7 8 9 10 ...
$ Area : num 20 11.7 10.7 223.3 38.7 ...
$ PROVINCE_NAME : Factor w/ 216 levels "CENTRAL","COAST",..: 4 4 4 4 4 4 4 4 2 2 ...
$ Shape_Leng: num 0.193 0.152 0.201 0.872 0.441 ...
$ Shape_Area: num 0.001628 0.000947 0.000867 0.018135 0.003145 ...

..@ polygons :List of 216
.. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
.. .. .. ..@ Polygons :List of 1
.. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
.. .. .. .. .. .. ..@ labpt : num [1:2] 36.9 -1.3
.. .. .. .. .. .. ..@ area : num 0.00163
.. .. .. .. .. .. ..@ hole : logi FALSE
.. .. .. .. .. .. ..@ ringDir: int 1
.. .. .. .. .. .. ..@ coords : num [1:151, 1:2] 36.8 36.8 36.8 36.9 36.9 ...
.. .. .. ..@ plotOrder: int 1
.. .. .. ..@ labpt : num [1:2] 36.9 -1.3
.. .. .. ..@ ID : chr "0"
.. .. .. ..@ area : num 0.00163
[...etc]


What I need to do is to place the event data within the provinces, i.e. add a fourth column to the first data frame which states in which province happened each event, based on the coordinates. So I would have something like this:

LONGITUDE LATITUDE VAR1 PROVINCE
33.4 4.4 5 CENTRAL
33.4 4.4 3 CENTRAL
33.4 4.4 1 CENTRAL
30.4 4.2 2 COAST
28.4 5.1 2 COAST


Is this posible? I think I found some time ago a post that explained how to do this (outside Stack Overflow), but I cannot find it now.

Thanks!

(Sorry if there is a similar question over here. I made a search, but I didn't find an answer, maybe because I don't really know what I'm looking for. I would really appreciate a link to a similar post.)

Answer

What you're talking about is a "spatial join" (or "spatial intersection" or "overlay"). This is pretty straightforward with the help of the over function from the sp package.

Here's an example.

First, let's download and import a polygon shapefile of the world's countries.

download.file(paste0('http://www.naturalearthdata.com/http//',
                     'www.naturalearthdata.com/download/110m/cultural/',
                     'ne_110m_admin_0_countries.zip'), 
              f <- tempfile())
unzip(f, exdir=tempdir())
library(rgdal)
countries <- readOGR(tempdir(), 'ne_110m_admin_0_countries')

Now we'll create some random coordinate data that fall within the extent of the polygon shapefile. We then define the columns x and y as coordinates, and assign the same CRS as that of the polygons (although this may not be the case for your data; be sure to assign correct coordinate systems).

pts <- data.frame(x=runif(10, -180, 180), y=runif(10, -90, 90),
                  VAR1=LETTERS[1:10])
coordinates(pts) <- ~x+y  # pts needs to be a data.frame for this to work
proj4string(pts) <- proj4string(countries)

plot(countries)
points(pts, pch=20, col='red')

shp

Now we can perform the spatial overlay:

over(pts, countries)$admin

#  [1] <NA>      <NA>      Turkey    <NA>     
#  [5] Macedonia <NA>      China     Argentina
#  [9] <NA>      Canada   
# 177 Levels: Afghanistan Albania ... Zimbabwe

Note that in this case, some of the random points fell in the ocean (i.e. outside polygons). When intersected with the polygon object, these points return NA.

Now we cbind the desired attribute to pts:

cbind.data.frame(pts, country=over(pts, countries)$admin)

#             x          y VAR1   country
# 1   -52.59404 -37.422879    A      <NA>
# 2   -33.88867 -40.194482    B      <NA>
# 3    38.84383  37.272460    C    Turkey
# 4   -84.04949   7.118878    D      <NA>
# 5    20.98272  40.920470    E Macedonia
# 6  -155.32951 -37.612497    F      <NA>
# 7    99.40166  38.630049    G     China
# 8   -61.84025 -27.412885    H Argentina
# 9   -37.65287  -3.666080    I      <NA>
# 10 -112.81197  59.959475    J    Canada