Peter Nsanze Peter Nsanze - 4 months ago 10
R Question

Binding dataframes with matching country names

I have two data frames of country data.

df1
has all the countries of the world.
df2
has a subset of countries but has the populations in one of its columns.
I want to take the population data and add it to
df1
where the country names are a match.

If
df1$Column1 = df2$Column1
(same country name) then populate
df1$Column2
(currently empty) with the information from
df2$Column2
(country's population) where the row is the the one for that country match.

I tried to merge the two using the column "Name" which they both have for country names :

total <- merge(map,Co2_2x, by="NAME")


the columns are all there but I get empty rows in my new dataframe.

I'd like to be able to say "for this row and column matrix position in df1 (the country), get the row (country name match in
df2
) and column X (population data). Then put it in this row and column Y matrix position in
df1
(new population column in
df1
for the matched country name)"... There must be an easier way :-)

Here is my code : I'd like to fill
map$measure
with data from
Co2_2x$premium
where the countries match.

library(XML)
library(raster)
library(rgdal)
download.file("http://thematicmapping.org/downloads/TM_WORLD_BORDERS_SIMPL-0.3.zip",destfile="TM_WORLD_BORDERS_SIMPL-0.3.zip")
unzip("TM_WORLD_BORDERS_SIMPL-0.3.zip",exdir=getwd())
polygons <- shapefile("TM_WORLD_BORDERS_SIMPL-0.3.shp")

polygons
map <- as.data.frame(polygons)

map$Measure <- 0

library(rvest)
Co2 <- read_html("https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions")

Co2_2x<-Co2 %>%
html_nodes("table") %>%
.[[1]] %>%
html_table()

names(Co2_2x)[2]<-paste("premium")

names(Co2_2x)[1]<-paste("NAME")


total <- merge(map,Co2_2x, by="NAME")


Thanks!

Answer

To have the first dataset rows with no match in the other dataset appear, you just need to add the all.x=T option, as follows (have a look at the documentation for details) :

total <- merge(map,Co2_2x, by="NAME",all.x=T)

These rows will then appear with NA in the second dataset columns.

If the matching doesn't seem to work, you may want to make sure that your matching variable (in your case, NAME) is filled exaclty the same way in the two datasets (letter case, possible spaces at the extremities...). This answer provides a fine way of doing so.

Comments