Peter Hickman - 3 months ago
R Question

Geocoding Colloquial Place Names: Zero Results, but Can Get Manual Results (R ggmap)

I'd like to know the latitudes and longitudes of the district offices on the island of Java, Indonesia. Districts are administrative regions, like states in the USA. Most of my geocode queries return inaccurate results: the latitude and longitude are for the district as a whole, not the district office. Yet if I type the query into Google Maps manually, I find what I want.

library(ggmap) # provides geocode()
# list of district names
dists <- read.csv("../javaDistNames.csv")
# vector of queries for Google maps
queries <- paste("Kantor Bupati ", dists$distName, ", ", dists$distName,
", ", dists$provinceName, ", Indonesia", sep="")
# impute latitude and longitude
dists[c("lon", "lat")] <- geocode(queries)

The expression "Kantor Bupati" means District Office in Indonesian.

E.g., if I type "Kantor Bupati BOGOR, BOGOR, JAWA BARAT, Indonesia" into Google Maps, I find the district office: lat=-6.479745, lon=106.824742. But geocode returns: lat=-6.597147, lon=106.806. That is roughly 13 km away as the crow flies: not precise enough for my purposes.
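
The gap between the two Bogor results can be checked with a great-circle calculation. The snippet below is only a sketch in base R: the helper name haversineKm and the 6371 km Earth radius are assumptions of this example and are not part of ggmap.

```r
# Great-circle (haversine) distance in kilometres between two points
# given in decimal degrees. The 6371 km Earth radius is an approximation.
haversineKm <- function(lat1, lon1, lat2, lon2, radiusKm = 6371) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * radiusKm * asin(sqrt(a))
}

manual   <- c(lat = -6.479745, lon = 106.824742)  # found by hand in Google Maps
geocoded <- c(lat = -6.597147, lon = 106.806)     # returned by geocode()

gapKm <- haversineKm(manual["lat"], manual["lon"],
                     geocoded["lat"], geocoded["lon"])
print(unname(gapKm))
```

Running the same helper over all districts would show which geocode results sit far from a hand-checked location.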


I've solved this: I use the Google Places API as SymbolixAU suggested. The vectorized function below takes as arguments the colloquial place names we want to geocode and a second vector of non-colloquial place names that can be geocoded using ggmap's geocode. It returns latitude, longitude, and the name of the place. Get an API key here.
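
Before the full function, here is a rough sketch of how one location-biased Places request might be put together. It is not the exact request used in this answer: the helper name buildPlacesUrl, the choice of the Nearby Search endpoint, the 50 km radius, and the placeholder key are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the original answer: builds one request URL
# per query, biased towards the coordinates of the matching base.
# ASSUMPTION: the Places "Nearby Search" endpoint and a 50 km search radius.
buildPlacesUrl <- function(query, lat, lon, key, radius = 50000L) {
  endpoint <- "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
  # percent-encode each query so spaces and commas are URL-safe
  keyword <- vapply(query, URLencode, character(1), reserved = TRUE,
                    USE.NAMES = FALSE)
  paste0(endpoint,
         "?location=", lat, ",", lon,
         "&radius=", radius,
         "&keyword=", keyword,
         "&key=", key)
}

# Placeholder key; the base coordinates are the geocode() result for Bogor.
exampleUrl <- buildPlacesUrl("Kantor Bupati BOGOR", -6.597147, 106.806,
                             key = "YOUR_API_KEY")
print(exampleUrl)
```

The point of passing the base coordinates is that the search is centred on the district itself, so a colloquial name such as "Kantor Bupati" is more likely to match the office in that district than a similarly named place elsewhere.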

library("ggmap") # regular geocode function
library("RJSONIO") # read JSON

# API Key for Google Places
key <- "YOUR_API_KEY" # your key here

geoCodeColloquial <- function(queries, bases) {

  # need coordinates of base to focus search
  print("Getting coordinates of bases...")
  baseCoords <- geocode(bases, source="google")

  # request to Google Places
  print("Requesting coordinates of queries...")
  requests <- paste("",
                   baseCoords$lat, ",", baseCoords$lon, 

  # results from Google Places; take only top result for each query
  info <- lapply(requests, 

  # lat and lon
  coords <- lapply(info, function(i) i$geometry$location)

  # name of top result
  geoCodeNames <- lapply(info, function(i) i$name)
  geoCodeNamesDf <- data.frame(matrix(unlist(geoCodeNames),
                                      nrow=length(geoCodeNames), byrow=TRUE))

  # add lat, lon, and discovered names to dataframe
  outDf <- data.frame(matrix(unlist(coords),
                                nrow=length(coords), byrow=TRUE))
  names(outDf) <- c("lat", "lon")
  outDf["geoCodeName"] <- geoCodeNamesDf