Patrick Williams Patrick Williams - 2 months ago 12
R Question

Binning longitude/latitude labeled data by census block ID

I have two data sets, one for crime in Chicago, labeled with longitude and latitude coords and a shapefile of census blocks also in Chicago. Is it possible in R to aggregate crimes within census blocks, given these two files? The purpose is to be able to map out the crimes by census block.

Location for download of Chicago census tract data:

Location for download of crime data:

Some code that I have pruned down from another project. When it is through there is a spatial object for census tract information and a dataframe containing crime data, including lon/lat coords:


#Helper function to reduce crime data to single year and limit to variables of interest
yearReduce <- function(rawData=NULL,year=NULL) {
datout <- data.frame(year = numeric(0), community = numeric(0), type = numeric(0), arrest = numeric(0),
Latitude = numeric(0), longitude = numeric(0))
dat <- rawData[rawData$Year==year,]
datout <- data.frame(year = dat$Year, community = as.numeric(dat$Community.Area), type = dat$Primary.Type, arrest = dat$Arrest,
latitude = dat$Latitude, longitude = dat$Longitude)

#Load crime data
crimedata <- '~/Documents/data/Crimes_-_2001_to_present.csv'
mydata_crime <- read.csv(crimedata,na.strings = c("", " ", "NA"), stringsAsFactors=F)
mydata_crime$Primary.Type <- tolower(mydata_crime$Primary.Type)

#Set cwd to location of the census tract shape file
#Create spatial vector object and transform projection
tract = readOGR(".","CensusBlockTIGER2010") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))

### Process crime data to narrow to single year###
crime2010 <- yearReduce(mydata_crime,'2010')

# further select specific crime(s). Fairly limited for testing purposes
violent_crimes <- subset(crime2010,
type == "homicide")

violent_crimes <- violent_crimes[complete.cases(violent_crimes),] #Clean data a little bit

Thank you for any help you may be able to provide.


#Load libraries

First, a few improvements to your code above

#Set my wd

#Load crime data
my_crime <- read.csv(file='spat_aggreg/Crimes_2001_to_present.csv',stringsAsFactors=F)`
my_crime$Primary.Type <- tolower(my_crime$Primary.Type)

#Select variables of interest and subset by year and type of crime
#Note, yearReduce() not necessary at all: check R documentation before creating own functions
my_crime <- data.frame(year=my_crime$Year, community=my_crime$Community.Area, 
         type=my_crime$Primary.Type, arrest=my_crime$Arrest, 
         latitude=my_crime$Latitude, longitude=my_crime$Longitude)
vc <- subset(my_crime, year==2010, type=="homicide")

#Keep only complete cases
vc <- vc[complete.cases(vc), ]

#Load census tract data
#Note: function `shapefile` is a neater than `readOGR()`
#Note: The usage of `@` to access attribute data tables associated with spatial objects in R
tract <- shapefile('spat_aggreg/census_blocks_2000/Census_Blocks.shp')
tract <- spTransform(x=tract, CRSobj=CRS("+proj=longlat +datum=WGS84"))
names(tract@data) <- tolower(names(tract@data))

And now, to answer your question...

#Convert crime data to a spatial points object
vc <- SpatialPointsDataFrame(coords=vc[, c("longitude", "latitude")],
          data=vc[, c("year", "community", "type", "arrest")],
          proj4string=CRS("+proj=longlat +datum=WGS84"))

#Each row entry represents one homicide, so, add count column
vc@data$count <- 1

#Spatial overlay to identify census polygon in which each crime point falls
#The Result `vc_tract` is a dataframe with the tract data for each point
vc_tract <- over(x=vc, y=tract)

#Add tract data to crimePoints
vc@data <- data.frame(vc@data, vc_tract)

#Aggregate homicides by tract (assuming the column "census_tra" is the unique ID for census tracts)
hom_tract <- aggregate(formula=count~census_tra, data=vc@data, FUN=length)

#Add number of homicides to tracts object
m <- match(x=tract@data$census_tra, table=hom_tract$census_tra)
tract@data$homs_2010 <- hom_tract$count[m]

Now, your census tracts (in the spatialPolygonDataframe object named tract) contain a column named homs_2010 that has the number of homicides for each tract. From there, it should be a breeze to map it.