Given a shapefile, how do I shape and use a data file so that I can plot thematic maps, using identifiers that correspond to the regions in the shapefile?
#Download the English Government Office Network Regions (GOR) shapefile
tmp_dir = tempdir()
url_data = "http://www.sharegeo.ac.uk/download/10672/50/English%20Government%20Office%20Network%20Regions%20(GOR).zip"
zip_file = sprintf("%s/shpfile.zip", tmp_dir)
#Fetch the zip archive (binary mode) before unpacking it
download.file(url_data, zip_file, mode = "wb")
unzip(zip_file, exdir = tmp_dir)
#Load in the data file (could this be done from the downloaded zip file directly?)
#I can plot the shapefile okay...
plot(gor)
#and I can use commands like this to get a feel for the data...
gor@data$NAME
# North East North West
# Greater London Authority West Midlands
# Yorkshire and The Humber South West
# East Midlands South East
# East of England
#9 Levels: East Midlands East of England ... Yorkshire and The Humber
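#download data from http://www.justice.gov.uk/downloads/publications/statistics-and-data/courts-and-sentencing/csq-q3-2011-insolvency-tables.csv
#stringsAsFactors=TRUE keeps text columns as factors, which the later steps rely on
insolvency=read.csv("http://www.justice.gov.uk/downloads/publications/statistics-and-data/courts-and-sentencing/csq-q3-2011-insolvency-tables.csv", stringsAsFactors=TRUE)
insolvencygor.2011Q3=subset(insolvency,Time.Period=='2011 Q3' & Geography.Type=='Government office region')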
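#tidy the data (drop factor levels left unused after subsetting)
insolvencygor.2011Q3=droplevels(insolvencygor.2011Q3)
#column names
names(insolvencygor.2011Q3)
# "Time.Period" "Geography"
# "Geography.Type" "Company.Winding.up.Petition"
# "Creditors.Petition" "Debtors.Petition"
#region names used in the data file
levels(insolvencygor.2011Q3$Geography)
# "East" "East Midlands"
# "London" "North East"
# "North West" "South East"
# "South West" "Wales"
# "West Midlands" "Yorkshire and the Humber"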
#So what next?
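Comparing the two listings above, the region names do not line up exactly, so a join on name will silently miss some regions. A minimal sketch of how that could be checked, with both name vectors typed in by hand from the printed output (so it runs without the downloads); the variable names are just for illustration:

```r
# Region names as printed above, typed in by hand so this runs offline
shape_names <- c("North East", "North West", "Greater London Authority",
                 "West Midlands", "Yorkshire and The Humber", "South West",
                 "East Midlands", "South East", "East of England")
data_names <- c("East", "East Midlands", "London", "North East", "North West",
                "South East", "South West", "Wales", "West Midlands",
                "Yorkshire and the Humber")

# Names in the data file with no exact counterpart in the shapefile
unmatched_in_data <- setdiff(data_names, shape_names)
# Shapefile regions that no data row would attach to
unmatched_in_shape <- setdiff(shape_names, data_names)

print(unmatched_in_data)
print(unmatched_in_shape)
```

Three names differ only in wording or capitalisation, and Wales has no polygon at all in an English regions shapefile, so it has to be dropped or ignored.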
Having not seen the wood for the trees, to answer my own question, here's one way (code following on from code in the question):
#Convert factors to numeric [ http://stackoverflow.com/questions/4798343/convert-factor-to-integer ]
#There's probably a much better formulaic way of doing this/automating this?
insolvencygor.2011Q3$Creditors.Petition=as.numeric(levels(insolvencygor.2011Q3$Creditors.Petition))[insolvencygor.2011Q3$Creditors.Petition]
insolvencygor.2011Q3$Company.Winding.up.Petition=as.numeric(levels(insolvencygor.2011Q3$Company.Winding.up.Petition))[insolvencygor.2011Q3$Company.Winding.up.Petition]
insolvencygor.2011Q3$Debtors.Petition=as.numeric(levels(insolvencygor.2011Q3$Debtors.Petition))[insolvencygor.2011Q3$Debtors.Petition]
#Tweak the levels so they match exactly (really should do this via a lookup table of some sort?)
#The labels must be listed in the same (alphabetical) order as levels(i2$Geography)
i2=insolvencygor.2011Q3
i2c=c('East of England','East Midlands','Greater London Authority','North East','North West','South East','South West','Wales','West Midlands','Yorkshire and The Humber')
i2$Geography=factor(i2$Geography,labels=i2c)
#Merge the data with the shapefile
#merge() re-sorts rows by the key, which would leave the rows out of step with the polygons,
#so join with match() to keep the shapefile's row order (Wales has no polygon and drops out)
gor@data=data.frame(gor@data,i2[match(gor@data$NAME,i2$Geography),])
#Plot the data using a greyscale
plot(gor,col=gray(gor@data$Creditors.Petition/max(gor@data$Creditors.Petition)))
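The two comments in that code ask for a more automated factor conversion and for a lookup table. Here is a rough sketch of both, run on a toy stand-in for insolvencygor.2011Q3. The counts are made up, and the names count_cols, lookup and Region are my own, not from the original data:

```r
# Toy stand-in for insolvencygor.2011Q3; the counts are made up for illustration
i2 <- data.frame(
  Geography = factor(c("East", "London", "Wales", "North East")),
  Creditors.Petition = factor(c("120", "340", "95", "60")),
  Debtors.Petition = factor(c("800", "1500", "410", "300"))
)

# Convert every count column from factor to numeric in one pass
count_cols <- c("Creditors.Petition", "Debtors.Petition")
i2[count_cols] <- lapply(i2[count_cols],
                         function(f) as.numeric(as.character(f)))

# Lookup table: data-file name -> shapefile name (only the names that differ)
lookup <- c("East" = "East of England",
            "London" = "Greater London Authority",
            "Yorkshire and the Humber" = "Yorkshire and The Humber")

# Rename only the regions found in the lookup; leave the rest untouched
geo <- as.character(i2$Geography)
renamed <- geo %in% names(lookup)
geo[renamed] <- unname(lookup[geo[renamed]])
i2$Region <- geo

print(i2)
```

Unlike relabelling with factor(..., labels=...), the lookup does not depend on the order of the factor levels, so it keeps working if a region is added or removed.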
This approach merges the numeric data into the shapefile's attribute table and then plots it directly.
That said, wouldn't a cleaner way be to keep the data file and the shapefile separate? (I'm still not sure how to do that.)
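One possible way to keep them separate, as a sketch only: leave gor untouched and use match() to pull the values out of the data frame in polygon order, then pass the resulting colours to plot(). The region order below is copied from the gor@data$NAME output in the question. The counts are made up, and I'm assuming the Geography names have already been aligned with the shapefile names (stored here in a hypothetical Region column):

```r
# Polygon order as printed by gor@data$NAME in the question
shape_names <- c("North East", "North West", "Greater London Authority",
                 "West Midlands", "Yorkshire and The Humber", "South West",
                 "East Midlands", "South East", "East of England")

# Toy data table already using the shapefile names; the counts are made up
i2 <- data.frame(
  Region = c("East Midlands", "East of England", "Greater London Authority",
             "North East", "North West", "South East", "South West",
             "Wales", "West Midlands", "Yorkshire and The Humber"),
  Creditors.Petition = c(50, 70, 120, 30, 90, 110, 60, 40, 80, 65),
  stringsAsFactors = FALSE
)

# For each polygon, find the row of i2 that describes it
idx <- match(shape_names, i2$Region)
values <- i2$Creditors.Petition[idx]

# One grey level per polygon, in polygon order
cols <- gray(values / max(values))

# With the shapefile loaded as gor, the plot would then be:
# plot(gor, col = cols)
```

With the real objects, shape_names would be as.character(gor@data$NAME). The shapefile's attribute table never changes, and the Wales row is simply never looked up.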