noLongerRandom - 17 days ago
R Question

Using rvest to grab data returns 'No matches'

I'm trying to grab some election results from Politico's website using rvest.

http://www.politico.com/2016-election/results/map/president/wisconsin/

I couldn't pull all the data on the page at once, so I went for a county-level approach. Each county has a unique CSS selector (e.g., Adams County's is '#countyAdams .results-table'). So I grabbed all the county names from elsewhere and set up a quick loop (yes, I know loops are bad practice in R, but I expected this method to take me about 3 minutes).
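The selector pattern described above can be generated for every county in one vectorised call. This is a small sketch; county_selectors is a hypothetical helper, and it assumes every county follows the "#county<Name> .results-table" pattern. It also shows that the paste(..., sep = name) trick in the loop builds the same string, because the county name is used as the separator between the two pieces. Note that multi-word names such as "Eau Claire" keep their space, and CSS reads a space as a descendant combinator, so those selectors may not mean what they look like.

```r
# Hypothetical helper: one selector per county, assuming the
# "#county<Name> .results-table" pattern holds for all of them.
county_selectors <- function(counties) {
  paste0("#county", counties, " .results-table")
}

# paste() with the county name as the separator gives the same string
identical(paste("#county", " .results-table", sep = "Adams"),
          county_selectors("Adams"))
# TRUE

# The space in "Eau Claire" survives into the selector
county_selectors(c("Adams", "Eau Claire"))
# returns c("#countyAdams .results-table", "#countyEau Claire .results-table")
```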

Read the page

library(rvest)

wiscoSixteen <- read_html("http://www.politico.com/2016-election/results/map/president/wisconsin")


Create an empty result to rbind onto (NULL rather than a data.frame; and no, I didn't pre-define the columns)

stateDf <- NULL


Get the list of counties (this isn't complete, but we don't need all 72 counties to reach the point where the routine breaks)

wiscoCounties <- c("Adams", "Ashland", "Barron", "Bayfield", "Brown", "Buffalo", "Burnett", "Calumet", "Chippewa", "Clark", "Columbia", "Crawford", "Dane", "Dodge", "Door", "Douglas", "Dunn", "Eau Claire", "Florence", "Fond du Lac", "Forest", "Grant", "Green", "Green Lake", "Iowa", "Iron", "Jackson", "Jefferson", "Juneau")


My 'for' loop:

for (i in seq_along(wiscoCounties)) {

  # Build the i'th county's selector, e.g. "#countyAdams .results-table", and pull its table
  wiscoResult <- wiscoSixteen %>%
    html_node(paste0("#county", wiscoCounties[i], " .results-table")) %>%
    html_table()

  # add a column for the county name so I can ID later
  wiscoResult$county <- wiscoCounties[i]

  # then rbind
  stateDf <- rbind(stateDf, wiscoResult)
}


After it gets through the 10th county, it stops with 'Error: No matches'.

I can't find anything unique about 'Columbia', the 11th county, and I'm at a loss as to what's happening. I'm sure it's something stupid, as that's usually the case. Any help is appreciated.
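One way to narrow this down is to check every county's selector up front instead of letting the loop stop at the first failure. A minimal sketch, assuming the tables follow the "#county<Name> .results-table" pattern; missing_counties is a hypothetical helper. html_nodes() returns an empty node set when nothing matches, so a length of 0 flags a missing table without stopping. It only tells you which selectors fail in the downloaded HTML, not why.

```r
library(rvest)

# Hypothetical helper: return the counties whose results table selector
# matches nothing in an already-parsed page.
missing_counties <- function(page, counties) {
  selectors <- paste0("#county", counties, " .results-table")
  # html_nodes() gives an empty node set (length 0) when there is no match
  found <- vapply(selectors,
                  function(s) length(html_nodes(page, s)) > 0,
                  logical(1))
  counties[!found]
}

# Usage against the page read above (needs network access for read_html):
# missing_counties(wiscoSixteen, wiscoCounties)
```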

Answer

So, why not just use the XHR request that ends up populating those tables (I'm kinda surprised you're getting any data at all from the page's HTML, since those tables get generated from a separate data request):

library(httr)
library(stringi)
library(purrr)
library(dplyr)

# The data file the page's tables are populated from
res <- GET("http://s3.amazonaws.com/origin-east-elections.politico.com/mapdata/2016/WI_20161108.xml")
stop_for_status(res)
dat <- readLines(textConnection(content(res, as="text")))

# Line 2 holds the "|"-separated candidate list; judging by the output,
# each entry is id;surname;given name
stri_split_fixed(dat[2], "|")[[1]] %>%
  stri_replace_last_fixed(";", "") %>%
  stri_split_fixed(";", 3) %>%
  map_df(~setNames(as.list(.), c("rep_id", "last", "first"))) -> candidates

# Presidential ("WI;P;G") records: one line per county plus a statewide line (fips "0")
dat[stri_detect_regex(dat, "^WI;P;G")] %>%
  stri_replace_first_regex("^WI;P;G;", "") %>%
  map_df(function(x) {

    # county fields come before "||", the "|"-separated candidate results after it
    county_results <- stri_split_fixed(x, "||", 2)[[1]]

    stri_replace_last_fixed(county_results[1], ";;", "") %>%
      stri_split_fixed(";") %>%
      map_df(~setNames(as.list(.), c("fips", "name", "x1", "reporting", "x2", "x3", "x4"))) -> county_prefix

    # x* columns are fields whose meaning is unknown; they're dropped below
    stri_split_fixed(county_results[2], "|")[[1]] %>%
      stri_split_fixed(";") %>%
      map_df(~setNames(as.list(.), c("rep_id", "party", "count", "pct", "x5", "x6", "x7", "x8", "candidate_idx"))) %>%
      left_join(candidates, by="rep_id") -> df

    df$fips <- county_prefix$fips
    df$name <- county_prefix$name
    df$reporting <- county_prefix$reporting

    select(df, -starts_with("x"))

  }) -> results

It seems to be complete data:

glimpse(results)
## Observations: 511
## Variables: 10
## $ rep_id        <chr> "WI270631108", "WI270621108", "WI270691108", "WI270711108", "WI270701108", "WI270731108", "WI270721108",...
## $ party         <chr> "Dem", "GOP", "Lib", "CST", "ADP", "WW", "Grn", "Dem", "GOP", "Lib", "CST", "ADP", "WW", "Grn", "Dem", "...
## $ count         <chr> "1382210", "1409467", "106442", "12179", "1561", "1781", "30980", "3780", "5983", "207", "44", "4", "9",...
## $ pct           <chr> "46.9", "47.9", "3.6", "0.4", "0.1", "0.1", "1.1", "37.4", "59.2", "2.0", "0.4", "0.0", "0.1", "0.8", "5...
## $ candidate_idx <chr> "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5", "6", "7",...
## $ last          <chr> "Clinton", "Trump", "Johnson", "Castle", "De La Fuente", "Moorehead", "Stein", "Clinton", "Trump", "John...
## $ first         <chr> "Hillary", "Donald", "Gary", "Darrell", "Rocky", "Monica", "Jill", "Hillary", "Donald", "Gary", "Darrell...
## $ fips          <chr> "0", "0", "0", "0", "0", "0", "0", "55001", "55001", "55001", "55001", "55001", "55001", "55001", "55003...
## $ name          <chr> "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Wisconsin", "Adams", "Ada...
## $ reporting     <chr> "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100.0", "100....
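"Seems to be complete" can be checked a bit more directly. Every column comes back as <chr>, and if no county rows are missing, each candidate's county counts should sum to the statewide row (fips "0"). A minimal base-R sketch; check_totals is a hypothetical helper, and it assumes a candidate's rep_id is the same on the statewide line and on every county line (as it is in the WI;S;G lines further down).

```r
# Hypothetical helper: compare each candidate's statewide count (fips "0")
# with the sum of that candidate's county rows.
# Assumes rep_id is identical on the statewide and county lines.
check_totals <- function(results) {
  results$count <- as.numeric(results$count)  # glimpse shows count as <chr>
  state  <- results[results$fips == "0", c("rep_id", "count")]
  county <- aggregate(count ~ rep_id,
                      data = results[results$fips != "0", ],
                      FUN = sum)
  merged <- merge(state, county, by = "rep_id",
                  suffixes = c("_state", "_counties"))
  merged$diff <- merged$count_state - merged$count_counties
  merged
}

# check_totals(results)
# A diff of 0 for every rep_id suggests no county rows are missing.
```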

Despite the ".xml" extension on the URL, it's not XML data; it's plain text delimited with ";", "|" and "||". I also don't know what some of the columns actually are, but you can dig into that. There's also a whole other section of data:

WI;S;G;0;Wisconsin;X;100.0;X;;50885;;||WI269201108;Dem;1380496;46.8;;X;;;1|WI267231108;GOP;1479262;50.2;X;X;X;;2|WI270541108;Lib;87291;3.0;;X;;;3
WI;S;G;55001;Adams;X;100.0;X;;50885;;||WI269201108;Dem;4093;41.2;;X;;;1|WI267231108;GOP;5346;53.9;X;X;X;;2|WI270541108;Lib;486;4.9;;X;;;3
WI;S;G;55003;Ashland;X;100.0;X;;50885;;||WI269201108;Dem;4349;55.1;;X;;;1|WI267231108;GOP;3337;42.2;X;X;X;;2|WI270541108;Lib;214;2.7;;X;;;3
WI;S;G;55005;Barron;X;100.0;X;;50885;;||WI269201108;Dem;8691;38.8;;X;;;1|WI267231108;GOP;12863;57.4;X;X;X;;2|WI270541108;Lib;853;3.8;;X;;;3
WI;S;G;55007;Bayfield;X;100.0;X;;50885;;||WI269201108;Dem;5161;54.6;;X;;;1|WI267231108;GOP;4022;42.6;X;X;X;;2|WI270541108;Lib;263;2.8;;X;;;3
WI;S;G;55009;Brown;X;100.0;X;;50885;;||WI269201108;Dem;51004;40.0;;X;;;1|WI267231108;GOP;71750;56.3;X;X;X;;2|WI270541108;Lib;4615;3.6;;X;;;3
WI;S;G;55011;Buffalo;X;100.0;X;;50885;;||WI269201108;Dem;2746;39.9;;X;;;1|WI267231108;GOP;3850;56.0;X;X;X;;2|WI270541108;Lib;285;4.1;;X;;;3
WI;S;G;55013;Burnett;X;100.0;X;;50885;;||WI269201108;Dem;3143;37.4;;X;;;1|WI267231108;GOP;4998;59.5;X;X;X;;2|WI270541108;Lib;258;3.1;;X;;;3

which obviously means something for that page (it's kinda obvious what, but I'm so weary from the election that I'm kinda done with the data), and you can process it in a similar fashion to what's above.
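As a rough sketch of that "similar fashion", here is the same splitting logic written in base R (grep/sub/strsplit instead of stringi/purrr, so it needs no packages). parse_race is a hypothetical helper; it assumes the WI;S;G lines share the presidential layout (county fields, then "||", then "|"-separated candidate entries) and that fields 1, 2 and 4 of the county part are fips, name and reporting, as in the code above. Candidate names could be added with a join on rep_id if the candidate list on line 2 covers these ids, which isn't checked here.

```r
# Hypothetical helper: one row per candidate per county for a given race code
# ("P", "S", ...). Only fields whose meaning the code above relies on are kept.
parse_race <- function(lines, race = "S") {
  prefix <- paste0("^WI;", race, ";G;")
  records <- sub(prefix, "", grep(prefix, lines, value = TRUE))
  rows <- lapply(records, function(x) {
    halves <- strsplit(x, "||", fixed = TRUE)[[1]]       # county part | candidates
    county <- strsplit(halves[1], ";", fixed = TRUE)[[1]]
    cands  <- strsplit(strsplit(halves[2], "|", fixed = TRUE)[[1]], ";", fixed = TRUE)
    field  <- function(i) vapply(cands, `[`, character(1), i)
    data.frame(
      fips      = county[1],
      name      = county[2],
      reporting = as.numeric(county[4]),
      rep_id    = field(1),
      party     = field(2),
      count     = as.numeric(field(3)),
      pct       = as.numeric(field(4)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage with dat from the code above:
# senate <- parse_race(dat, "S")
```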

Comments