I just edited my question to make it more general:
"How to scrape a table using R, when the format is not covered in any R functions?"
First of all, how should I know if the format matches what r functions like
I think the general answer to this is "scraping in any language is often a pain in the neck". This is because people put stuff on the web in random, crappy formats that are difficult for machines to parse.
I don't do an enormous amount of scraping, and don't have a better answer than "poke around in the source view of the page, use trial and error".
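For what it's worth, one way to do the "poking around" from R rather than the browser is to count the tables on the page and the cells in each row. This is only a rough sketch: the HTML string below is made up for illustration (it is not the page in question), and `cells_per_row` and `regular` are just names I picked. If the rows don't all have the same number of cells, the stock table readers are likely to struggle.

```r
library(rvest)

# Made-up HTML standing in for a real page; the last row is missing a cell.
page <- read_html(
  "<table>
     <tr><td>A</td><td>1.5</td><td>link</td></tr>
     <tr><td>B</td><td>2.5</td><td>link</td></tr>
     <tr><td>C</td><td>3.5</td></tr>
   </table>"
)

tables <- html_nodes(page, "table")
n_tables <- length(tables)            # how many <table> elements are there?

rows <- html_nodes(tables[[1]], "tr") # rows of the first table
# Number of <td> cells found inside each row
cells_per_row <- vapply(rows, function(r) length(html_nodes(r, "td")), integer(1))

# TRUE only if every row has the same number of cells
regular <- length(unique(cells_per_row)) == 1
```

On a real page you would replace the string with the URL and look at each element of `tables` in turn.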
It looks like the table is badly structured; if you try to extract the <tr> (table row) elements you get junk ...
    Weblink <- "http://hmofs.northwestern.edu/hc/crystals.php"
    library(rvest)
    rr <- read_html(Weblink)
    tab2 <- html_nodes(rr, "table")[[4]]      ## get 4th table
    vals <- html_text(html_nodes(tab2, "td")) ## get *all* elements in 4th table
Now take only the numeric values. as.numeric() turns any non-numeric text into NA, and na.omit() then drops it; the 7th column of the table is download information, and gets discarded this way:
    vals <- suppressWarnings(na.omit(as.numeric(vals)))
    matrix(vals, byrow=TRUE, ncol=6)
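
To see what the coercion step above does without hitting the website, here is a small self-contained sketch in base R. The `cells` vector is invented data, shaped the way I'd expect `html_text()` output to look (six numeric fields, then a non-numeric download cell per row); it is not taken from the real page.

```r
# Hypothetical cell text: six numeric fields followed by a non-numeric
# "download" cell in each of two rows.
cells <- c("1", "0.5", "2.25", "10", "3", "7.5", "download",
           "2", "1.5", "4.75", "20", "6", "8.5", "download")

# Non-numeric text becomes NA (with a warning, suppressed here)
nums <- suppressWarnings(as.numeric(cells))

# Drop the NAs; as.numeric() strips the attributes that na.omit() attaches
nums <- as.numeric(na.omit(nums))

# Cheap sanity check: the count should be a whole number of 6-value rows
stopifnot(length(nums) %% 6 == 0)

m <- matrix(nums, byrow = TRUE, ncol = 6)
```

Note the fragility: if one of the numeric cells in a row were empty or non-numeric, it would be dropped too and everything after it would shift by one position. The length check only catches that when the total stops being a multiple of 6, so it's worth eyeballing a few rows against the page.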