DataDancer - 8 days ago
R Question

How to make use of a list of tables

Using the XML package I was able to scrape over 80 tables from a website, and this number will grow over time. The tables themselves are not very large, mostly 6x10 (the size varies between tables and over time too). The redeeming fact is that 99% of the time the tables will have the same columns, i.e. the same column names. For example:

table[1]
A B C D E F
1 b b 2 2 b
2 b b 2 2 b


table[2]
A B C D E F
1 c c 2 2 c
2 c c 2 2 c


How would I go about combining all the tables and their observations into separate variables (each column = one variable) while making sure that the observations within each variable keep their link to the original table (e.g. through an additional variable)?

As the different tables refer to the results of different rounds in a competition, the end result that I would like to achieve is to be able to track an individual's progression through the competition and, for that matter, through different competitions in any one year (I expect to be scraping a lot of tables).

Any nice R code that anyone can pass on would be great, and ideas on best practice for making use of and/or analyzing this mass of information would be invaluable.

Answer

Two things:

1) add an ID column to each of your tables:

tables <- lapply(seq_along(tables), function(i) transform(tables[[i]], ID = i))

2) to bind tables that may not all have the same columns, use plyr::rbind.fill, which fills any missing columns with NA:

library(plyr)
all.data <- do.call(rbind.fill, tables)
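For anyone who wants to see both steps run end to end, here is a minimal sketch on two toy tables. The table contents are made up for illustration, and step 2 uses a base R stand-in (pad with NA columns, then rbind) rather than plyr::rbind.fill itself, so it runs without extra packages. The names all.cols, padded and missing.cols are just illustrative.

```r
# Toy stand-ins for the scraped tables (values are made up for illustration).
tables <- list(
  data.frame(A = 1:2, B = c("b", "b"), C = c(2, 2), stringsAsFactors = FALSE),
  # This table lacks column C and has an extra column D.
  data.frame(A = 1:2, B = c("c", "c"), D = c(5, 6), stringsAsFactors = FALSE)
)

# Step 1: tag every row with the index of the table it came from.
tables <- lapply(seq_along(tables), function(i) transform(tables[[i]], ID = i))

# Step 2 (base R stand-in for plyr::rbind.fill): pad each table with NA
# columns so that all tables share the same columns, then stack them.
all.cols <- unique(unlist(lapply(tables, names)))
padded <- lapply(tables, function(tab) {
  missing.cols <- setdiff(all.cols, names(tab))
  for (col in missing.cols) tab[[col]] <- NA
  tab[all.cols]  # same column order in every table
})
all.data <- do.call(rbind, padded)

print(all.data)
```

Rows from the first table get NA in D and rows from the second get NA in C, while the ID column still says which table each row came from.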

What you get out is a single data.frame holding all your data. To create "separate variables" as you asked, you could then use attach(all.data), but that is really not recommended, since attached columns can mask, or be masked by, other objects of the same name. You are better off keeping the data in a data.frame for your analysis.
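As a rough sketch of what working inside the data.frame can look like, assume the combined data has a column identifying the individual and a result column. The names name and score below are hypothetical (the question does not say what the real columns are), and ID is the round number added in step 1.

```r
# Hypothetical combined data: `name` and `score` are made-up column names;
# ID is the table (round) number added in step 1.
all.data <- data.frame(
  ID    = c(1, 1, 2, 2, 3),
  name  = c("ann", "bob", "ann", "bob", "ann"),
  score = c(10, 12, 14, 11, 15),
  stringsAsFactors = FALSE
)

# One individual's results, ordered by round.
ann <- subset(all.data, name == "ann")
ann <- ann[order(ann$ID), ]

# Refer to columns by name without attach(): with() evaluates the
# expression inside the data.frame.
mean.score <- with(all.data, tapply(score, name, mean))

# One data.frame per individual, if per-person processing is needed.
by.person <- split(all.data, all.data$name)
```

Because the ID column travels with every row, subsetting, ordering and splitting all keep the link back to the original table.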