code123 code123 - 24 days ago 8
R Question

Form new dataframes from list by combining similar entries in all dataframes

I have a list with many dataframes (example provided below).

G100=structure(list(Return.Period = structure(c(4L, 6L, 2L, 3L, 5L,
1L), .Label = c("100yrs", "10yrs", "20yrs", "2yrs", "50yrs",
"5yrs"), class = "factor"), X95..lower.CI = c(54.3488053692529,
73.33363378538, 84.0868168935697, 91.6191228597281, 96.3360349026068,
95.4278817251266), Estimate = c(61.6857930414643, 84.8210149260708,
101.483909733627, 118.735593472652, 143.33257990536, 163.806035490329
), X95..upper.CI = c(69.0227807136758, 96.3083960667617, 118.881002573685,
145.852064085577, 190.329124908114, 232.18418925553)), .Names = c("Return.Period",
"X95..lower.CI", "Estimate", "X95..upper.CI"), row.names = c(NA,
-6L), class = "data.frame")

G101<-G100 # just for illustration

mylist=list(G100,G101) # there 100 of these with differet codes


names(mylist) represents "SITE". From each dataframe, I would like to take "Estimate" and form a new dataframe which looks like this (not exact because values are not the same for all dfs):
Estimate
<-


SITE X2yrs X5yrs X10yrs X20yrs X50yrs X100yrs
G100 61.68579 84.82101 101.4839 118.7356 143.3326 163.806
G101 61.68579 84.82101 101.4839 118.7356 143.3326 163.806


Note that
SITE
is same as dataframe names in
mylist
.

Do the same for
"X95..lower.CI"
and
"X95..upper.CI"
.

So, I will end up with 3 dataframes
"Estimate"
,
"X95..lower.CI"
and
"X95..upper.CI".
with the above layout.

#lapply, rbindlist,cbind and others can do but how?


Suggestions please.

Answer

Just use a for loop to add the names. There's probably a fancy *apply way, but for is easy to use, remember, and understand.

Start by adding names:

names(mylist) = paste0("G", seq(from = 100, by = 1, length.out = length(mylist)))

Add SITE column as before:

for (i in seq_along(mylist)) {
    mylist[[i]]$SITE = names(mylist)[i]
}

Combine data frames:

Since you have a lot of data frames or they're rather large, use dplyr::rbind_all for speed. (In base R, do.call(rbind, mylist) would work, but be slower.)

library(dplyr)
combined = bind_rows(mylist)

(Older versions of dplyr can use rbind_all instead of bind_rows, but this will soon be deprecated: (https://github.com/hadley/dplyr/issues/803).)

Transform the Estimate and CI columns from long to wide.

This is easy with tidyr, though reshape2::dcast works similarly:

library(tidyr)
Estimate = combined %>% select(SITE, Return.Period, Estimate) %>%
    spread(key = Return.Period, value = Estimate)
head(Estimate)
# Source: local data frame [2 x 7]
#
#   SITE  100yrs    10yrs    20yrs     2yrs    50yrs     5yrs
# 1 G100 163.806 101.4839 118.7356 61.68579 143.3326 84.82101
# 2 G101 163.806 101.4839 118.7356 61.68579 143.3326 84.82101    

Lower95 = combined %>% select(SITE, Return.Period, X95..lower.CI) %>%
    spread(key = Return.Period, value = X95..lower.CI)
head(Lower95)
# Source: local data frame [2 x 7]
#
#   SITE   100yrs    10yrs    20yrs     2yrs    50yrs     5yrs
# 1 G100 95.42788 84.08682 91.61912 54.34881 96.33603 73.33363
# 2 G101 95.42788 84.08682 91.61912 54.34881 96.33603 73.33363

You might want to reorder the columns un-alphabetically.

Do the same for ``"X95..upper.CI"`.

Still left as an exercise for the reader.

Comments