Osprey Eagle - 3 months ago
R Question

How to update (assign new values to) R data frames stored in a list

# sample data
options(stringsAsFactors = FALSE)

set.seed(1)
v1 = stringi::stri_rand_strings(4,3)
v2 = rep("",4)
df1 = data.frame(v1, v2)

set.seed(2)
v1 = stringi::stri_rand_strings(4,3)
v2 = rep("",4)
df2 = data.frame(v1, v2)

df.list = list(df1,df2)
df.list

[[1]]
v1 v2
1 GNZ
2 uCt
3 wed
4 3CA

[[2]]
v1 v2
1 BhZ
2 Aww
3 8pT
4 YYE


I want to assign a substring of v1 to v2 for every row of every data frame in a vectorised manner, e.g., v2 = the third character of v1, to get this:

> df.list
[[1]]
v1 v2
1 GNZ Z
2 uCt t
3 wed d
4 3CA A

[[2]]
v1 v2
1 BhZ Z
2 Aww w
3 8pT T
4 YYE E


I know this for-loop works:

# loop over the list positions and overwrite v2 in each data frame
for (df in seq_along(df.list)) {
  df.list[[df]]$v2 = substr(df.list[[df]]$v1, 3, 3)
}
df.list


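I know I could combine the data frames with plyr's rbind.fill and then set v2 on the combined data frame:

combined = plyr::rbind.fill(df.list)
combined$v2 = substr(combined$v1, 3, 3)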


I know I could substring before storing the data frame in the list, but I'd rather substring all at once.

I'd like to keep the data in a list because the list is indexed by a string that will be used in other code. rbind.fill does not keep that index (the list names) or the row names.
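For reference, the combine-then-split route can keep the list index if each data frame is tagged with its name before stacking. The following is only a base R sketch under assumptions: the list names "a" and "b" and the id column are hypothetical (the sample list above is unnamed), and the v1 values are copied from the printed output rather than regenerated with stringi.

```r
# Hypothetical named version of the sample list (names "a" and "b" are assumed).
df.list <- list(
  a = data.frame(v1 = c("GNZ", "uCt", "wed", "3CA"), v2 = "", stringsAsFactors = FALSE),
  b = data.frame(v1 = c("BhZ", "Aww", "8pT", "YYE"), v2 = "", stringsAsFactors = FALSE)
)

# Tag every data frame with its list name in an extra "id" column.
tagged <- Map(
  function(df, id) data.frame(id = id, df, stringsAsFactors = FALSE),
  df.list, names(df.list)
)

# Stack the tagged data frames and do the substring once on the combined column.
combined <- do.call(rbind, tagged)
combined$v2 <- substr(combined$v1, 3, 3)

# Split back into a list keyed by the original names, dropping the helper column.
result <- split(combined[c("v1", "v2")], combined$id)
result[["a"]]
```

Note that split orders the result by the sorted id values, so the list order can change if the names are not already sorted.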

I know this does NOT work:

sapply(df.list, "[[", "v2") <- sapply(df.list, function(x) substr(x$v1, 3,3))


The right side does identify the correct substrings, but I realize the sapply on the left side produces output and does not point to the target of the assignment. Still, it conveys the idea of what I'm trying to do.

This also generates the substrings:
sapply(df.list, function(x) {x$v2 <- substr(x$v1,3,3)})
but the assignment is not made to the data frames in df.list.
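The reason is that R passes a copy of each data frame into the function, so the assignment only changes the local x, and the function then returns the value of its last expression, which here is the assigned character vector. A small sketch with a two-row data frame (the names modify, modify_and_return, out and fixed are made up for illustration):

```r
df <- data.frame(v1 = c("GNZ", "uCt"), v2 = "", stringsAsFactors = FALSE)

# The assignment changes only the local copy x; the function's value is the
# value of the assignment, i.e. the substring vector, not the data frame.
modify <- function(x) {
  x$v2 <- substr(x$v1, 3, 3)
}
out <- modify(df)
out     # the substrings
df$v2   # still empty: the caller's data frame was not touched

# Returning the modified copy and assigning it is what makes the change visible.
modify_and_return <- function(x) {
  x$v2 <- substr(x$v1, 3, 3)
  x
}
fixed <- modify_and_return(df)
fixed$v2
```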

So how do I point to the same column of every structurally equivalent data frame stored in a list to make the assignment in a vectorized manner?

Answer

lapply applies a function to each element of a list and returns a new list, so assign the result back to df.list to update it. Here's a solution using lapply and dplyr's mutate function.

df.list <- lapply(df.list, function(df) dplyr::mutate(df, v2=substr(v1,3,3)))

Alternative solutions using base R. Each call returns a new list, so assign the result to df.list to keep the change.

lapply(df.list, function(df) data.frame(v1=df$v1, v2=substr(df$v1,3,3)))

lapply(df.list, function(df) {
  df$v2 <- substr(df$v1,3,3)
  return(df)
})
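Since the question mentions that the list is indexed by a string, it may help to see that lapply keeps the names of its input. This is a sketch under assumptions: the names "a" and "b" are hypothetical, the v1 values are copied from the question's printed output, and base R's transform is swapped in as one more way to write the column update.

```r
# Hypothetical named version of the sample list (names "a" and "b" are assumed).
df.list <- list(
  a = data.frame(v1 = c("GNZ", "uCt", "wed", "3CA"), v2 = "", stringsAsFactors = FALSE),
  b = data.frame(v1 = c("BhZ", "Aww", "8pT", "YYE"), v2 = "", stringsAsFactors = FALSE)
)

# lapply returns a new list with the same names; assigning it back updates df.list.
df.list <- lapply(df.list, function(df) transform(df, v2 = substr(v1, 3, 3)))

names(df.list)       # the string index is preserved
df.list[["b"]]$v2    # look up one data frame by its name
```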