Andy.Jian - 1 year ago
R Question

Any way to simplify the output when manipulating data.frames in a list?

I have a list of data.frame objects that have already been processed for further use. Now I want to take the complementary set of each element under a given condition, and I sketched a helper function for this. I used setdiff from the dplyr package, which I believed was the right tool, but the output was not what I expected. I also tried applying the helper function to a nested list, but the output was incorrect when I changed the parameter (type=c("Bio","Tech")). Is there a quick way to get the clean, well-structured output I expect?

A quick reproducible example:

savedList <- list(
  foo_saved = data.frame(v1=c(1,6,16), v2=c(4,12,23), nm=c("a1","a2","a3")),
  bar_saved = data.frame(v1=c(7,19,31), v2=c(13,28,43), nm=c("b3","b6","b7")),
  cat_saved = data.frame(v1=c(5,21,36), v2=c(11,29,42), nm=c("c2","c4","c9"))
)

dropedList <- list(
  foo_droped = data.frame(v1=c(6,25,40), v2=c(12,33,49), nm=c("a2","a5","a8")),
  bar_droped = data.frame(v1=c(15,19,47), v2=c(18,28,55), nm=c("b4","b6","b9")),
  cat_droped = data.frame(v1=c(13,21,36,53), v2=c(19,29,42,67), nm=c("c3","c4","c9","c12"))
)

I used this trick to pair up the saved and dropped data.frames by name prefix:

x <- c(savedList, dropedList)
newList <- split(x, sub("_.*", "", names(x)))[sub("_.*", "", names(savedList))]
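To make the effect of the split() call easier to see, here is a small sketch using stand-in data (single-column data.frames with only two prefixes, chosen just for illustration). The variable prefix is an extra name introduced here and is not part of the original code.

```r
# Stand-in data: only the element names matter for this illustration.
savedList  <- list(foo_saved  = data.frame(v1 = 1), bar_saved  = data.frame(v1 = 2))
dropedList <- list(foo_droped = data.frame(v1 = 3), bar_droped = data.frame(v1 = 4))

x <- c(savedList, dropedList)

# Strip everything from the first underscore on, leaving the shared prefix.
prefix <- sub("_.*", "", names(x))

# split() groups the list elements by prefix (groups come back sorted by name);
# indexing by the prefixes of savedList restores the original order.
newList <- split(x, prefix)[sub("_.*", "", names(savedList))]

names(newList)               # "foo" "bar"
str(newList, max.level = 2)  # each element is a list of two data.frames
```

Within each group the saved data.frame comes first and the dropped one second, which is what the positional list[[1]] and list[[2]] in the helper function rely on.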

This is the helper function that I am trying to implement:

func <- function(list, type=c("Bio", "Tech")) {
  if(type=="Bio") list[[1]] else setdiff(list[[1]], list[[2]])
}

I tried this to get my output:

res <- Map(func, newList)

But this doesn't work when I set type to "Tech": setdiff doesn't return the complementary set I expect. It is also a bit difficult to change type and get a different output when using Map. Is there an efficient way to get my desired output?

I want to take the complementary set of each list element conditionally.

Desired output if type is "Bio":

output.Bio <- list(
  foo_otp = data.frame(v1=c(1,6,16), v2=c(4,12,23), nm=c("a1","a2","a3")),
  bar_otp = data.frame(v1=c(7,19,31), v2=c(13,28,43), nm=c("b3","b6","b7")),
  cat_otp = data.frame(v1=c(5,21,36), v2=c(11,29,42), nm=c("c2","c4","c9"))
)

Desired output if type is "Tech":

output.Tech <- list(
  foo_otp = data.frame(v1=c(1,16), v2=c(4,23), nm=c("a1","a3")),
  bar_otp = data.frame(v1=c(7,31), v2=c(13,43), nm=c("b3","b7")),
  cat_otp = data.frame(v1=c(5), v2=c(11), nm="c2")
)

I can't figure out what went wrong in the helper function. Any suggestions for making it work more safely? How can I get the desired output more efficiently? Thanks a lot.

Answer Source

You are looking for the row-wise difference between two data.frames, which is what anti_join from dplyr does. This will give you your output.Tech:

library(dplyr)
Map(anti_join, savedList, dropedList)

#Joining by: c("v1", "v2", "nm")
#Joining by: c("v1", "v2", "nm")
#Joining by: c("v1", "v2", "nm")
#  v1 v2 nm
#1 16 23 a3
#2  1  4 a1

#  v1 v2 nm
#1  7 13 b3
#2 31 43 b7

#  v1 v2 nm
#1  5 11 c2

If you want to incorporate this into your code, just write a simple function:

func = function(L1, L2, type) {
    if(!type %in% c('Bio','Tech')) stop('Wrong type')
    if(type=='Bio') return(L1)
    Map(anti_join, L1, L2)
}

#func(savedList, dropedList, 'Tech')
#func(savedList, dropedList, 'Bio')
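
If you would rather not depend on dplyr, or want the type argument checked by match.arg, a minimal base-R sketch could look like the following. The names anti_rows and func2 are made up for illustration, and comparing rows through a pasted key is an assumption that holds for simple columns like the ones in this example; it is not how anti_join is implemented.

```r
savedList <- list(
  foo_saved = data.frame(v1 = c(1, 6, 16),  v2 = c(4, 12, 23),  nm = c("a1", "a2", "a3")),
  bar_saved = data.frame(v1 = c(7, 19, 31), v2 = c(13, 28, 43), nm = c("b3", "b6", "b7")),
  cat_saved = data.frame(v1 = c(5, 21, 36), v2 = c(11, 29, 42), nm = c("c2", "c4", "c9"))
)
dropedList <- list(
  foo_droped = data.frame(v1 = c(6, 25, 40),  v2 = c(12, 33, 49), nm = c("a2", "a5", "a8")),
  bar_droped = data.frame(v1 = c(15, 19, 47), v2 = c(18, 28, 55), nm = c("b4", "b6", "b9")),
  cat_droped = data.frame(v1 = c(13, 21, 36, 53), v2 = c(19, 29, 42, 67),
                          nm = c("c3", "c4", "c9", "c12"))
)

# Hypothetical helper: keep the rows of x that do not appear in y,
# comparing on all columns the two data.frames share.
anti_rows <- function(x, y) {
  cols  <- intersect(names(x), names(y))
  # Build one string key per row; "\r" is an arbitrary separator (assumption:
  # it never occurs in the data).
  key_x <- do.call(paste, c(x[cols], sep = "\r"))
  key_y <- do.call(paste, c(y[cols], sep = "\r"))
  x[!key_x %in% key_y, , drop = FALSE]
}

# Hypothetical wrapper: match.arg() rejects anything other than "Bio" or "Tech".
func2 <- function(saved, dropped, type = c("Bio", "Tech")) {
  type <- match.arg(type)
  out  <- if (type == "Bio") saved else Map(anti_rows, saved, dropped)
  # Rename foo_saved -> foo_otp etc., as in the desired output.
  names(out) <- sub("_.*", "_otp", names(saved))
  out
}

res_tech <- func2(savedList, dropedList, "Tech")
res_bio  <- func2(savedList, dropedList, "Bio")
```

Note that the rows kept by anti_rows retain their original row names, so rownames(res_tech$foo_otp) will be "1" and "3" rather than "1" and "2"; reset them with rownames(df) <- NULL if that matters.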