Andy.Jian - 10 months ago
R Question

How to facilitate the output when splitting data.frames in a list?

I have a list of data.frame objects, and I want to split each element of myList by comparing its score with a given threshold value. In particular, I want my function to return only the rows whose score is greater than the threshold, while exporting the rows whose score is not greater than the threshold as a csv file (I will process the saved data.frame further, while the exported data.frame will be listed in a summary at the end). I am aware that it is easier to split the data.frame first, then export the dropped part as a csv file and process the desired part further. But I want to do all of this in one wrapper function. Can anyone point out how to handle the output of my function more efficiently? Any ideas?

mini example:

mylist <- list(
foo=data.frame( from=seq(1, by=4, len=16), to=seq(3, by=4, len=16), score=sample(30, 16)),
bar=data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), score=sample(30, 20)),
cat=data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), score=sample(30, 25)))

I intend to split them like this:

func <- function(list, threshold=16, ...) {
  # input param checking
  reslt <- lapply(list, function(elm) {
    res <- split(elm, c("Droped", "Saved")[(elm$score > threshold)+1])
    # FIXME : any way to export the Droped instance while returning Saved?
    res$Saved
  })
  reslt
}

In my sketch function, I intend to export the Droped instance of each data.frame as a csv file, while returning the Saved instance of each data.frame as the output for further processing.

I tried to make this happen in my function, but my approach is not efficient. Can anyone point out how to accomplish this easily? Does anyone know a useful trick to produce my expected output more elegantly? Thanks in advance.

Answer Source

You could roll both processes into a call to lapply, like this:

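library(dplyr)  # provides %>% and filter()

# function to perform both tasks on one data frame in mylist
splitter <- function(i, threshold) {

  DF <- mylist[[i]]

  # write the rows at or below the threshold to a csv file
  DF %>%
    filter(score <= threshold) %>%
    write.csv(., sprintf("dropped.%s.csv", i), row.names = FALSE)

  # return the rows above the threshold
  filter(DF, score > threshold)

}
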

# now use the function to create a new list, mylist2, with the Saved
# versions as its elements. the csvs of the dropped rows will get created
# as this runs.
mylist2 <- lapply(seq_along(mylist), function(i) splitter(i, 16))
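
If you would rather have a single wrapper that takes the list itself and keeps the element names (foo, bar, cat) both in the returned list and in the csv file names, something like the following base R sketch may work. The names split_and_export and out_dir are made up for illustration, the set.seed() call is only there to make the sampled scores reproducible, and the sketch assumes the list is named and that score contains no NA values.

```r
# Hypothetical wrapper: split_and_export() and out_dir are illustrative names,
# not part of the original question or answer.
split_and_export <- function(lst, threshold = 16, out_dir = tempdir()) {
  # input checking: a named list is assumed
  stopifnot(is.list(lst), !is.null(names(lst)))

  saved <- lapply(names(lst), function(nm) {
    df <- lst[[nm]]
    keep <- df$score > threshold  # assumes score has no NA values

    # rows at or below the threshold go to a csv file named after the element
    write.csv(df[!keep, , drop = FALSE],
              file.path(out_dir, sprintf("dropped.%s.csv", nm)),
              row.names = FALSE)

    # rows above the threshold are returned for further processing
    df[keep, , drop = FALSE]
  })

  # restore the element names on the returned list
  setNames(saved, names(lst))
}

set.seed(1)  # only to make the example reproducible
mylist <- list(
  foo = data.frame(from = seq(1, by = 4, length.out = 16),
                   to = seq(3, by = 4, length.out = 16),
                   score = sample(30, 16)),
  bar = data.frame(from = seq(3, by = 7, length.out = 20),
                   to = seq(6, by = 7, length.out = 20),
                   score = sample(30, 20)),
  cat = data.frame(from = seq(4, by = 8, length.out = 25),
                   to = seq(7, by = 8, length.out = 25),
                   score = sample(30, 25)))

saved <- split_and_export(mylist, threshold = 16)
```

Here saved$foo, saved$bar and saved$cat hold the rows above the threshold, and the dropped rows end up in dropped.foo.csv, dropped.bar.csv and dropped.cat.csv under out_dir (a temporary directory by default, so pass your own path if you want to keep the files).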