Jerry.Shad - 1 month ago
R Question

How can I make the output more elegant when splitting the data.frames in a list?

I have a list of data.frame objects, and I want to split each one by its last column. I used the split function for this, so each data.frame now has two subsets, "save" and "discard". Next, I want to pick those subsets apart by name: the "discard" subset of each data.frame should be exported (as a CSV file), and only the "save" subsets should be returned by the function, ideally as a simple list rather than a nested one. How can I do this more conveniently, and is there a way to make the output well structured? Thanks in advance.

A quick reproducible example:

set.seed(1)  # makes the sample() values reproducible
dfList <- list(
  hola = data.frame(start = seq(1, by = 4, len = 15), to = seq(3, by = 4, len = 15), value = sample(30, 15)),
  boo  = data.frame(start = seq(3, by = 7, len = 20), to = seq(6, by = 7, len = 20), value = sample(30, 20)),
  meh  = data.frame(start = seq(4, by = 8, len = 25), to = seq(7, by = 8, len = 25), value = sample(30, 25))
)
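For reference, here is a minimal sketch of what split returns for one data.frame when splitting on its last column. It uses x[[ncol(x)]] instead of x$value (an assumption, so the same code works whatever the last column is called), and split_last plus the toy values are hypothetical. Fixing the factor levels means both "discard" and "save" always exist, even when one of them has no rows.

```r
# Toy data (hypothetical values) in the same shape as the elements of dfList
df <- data.frame(start = c(1, 5, 9, 13), to = c(3, 7, 11, 15), value = c(3, 15, 8, 22))

# Hypothetical helper: label rows by the last column, then split on that label
split_last <- function(x, threshold = 10) {
  last <- x[[ncol(x)]]
  grp <- factor(ifelse(last >= threshold, "save", "discard"),
                levels = c("discard", "save"))  # fixed levels: both groups always exist
  split(x, grp)
}

parts <- split_last(df)
names(parts)          # "discard" "save"
parts$save$value      # 15 22
parts$discard$value   # 3 8

# If no row falls below the threshold, "discard" is still present, just with 0 rows
nrow(split_last(df[df$value >= 10, ])$discard)   # 0
```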


I attempted to implement the function as follows:

splitMe <- function(list, ...) {
  # check input
  rslt <- lapply(list, function(x) {
    out <- split(x, ifelse(x$value >= 10, "save", "discard"))
    # I want to filter out the "discard" data.frame and export it as a CSV file.
    # How can I make this happen,
    # while returning only the "save" data.frame from each as the output of splitMe?
  })
}
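One way to complete the skeleton while keeping the split approach is sketched below: the "discard" part of each data.frame is written to a CSV file and only the "save" part is returned, as a flat named list. The out_dir and threshold arguments, the <name>_discard.csv file naming, and the toy data are assumptions for illustration; the sketch also assumes the list is named, as dfList is.

```r
splitMe <- function(lst, out_dir = tempdir(), threshold = 10) {
  res <- lapply(names(lst), function(nm) {
    x <- lst[[nm]]
    # Label each row by the last column; fixed levels keep both groups present
    last <- x[[ncol(x)]]
    grp <- factor(ifelse(last >= threshold, "save", "discard"),
                  levels = c("discard", "save"))
    out <- split(x, grp)
    # Export the "discard" subset (hypothetical naming: <name>_discard.csv)
    write.csv(out[["discard"]],
              file.path(out_dir, paste0(nm, "_discard.csv")),
              row.names = FALSE)
    out[["save"]]  # only this subset is returned
  })
  names(res) <- names(lst)  # flat list: one "save" data.frame per input element
  res
}

# Toy input (hypothetical values)
toy <- list(
  hola = data.frame(start = c(1, 5, 9, 13), to = c(3, 7, 11, 15), value = c(3, 15, 8, 22)),
  boo  = data.frame(start = c(3, 10, 17), to = c(6, 13, 20), value = c(12, 4, 30))
)
kept <- splitMe(toy, out_dir = tempdir())
sapply(kept, nrow)   # hola 2, boo 2
```

Because each element returns only out[["save"]], there is no nested list to dig through afterwards.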


Regarding the skeleton of my function, how can I make it complete? How can I get my desired output more efficiently? Any idea is appreciated.

Answer

Here's a function:

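splitMe <- function(lst, filename, path, threshold) {
  # rows at or above the threshold are kept; the rest are written to CSV
  out <- lapply(lst, function(df) df[df$value >= threshold, , drop = FALSE])
  csv <- lapply(lst, function(df) df[df$value < threshold, , drop = FALSE])
  # one file per data.frame: <path>/<filename>_1.csv, <filename>_2.csv, ...
  for (i in seq_along(csv)) {
    write.csv(csv[[i]], file.path(path, paste0(filename, "_", i, ".csv")))
  }
  out
}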
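A caveat about the lapply(list, subset, value >= threshold) pattern: subset evaluates its condition with non-standard evaluation in its calling frame, and when it is called through lapply that frame belongs to lapply, not to the enclosing function. A local variable such as threshold is therefore not found (or, worse, a global one with the same name is silently used). The help page for subset recommends standard indexing such as [ for programming. The sketch below contrasts the two; keep_nse, keep_std and the toy data are hypothetical.

```r
keep_nse <- function(lst, threshold) {
  # subset's condition is evaluated in lapply's frame, so 'threshold' is not visible
  lapply(lst, subset, value >= threshold)
}

keep_std <- function(lst, threshold) {
  # Plain [ indexing inside an anonymous function sees 'threshold' lexically
  lapply(lst, function(df) df[df$value >= threshold, , drop = FALSE])
}

lst <- list(a = data.frame(id = 1:3, value = c(5, 12, 30)),
            b = data.frame(id = 1:2, value = c(1, 10)))

res_nse <- tryCatch(keep_nse(lst, 10), error = function(e) e)
inherits(res_nse, "error")   # TRUE (object 'threshold' not found)
res_std <- keep_std(lst, 10)
sapply(res_std, nrow)        # a 2, b 1
```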

Pass in your list, a base name such as "testfile" for filename, and a directory such as "C:/DiscardedData/" for path.

This way your discarded data will be saved as testfile_1.csv, testfile_2.csv and so on.

Edit: Added a threshold argument so it's more flexible. Just set the threshold value in the function call.

Another edit: To use the function, call something like output <- splitMe(dfList, filename = "discarded", path = "yourpath", threshold = 10), replacing yourpath with your own directory.