GeorgeSBF - 3 months ago
Question

R: Splitting dataset by pre-determined values

I have data that looks like this (but larger):

Pos Value
0 66.81967
1 66.36885
2 65.79508
3 65.27049
4 64.88525
5 64.97541
6 65.39344
7 65.99181
8 66.63115
9 66.95901
10 66.89344
11 66.44262
12 65.90984
13 65.49181
14 65.35246


I have already determined the maxima and saved the position values of each to a vector like so:

9 19 30 42 56 69 80 92 107 118 130 143 154 164 176 188 199 211
222 234 245


I now want to split the data at the positions of the maxima. For the sample data, I'd want one subset with the values for Positions 0-9 and another with the values for Positions 10-14, each saved into a vector of its own.

I'm new to R (and coding) and was wondering how to best go about this.

Answer

Suppose your data frame is called dat and your maxima are stored in a vector called maxima. Then you might use

split(dat, cut(dat$Pos, breaks = maxima, include.lowest = TRUE))

For your example data frame:

dat <- 
structure(list(Pos = 0:14, Value = c(66.81967, 66.36885, 65.79508, 
65.27049, 64.88525, 64.97541, 65.39344, 65.99181, 66.63115, 66.95901, 
66.89344, 66.44262, 65.90984, 65.49181, 65.35246)), .Names = c("Pos", 
"Value"), class = "data.frame", row.names = c(NA, -15L))

and a few break points covering that range (standing in for your maxima, with 0 added as the lower bound so the first rows are not dropped):

maxima <- c(0, 10, 19)

the code above gives you a list of data frames:

#$`[0,10]`
#   Pos    Value
#1    0 66.81967
#2    1 66.36885
#3    2 65.79508
#4    3 65.27049
#5    4 64.88525
#6    5 64.97541
#7    6 65.39344
#8    7 65.99181
#9    8 66.63115
#10   9 66.95901
#11  10 66.89344
#
#$`(10,19]`
#   Pos    Value
#12  11 66.44262
#13  12 65.90984
#14  13 65.49181
#15  14 65.35246
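
The list names above are the interval labels produced by cut(). As a small sketch of how those labels arise (the pos vector here is made up for illustration and is not part of your data): intervals are closed on the right by default, and include.lowest = TRUE additionally closes the first interval on the left.

```r
# Hypothetical positions, chosen to sit on and around the break points
pos <- c(0, 5, 10, 11, 19)
breaks <- c(0, 10, 19)

# Default: intervals are (0,10] and (10,19], so position 0 falls outside and becomes NA
groups_open <- cut(pos, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left, giving [0,10]
groups_closed <- cut(pos, breaks = breaks, include.lowest = TRUE)

print(levels(groups_closed))
print(is.na(groups_open))
```

Rows whose group is NA are silently dropped by split(), which is why the lowest break point matters.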

If you don't want data frames but just the Value column, use

split(dat$Value, cut(dat$Pos, breaks = maxima, include.lowest = TRUE))

#$`[0,10]`
# [1] 66.81967 66.36885 65.79508 65.27049 64.88525 64.97541 65.39344 65.99181
# [9] 66.63115 66.95901 66.89344
#
#$`(10,19]`
# [1] 66.44262 65.90984 65.49181 65.35246
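
If you want the split from the question (Positions 0-9 in one piece, everything after that up to the next maximum in the other), one option is to use your maxima themselves as break points. This is only a sketch: it assumes that the smallest position should be prepended as a lower bound, and the names breaks and pieces are introduced here for illustration.

```r
# Sample data from the question (positions 0 to 14)
dat <- data.frame(
  Pos = 0:14,
  Value = c(66.81967, 66.36885, 65.79508, 65.27049, 64.88525,
            64.97541, 65.39344, 65.99181, 66.63115, 66.95901,
            66.89344, 66.44262, 65.90984, 65.49181, 65.35246)
)

# First two maxima positions from the question
maxima <- c(9, 19)

# Assumption: prepend the smallest position so rows before the first maximum are kept
breaks <- unique(c(min(dat$Pos), maxima))

# One numeric vector per interval: [0,9] and (9,19]
pieces <- split(dat$Value, cut(dat$Pos, breaks = breaks, include.lowest = TRUE))

print(names(pieces))
print(lengths(pieces))
```

With the full data set, any rows beyond the last maximum would fall outside the breaks and be dropped, so you may also want to append max(dat$Pos) to the break points.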

Thanks! How would I go about saving these as separate data frames/sets (not sure on the correct terminology) so that I can then fit them individually?

How about

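lst <- split(dat, cut(dat$Pos, breaks = maxima, include.lowest = TRUE))
out_dir <- getwd()  # folder to write into; change as needed
# write each data frame to "<interval label>.csv" inside out_dir
invisible(lapply(seq_along(lst), function(i)
  write.csv(lst[[i]], file = file.path(out_dir, paste0(names(lst)[i], ".csv")),
            row.names = FALSE)))

This saves each data frame to its own .csv file in the directory out_dir, named after its interval label (for example "[0,10].csv"). I used getwd() to test the code; you can change it to a specific folder.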
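
If the goal is to fit each piece rather than export it, the list returned by split() can also be used directly, without writing any files. The sketch below makes that concrete; the straight-line lm() model is only a placeholder for whatever fit you actually need, and the names first_piece, same_piece, fits and coefs are introduced here for illustration.

```r
# Sample data and break points as used in the answer
dat <- data.frame(
  Pos = 0:14,
  Value = c(66.81967, 66.36885, 65.79508, 65.27049, 64.88525,
            64.97541, 65.39344, 65.99181, 66.63115, 66.95901,
            66.89344, 66.44262, 65.90984, 65.49181, 65.35246)
)
maxima <- c(0, 10, 19)
lst <- split(dat, cut(dat$Pos, breaks = maxima, include.lowest = TRUE))

# Each list element is an ordinary data frame, reachable by index or by name
first_piece <- lst[[1]]
same_piece <- lst[["[0,10]"]]

# Placeholder model: fit a straight line to each piece separately
fits <- lapply(lst, function(d) lm(Value ~ Pos, data = d))

# Collect the coefficients of every fit, one entry per interval
coefs <- lapply(fits, coef)
print(coefs)
```

Keeping the pieces in a list means the same function can be applied to all of them with lapply(), however many maxima there are.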