Robbert Raats - 2 months ago
R Question

Optimizing preprocessing data frame in R

I have the following data frame, named dataValues:

dates hours
1 2015-10-12 1
5 2015-10-12 5
9 2015-10-12 9
11 2015-10-12 11
14 2015-10-12 14
15 2015-10-12 15
17 2015-10-12 17
19 2015-10-12 19
22 2015-10-12 22
23 2015-10-12 23
24 2015-10-12 24
27 2015-10-13 3
29 2015-10-13 5
33 2015-10-13 9
36 2015-10-13 12
37 2015-10-13 13
38 2015-10-13 14
40 2015-10-13 16
42 2015-10-13 18
44 2015-10-13 20
45 2015-10-13 21
46 2015-10-13 22
47 2015-10-13 23
49 2015-10-14 1
54 2015-10-14 6
56 2015-10-14 8
59 2015-10-14 11
60 2015-10-14 12
61 2015-10-14 13
63 2015-10-14 15
64 2015-10-14 16
66 2015-10-14 18
69 2015-10-14 21
71 2015-10-14 23
72 2015-10-14 24
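
For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the dates column is of class Date (the question does not say whether it is Date, character or factor), and it does not reproduce the original row names. The helper names h1, h2 and h3 are made up for this illustration.

```r
# Hours observed on each of the three days, copied from the table above
h1 <- c(1, 5, 9, 11, 14, 15, 17, 19, 22, 23, 24)
h2 <- c(3, 5, 9, 12, 13, 14, 16, 18, 20, 21, 22, 23)
h3 <- c(1, 6, 8, 11, 12, 13, 15, 16, 18, 21, 23, 24)

# Assumption: dates are stored as Date; the original column type is not stated
dataValues <- data.frame(
  dates = as.Date(rep(c("2015-10-12", "2015-10-13", "2015-10-14"),
                      times = c(length(h1), length(h2), length(h3)))),
  hours = c(h1, h2, h3)
)
```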


I have preprocessed this data frame to collect all hours for each day into a list, totallist, which looks like this:


[[1]]

[1] 1 5 9 11 14 15 17 19 22 23 24

[[2]]

[1] 3 5 9 12 13 14 16 18 20 21 22 23

[[3]]

[1] 1 6 8 11 12 13 15 16 18 21 23 24


The code I used for this is the following:

uniqueDates <- unique(dataValues$dates)
totallist <- list()
for (date in uniqueDates) {
  # collect the hours that belong to this date
  templist <- c()
  for (i in seq_along(dataValues$dates)) {
    if (dataValues$dates[i] == date) {
      templist <- append(templist, dataValues$hours[i])
    }
  }
  # add this day's hours as one list element
  totallist <- append(totallist, list(templist))
}


For the example in this question (with 3 days) it works fine and the result is what I want, but if I use this on a large dataset (which has about 260 days), it takes about 6 to 7 minutes to finish.

Is there a faster way to do this?

Answer

Try any of these:

# 1
with(unique(dataValues), split(hours, dates))

# 1a - variation of last solution
with(dataValues, lapply(split(hours, dates), unique))

# 2
unstack(unique(dataValues), hours ~ dates)

# 2a - variation of last solution
lapply(unstack(dataValues, hours ~ dates), unique)

Note that if the data values are known to be unique already, as is the case in the sample data shown in the question, then unique(dataValues) in #1 and #2 could be replaced with just dataValues.
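
As a quick check, here is a minimal sketch of #1 (without unique, since the sample rows are already unique) applied to the sample data from the question. It assumes dates is a Date column, which the question does not state, and the names h1, h2, h3 and byDate are made up for this illustration.

```r
# Rebuild the sample data from the question (Date class is an assumption)
h1 <- c(1, 5, 9, 11, 14, 15, 17, 19, 22, 23, 24)
h2 <- c(3, 5, 9, 12, 13, 14, 16, 18, 20, 21, 22, 23)
h3 <- c(1, 6, 8, 11, 12, 13, 15, 16, 18, 21, 23, 24)
dataValues <- data.frame(
  dates = as.Date(rep(c("2015-10-12", "2015-10-13", "2015-10-14"),
                      times = c(length(h1), length(h2), length(h3)))),
  hours = c(h1, h2, h3)
)

# One vectorized call replaces the nested loops; the result is a list named by date
byDate <- with(dataValues, split(hours, dates))

# Drop the names to get an unnamed list like the one shown in the question
totallist <- unname(byDate)
```

Note that split returns a named list (one element per date), which is often more convenient than the unnamed list; unname is only needed if the exact shape from the question is required.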
