tafelplankje - 7 days ago 6
R Question

add duplicate rows to R dataframe based on sequence

My data frame needs to be expanded:

df1 <- structure(list(
    TotalTime = c(0, 15, 16, 23, 24, 29),
    Phase = structure(c(1L, 1L, 2L, 2L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor")),
  .Names = c("TotalTime", "Phase"), row.names = c(NA, 6L), class = "data.frame")


df1:

TotalTime Phase
1 0 A
2 15 A
3 16 B
4 23 B
5 24 B
6 29 C


It should become the following data frame, with one row for every value of TotalTime (every second). Each added row takes the Phase of the most recent original row. (I put .. in the example to save space; those rows should be filled in with 6, 7, 8, 9 up to 15, etc.):

TotalTime Phase
1 0 A
2 1 A
3 2 A
4 3 A
5 4 A
6 5 A
..
16 15 A
17 16 B
18 17 B
.. B
24 23 B
25 24 B
26 25 B
27 26 B
28 27 B
29 28 B
30 29 C

Answer

Using the packages zoo and dplyr:

library(dplyr)
library(zoo)
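# Build the full sequence of seconds, join the observed rows, then fill the gaps
data.frame(TotalTime = 0:max(df1$TotalTime)) %>%
  left_join(df1, by = "TotalTime") %>%
  na.locf()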

It first creates a data.frame that holds the whole sequence from 0 to 29 (in this example) and left-joins your data onto it. The join leaves Phase as NA for every second that is not in df1, so I then do a "last observation carried forward" imputation on those missing values.
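The same two steps can also be looked at one at a time. This is only a minimal base R sketch, not the code from the answer above: it assumes df1 is sorted by TotalTime and starts at 0, and the names full, merged and last_seen are made up for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  TotalTime = c(0, 15, 16, 23, 24, 29),
  Phase = factor(c("A", "A", "B", "B", "B", "C"))
)

# Step 1: one row per second, from 0 to the last observed time
full <- data.frame(TotalTime = 0:max(df1$TotalTime))

# Step 2: left join; seconds that are not in df1 get Phase = NA
merged <- merge(full, df1, by = "TotalTime", all.x = TRUE)

# Step 3: last observation carried forward.
# cumsum() counts the non-missing values seen so far, which is the
# position of the most recent observation among the non-missing values.
last_seen <- cumsum(!is.na(merged$Phase))
merged$Phase <- merged$Phase[!is.na(merged$Phase)][last_seen]
```

Step 3 relies on the first row being observed (TotalTime 0 is in df1); with a leading NA, last_seen would start at 0 and the indexing would drop that row.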

It can also be done with the data.table package, like this (see also this answer, which I adapted):

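library(data.table)
# Key df1 by TotalTime so it can be joined on that column
df1 <- data.table(df1, key = "TotalTime")
# One row per second, from 0 to the last observed time
df2 <- data.table(TotalTime = 0:max(df1$TotalTime))
# Rolling join: each second takes the row with the nearest preceding TotalTime
df1[df2, roll = TRUE]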
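
In the rolling join, every second without an exact match in df1 takes the values of the nearest preceding TotalTime. As a rough illustration of that lookup, and not of how data.table implements it, the same result can be sketched in base R with findInterval(). This assumes df1$TotalTime is sorted in increasing order; times, idx and expanded are illustrative names.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  TotalTime = c(0, 15, 16, 23, 24, 29),
  Phase = factor(c("A", "A", "B", "B", "B", "C"))
)

# Every second from 0 to the last observed time
times <- 0:max(df1$TotalTime)

# For each second, the index of the last row of df1 whose
# TotalTime is less than or equal to that second
idx <- findInterval(times, df1$TotalTime)

# Look up the phase of that row for every second
expanded <- data.frame(TotalTime = times, Phase = df1$Phase[idx])
```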