This is a follow up question to Kikapp's answer.
I want to remove participant IDs which lack all the time-points. In other way around, I want to select rows which have all the four time (11, 21, 31, 41). See the sample data dropbox link
Here is my try based on Kikapp's answer. For some reason, it doesn't work. Let me know how to make it better.
data <- df[df$time %in% names(table(df$time))[table(df$time) > 3],]
data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],]
time == 11
time == 21
time == 41
time == 31
You can use
dplyr for this task for a much faster result:
df1 <- df %>% group_by(ID) %>% dplyr::filter(length(unique(time)) == 4) %>% data.frame()
However, when you get the counts of
ID's for each
time value you will find out that there are
32's hidden there (2 rows in total):
df1 %>% group_by(time) %>% dplyr::summarise(Counts = n()) %>% data.frame() #Output: time Counts 11 4695 21 4695 31 4693 32 2 41 4695
This shows that you have 2 rows with values of
32. As it turns out, that was due to a typo on your part. So you can change them with
df$time[df$time == 32] <- 31 and run the code again.
I hope this was helpful.