Rocky Rocky - 2 months ago 9
R Question

Select multiple duplicate rows based on specific values in next column

This is a follow up question to Kikapp's answer.

I want to remove participant IDs which lack all the time-points. In other way around, I want to select rows which have all the four time (11, 21, 31, 41). See the sample data dropbox link

Here is my try based on Kikapp's answer. For some reason, it doesn't work. Let me know how to make it better.

data <- df[df$time %in% names(table(df$time))[table(df$time) > 3],]

data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],]


I get 4695 rows or objects or IDs for
time == 11
,
time == 21
,
time == 41
while 4693 for
time == 31
; however, I want they should be equal.

Answer

You can use dplyr for this task for a much faster result:

df1 <- df %>% group_by(ID) %>% 
    dplyr::filter(length(unique(time)) == 4) %>% 
    data.frame()

However, when you get the counts of ID's for each time value you will find out that there are 32's hidden there (2 rows in total):

df1 %>% group_by(time) %>% 
    dplyr::summarise(Counts = n()) %>% 
    data.frame()

#Output:
time Counts
 11   4695  
 21   4695  
 31   4693  
 32      2  
 41   4695 

This shows that you have 2 rows with values of 32. As it turns out, that was due to a typo on your part. So you can change them with df$time[df$time == 32] <- 31 and run the code again.

I hope this was helpful.

Thanks!