user971102 user971102 - 2 months ago 11
R Question

R count and list unique rows for each column satisfying a condition

I have been going crazy with something basic...

I am trying to count and list in a comma separated column each unique ID coming up in a data frame, e.g.:

df<-data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")), x1=c(3,1,1,1,4,2,3,3,3),
x2=c(6,1,1,1,3,2,3,3,1),
x3=c(1,1,1,1,1,2,3,3,2))

> > df
id x1 x2 x3
1 a 3 6 1
2 a 1 1 1
3 a 1 1 1
4 b 1 1 1
5 c 4 3 1
6 d 1 2 2
7 d 3 3 3
8 e 1 3 3
9 f 3 1 2


I am trying to get a count of unique id that satisfy a condition, >1:

res = data.frame(x1_counts =5, x1_names="a,c,d,e,f", x2_counts = 4, x2_names="a,c,d,f", x3_counts = 3, x3_names="d,e,f")

> res
x1_counts x1_names x2_counts x2_names x3_counts x3_names
1 5 a,c,d,e,f 4 a,c,d,f 3 d,e,f


I have tried with data.table but it seems very convoluted, i.e.
DT = as.data.table(df)
res <- DT[, list(x1= length(unique(id[which(x1>1)])), x2= length(unique(id[which(x2>1)]))), by=id)

But I can't get it right, I am going not getting what I need to do with data.table since it is not really a grouping I am looking for. Can you direct me in the right path please? Thanks so much!

Answer

You can reshape your data to long format and then do the summary:

library(data.table)
(melt(setDT(df), id.vars = "id")[value > 1]
   [, .(counts = uniqueN(id), names = list(unique(id))), variable])
   # You can replace the list to toString if you want a string as name instead of list

#   variable counts     names
#1:       x1      5 a,c,d,e,f
#2:       x2      4   a,c,d,e
#3:       x3      3     d,e,f

To get what you need, reshape it back to wide format:

dcast(1~variable, 
      data = (melt(setDT(df), id.vars = "id")[value > 1]
                 [, .(counts = uniqueN(id), names = list(unique(id))), variable]),  
      value.var = c('counts', 'names'))

#    . counts_x1 counts_x2 counts_x3  names_x1 names_x2 names_x3
# 1: .         5         4         3 a,c,d,e,f  a,c,d,e    d,e,f