Adit Sanghvi - 1 month ago
R Question

Aggregate in R taking way too long

I'm trying to count the unique values of x across groups y.

This is the function:

aggregate(x~y,z[which(z$grp==0),],function(x) length(unique(x)))


This is taking far too long: it has been running for about 6 hours and still has not finished. I don't want to stop it, because I have to finish this tonight.

by() was taking too long as well.

Any ideas what is going wrong, and how I can reduce the processing time to about 1 hour?
My dataset has 3 million rows and 16 columns.

Input dataframe z

x y grp
1 1 0
2 1 0
1 2 1
1 3 0
3 4 1


I want to get the count of unique (x) for each y where grp = 0

UPDATE: Using @eddi's excellent answer, I have

x y
1: 2 1
2: 1 3
3: 3 1


Any idea how I can quickly summarize this as the number of x's for each value of y?
For this example, the result would be:

Number of x  y
5            1
1            3
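
One possible way to get that summary, reading the expected output as the sum of x within each y (2 + 3 = 5 for y = 1). That reading is an assumption about the intent, and the names `res`, `out` and `total_x` are hypothetical, chosen only for this sketch:

```r
library(data.table)

# Hypothetical name `res` for the intermediate result shown above
res <- data.table(x = c(2, 1, 3), y = c(1, 3, 1))

# Sum x within each y; `total_x` is an illustrative column name
out <- res[, .(total_x = sum(x)), by = y]
print(out)
```

If a plain count of rows per y is what is wanted instead, `res[, .N, by = y]` would be the analogous call.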

Answer

Here you go:

library(data.table)
setDT(z) # to convert to data.table in place

z[grp == 0, uniqueN(x), by = y]
#   y V1
#1: 1  2
#2: 3  1
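
To check this on the sample data from the question, here is a minimal self-contained sketch. It builds z directly as a data.table rather than converting an existing data frame, and the names `fast`, `slow` and `n_unique` are illustrative only. `uniqueN(x)` counts the distinct values of x, so it should agree with `length(unique(x))` from the original `aggregate()` call:

```r
library(data.table)

# Sample input from the question
z <- data.table(x   = c(1, 2, 1, 1, 3),
                y   = c(1, 1, 2, 3, 4),
                grp = c(0, 0, 1, 0, 1))

# data.table version: filter, then count distinct x within each y
fast <- z[grp == 0, .(n_unique = uniqueN(x)), by = y]

# Base R version from the question, run on a plain data.frame for comparison
slow <- aggregate(x ~ y, as.data.frame(z[grp == 0]),
                  function(x) length(unique(x)))

# Both approaches should give the same counts per group
stopifnot(all(fast$n_unique == slow$x))
print(fast)
```

On 5 rows both versions are instant; the comparison only confirms that the two calls compute the same thing.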