Adit Sanghvi - 4 months ago 25

R Question

I'm trying to count the unique values of `x` within each group of `y`.

This is the function:

`aggregate(x~y,z[which(z$grp==0),],function(x) length(unique(x)))`

This is taking way too long (~6 hours and not done yet). I don't want to stop processing as I have to finish this tonight.

`by()`

Any ideas what is going wrong, and how I can reduce the processing time to about 1 hour?

My dataset has 3 million rows and 16 columns.

Input data frame `z`:

`x y grp`

1 1 0

2 1 0

1 2 1

1 3 0

3 4 1

I want to get the count of unique values of `x` for each `y` where `grp == 0`.

UPDATE: Using @eddi's excellent answer, I now have:

`x y`

1: 2 1

2: 1 3

3: 3 1

Any idea how I can quickly summarize this as the number of x's for each value y?

So for this example, the result would be:

`Number of x y`

5 1

1 3

Answer

Here you go:

```
library(data.table)
setDT(z) # to convert to data.table in place
z[grp == 0, uniqueN(x), by = y]
# y V1
#1: 1 2
#2: 3 1
```
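For anyone who wants to try this on the sample data from the question, here is a minimal sketch. The column name `n_unique`, the base R cross-check, and the objects `upd` and `totals` are my own hypothetical names, not part of the original posts. The last step assumes that the follow-up in the UPDATE means "add up the `x` column for each `y`", which is the reading that matches the 5 and 1 shown there.

```r
library(data.table)

# Sample data copied from the question; z is assumed to be a plain data.frame
z <- data.frame(
  x   = c(1, 2, 1, 1, 3),
  y   = c(1, 1, 2, 3, 4),
  grp = c(0, 0, 1, 0, 1)
)
setDT(z)  # convert to data.table in place

# Distinct x per y, restricted to grp == 0 (same idea as above, with a named column)
counts <- z[grp == 0, .(n_unique = uniqueN(x)), by = y]
# y = 1 has 2 distinct x values, y = 3 has 1

# Base R cross-check: drop duplicate (x, y) pairs once, then count rows per y
pairs <- unique(as.data.frame(z)[z$grp == 0, c("x", "y")])
base_counts <- table(pairs$y)

# Follow-up from the UPDATE (assumption: sum the x column within each y)
upd <- data.table(x = c(2, 1, 3), y = c(1, 3, 1))  # table shown in the UPDATE
totals <- upd[, .(total_x = sum(x)), by = y]
# y = 1 gives 5, y = 3 gives 1
```

Running `uniqueN(x)` once per group avoids calling an R-level function like `length(unique(x))` through `aggregate()`, which is likely where most of the time goes on 3 million rows, though I have not benchmarked it on data of that size.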