smci - 1 year ago 73

R Question

I have a `tbl_df` that I want to `group_by(u,v)`, with one group for each distinct integer combination of (u,v) observed.

a) I then want to assign each distinct group some arbitrary distinct number label=1,2,3...

e.g. the combination (u,v)==(2,3) could get label 1, (1,3) could get 2, and so on.

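How can I do this with one `mutate()`, without summarizing and then self-joining? dplyr has a neat function `n()`, but that gives the number of elements within each group, not the number of the group itself. In `data.table`, this is simply `.GRP`.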
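For comparison, here is a minimal sketch of the `data.table` behaviour mentioned above. It assumes the `data.table` package is installed; the table `dt` and its values are made up for illustration and are not the question's data.

```r
library(data.table)

# Hypothetical example data (hand-written, not the question's random df)
dt <- data.table(u = c(2, 1, 1, 2, 3), v = c(3, 3, 2, 3, 1))

# Inside a `by` call, .GRP holds the running index of the current group;
# := adds it as a new column by reference
dt[, label := .GRP, by = .(u, v)]
print(dt)
```

With `by` (as opposed to `keyby`), groups are numbered in order of first appearance, so (2,3) gets 1 here and (1,3) gets 2.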

b) Actually, what I really want is to assign a string/character label ('A','B',...).

But numbering groups by integers is good enough, because I can then use `integer_to_label(i)`, defined below, to map them to letters.

```
library(dplyr)
set.seed(1234)

# Helper fn for mapping integer 1..26 to character label
integer_to_label <- function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", i, i) }

df <- tbl_df(data.frame(u = sample.int(3, 10, replace = TRUE), v = sample.int(4, 10, replace = TRUE)))

# Want to label/number each distinct group of unique (u,v) combinations
df %>% group_by(u, v) %>% mutate(label = n()) # WRONG: n() is the number of elements within each group, not the number of the group
```

```
   u v
1  2 3
2  1 3
3  1 2
4  2 3
5  1 2
6  3 3
7  1 3
8  1 2
9  3 1
10 3 4
```

KLUDGE 1: could do `df %>% group_by(u,v) %>% summarize(label = n())`, then self-join
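The self-join kludge could be sketched roughly as follows. Note that this sketch swaps in `distinct()` plus `row_number()` for the lookup table rather than `summarize(label = n())`, since `n()` would produce group sizes, not group numbers. The data and the names `labels` and `labelled` are hypothetical.

```r
library(dplyr)

# Hypothetical example data (hand-written, not the question's random df)
df <- tibble(u = c(2, 1, 1, 2, 3), v = c(3, 3, 2, 3, 1))

# Step 1: one row per distinct (u, v) combination, numbered 1..k
labels <- df %>%
  distinct(u, v) %>%
  mutate(label = row_number())

# Step 2: join the labels back onto the original rows
labelled <- df %>% left_join(labels, by = c("u", "v"))
print(labelled)
```

This works, but it needs an intermediate table and a join, which is exactly what the question hopes to avoid.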

Answer Source

Updated answer

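```
library(dplyr)

# Closure factory: each returned function keeps its own counter i
get_group_number = function(){
  i = 0
  function(){
    i <<- i+1   # increment the counter in the enclosing environment
    i
  }
}
group_number = get_group_number()
# mutate() evaluates the expression once per group, so each group gets the next number
df %>% group_by(u,v) %>% mutate(label = group_number())
```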

You can also consider the following slightly unreadable version:

```
# Same counter, written as an immediately invoked closure
group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())
```

Or, using the `iterators` package:

```
library(iterators)
counter = icount()   # unbounded counting iterator: 1, 2, 3, ...
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
```
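As an alternative to a hand-rolled counter, newer dplyr versions provide `cur_group_id()`, which returns the index of the current group directly. The sketch below assumes dplyr 1.0.0 or later and uses made-up data; it also covers part (b) of the question by indexing base R's built-in `LETTERS` vector instead of the question's `integer_to_label()` helper.

```r
library(dplyr)

# Hypothetical example data (hand-written, not the question's random df)
df <- tibble(u = c(2, 1, 1, 2, 3), v = c(3, 3, 2, 3, 1))

labelled <- df %>%
  group_by(u, v) %>%
  mutate(label  = cur_group_id(),      # integer index of the current group
         letter = LETTERS[label]) %>%  # part (b): 1 -> "A", 2 -> "B", ...
  ungroup()
print(labelled)
```

Unlike a side-effecting counter, this does not depend on how many times the expression happens to be evaluated. Groups are numbered in sorted key order here, so (1,2) gets 1 rather than the first combination observed.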