msh855 - 2 months ago 8

R Question

Suppose one has the following data frame:

```
x = data.frame(c(1,1,2,2,2,3), c("A","A","B","B","B","B"))
names(x) = c("v1","v2")
x
#   v1 v2
# 1  1  A
# 2  1  A
# 3  2  B
# 4  2  B
# 5  2  B
# 6  3  B
```

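In this data frame, a value in `v1` is meant to correspond to a label in `v2`. In this example, however, the label `B` corresponds to more than one value.

Is there an elegant and fast way to find which labels in `v2` correspond to more than one value in `v1`?

Ideally, the result would show the values, which in our example should be `c(2,3)`, and the row numbers, which should be `r=c(5,6)`.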

Answer

Assuming we want the row index of the last occurrence of each unique value of 'v1' within each 'v2' group, restricted to groups that have more than one unique value, we can create a logical index with `ave` and use it to subset the rows of 'x'.

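```
# Flag the last occurrence of each v1 value, but only within v2 groups
# that contain more than one distinct v1 value. ave() returns numeric
# (the type of v1), so compare against 0 to get a logical index.
i1 <- with(x, ave(v1, v2, FUN = function(x)
          length(unique(x)) > 1 & !duplicated(x, fromLast = TRUE))) != 0
x[i1, ]
#   v1 v2
# 5  2  B
# 6  3  B
```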
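To see what the `ave` call above is doing, the same index can be built in separate base R steps. This is only a sketch of the same idea, not a replacement for the one-liner; the names `n_unique`, `multi`, `last_occ`, `rows`, and `vals` are introduced here purely for illustration.

```r
# Same example data as in the question
x <- data.frame(v1 = c(1, 1, 2, 2, 2, 3),
                v2 = c("A", "A", "B", "B", "B", "B"))

# Number of distinct v1 values for each v2 label
n_unique <- tapply(x$v1, x$v2, function(v) length(unique(v)))

# Labels in v2 that correspond to more than one v1 value
multi <- names(n_unique)[n_unique > 1]

# TRUE at the last occurrence of each (v1, v2) pair
last_occ <- !duplicated(x[c("v1", "v2")], fromLast = TRUE)

# Row numbers and values for the offending labels
rows <- which(x$v2 %in% multi & last_occ)
vals <- x$v1[rows]

multi  # "B"
rows   # 5 6
vals   # 2 3
```

Splitting the steps this way also gives the offending labels themselves (`multi`), which the one-liner does not return directly.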

A faster option is to use `data.table`:

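```
library(data.table)
# setDT() converts x to a data.table by reference; .I holds the row numbers
i1 <- setDT(x)[, .I[uniqueN(v1) > 1 & !duplicated(v1, fromLast = TRUE)], v2]$V1
# Keep the v1 values and attach the original row numbers as 'rn'
x[i1, 'v1', with = FALSE][, rn := i1][]
#    v1 rn
# 1:  2  5
# 2:  3  6
```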

Source (Stack Overflow)
