msh855 - 2 months ago 8
R Question

# Finding factors that correspond to more than one value

Suppose one has the following dataframe:

``````
x <- data.frame(c(1, 1, 2, 2, 2, 3), c("A", "A", "B", "B", "B", "B"))
names(x) <- c("v1", "v2")

x
#   v1 v2
# 1  1  A
# 2  1  A
# 3  2  B
# 4  2  B
# 5  2  B
# 6  3  B
``````

In this dataframe, I want each label in `v2` to correspond to a single value in `v1`. However, as this example shows, `B` has more than one corresponding value.

Is there an elegant and fast way to find which labels in `v2` correspond to more than one value in `v1`?

Ideally, the result should show the values (in our example, `c(2,3)`) as well as the row numbers (in our example, `r=c(5,6)`).

Assuming we want the row index of the last occurrence of each unique value of 'v1' within each 'v2' group, restricted to groups that have more than one unique value, we can create a logical index with `ave` and use it to subset the rows of 'x'.

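``````
# Within each 'v2' group: flag the last occurrence of each distinct 'v1'
# value, but only when the group has more than one distinct value
i1 <- with(x, ave(v1, v2, FUN = function(x)
  length(unique(x)) > 1 & !duplicated(x, fromLast = TRUE))) != 0
x[i1, ]
#   v1 v2
# 5  2  B
# 6  3  B
``````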
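To see what the combined condition does, it can help to compute its two parts separately. The following is only a sketch of the same idea, not a replacement for the one-liner; the names `n_distinct` and `last_occurrence` are illustrative and not part of the original answer.

```r
# Same example data as in the question
x <- data.frame(v1 = c(1, 1, 2, 2, 2, 3),
                v2 = c("A", "A", "B", "B", "B", "B"))

# Number of distinct 'v1' values in each 'v2' group, repeated for every row
n_distinct <- with(x, ave(v1, v2, FUN = function(v) length(unique(v))))

# TRUE for the last row of each distinct (v1, v2) combination
last_occurrence <- !duplicated(x[c("v1", "v2")], fromLast = TRUE)

# Rows from groups with several distinct values, one row per value
i1 <- n_distinct > 1 & last_occurrence
which(i1)   # row numbers
x$v1[i1]    # corresponding values
```

For the example data, `which(i1)` gives rows 5 and 6 and `x$v1[i1]` gives the values 2 and 3, matching the result requested in the question.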

Alternatively, a faster option is to use `data.table`:

``````
library(data.table)
# .I holds the row numbers; apply the same condition within each 'v2' group
i1 <- setDT(x)[, .I[uniqueN(v1) > 1 & !duplicated(v1, fromLast = TRUE)], v2]$V1
x[i1, 'v1', with = FALSE][, rn := i1][]
#    v1 rn
# 1:  2  5
# 2:  3  6
``````
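If only the labels themselves are needed (which labels in 'v2' map to more than one value in 'v1'), a plain base R sketch along the following lines may be enough. It assumes the example data from the question, rebuilt here so the snippet is self-contained; `counts` and `multi` are illustrative names.

```r
# Same example data as in the question
x <- data.frame(v1 = c(1, 1, 2, 2, 2, 3),
                v2 = c("A", "A", "B", "B", "B", "B"))

# Count the distinct 'v1' values for each 'v2' label
counts <- tapply(x$v1, x$v2, function(v) length(unique(v)))

# Labels associated with more than one distinct value
multi <- names(counts)[counts > 1]
multi

# Distinct values belonging to those labels
unique(x$v1[x$v2 %in% multi])
```

With the example data, `multi` is `"B"` and the distinct values are 2 and 3.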