Charlie Zhu - 1 year ago
R Question

How to determine the uniqueness of each column values in its own dynamic range?

Assuming my data frame has one column, I wish to add another column that indicates whether the ith element is unique within the first i elements. The result I want is:

c1 c2

1 1
2 1
3 1
2 0
1 0

For example, 1 is unique in (1), 2 is unique in (1, 2), 3 is unique in (1, 2, 3), 2 is not unique in (1, 2, 3, 2), and 1 is not unique in (1, 2, 3, 2, 1).

Here is my code, but it runs extremely slowly given that I have nearly 1 million rows.

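for (i in 1:nrow(df)) {
  k <- sum(df$c1[1:i] == df$c1[i])  # occurrences of element i among the first i elements
  df$c2[i] <- as.numeric(k == 1)    # assumed completion: 1 if seen only once so far
}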

Is there a quicker way of achieving this?

Answer

The following works:

x$c2 = as.numeric(! duplicated(x$c1))

Or, if you prefer more explicit code (I do, but it’s slower in this case):

x$c2 = ifelse(duplicated(x$c1), 0, 1)
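
As a quick check, here is a minimal sketch that applies both variants to the sample data from the question. The data frame name `x` follows the answer; the helper variable `c2_ifelse` is only there for illustration. `duplicated()` returns TRUE for every element that has already appeared earlier in the vector, so negating it flags first occurrences, which is exactly "unique within the first i elements".

```r
# Sample data from the question; the name `x` follows the answer above.
x <- data.frame(c1 = c(1, 2, 3, 2, 1))

# duplicated() is TRUE for each element that already appeared earlier,
# so its negation marks first occurrences.
x$c2 <- as.numeric(!duplicated(x$c1))

# The ifelse() variant, kept in a separate (illustrative) variable for comparison.
c2_ifelse <- ifelse(duplicated(x$c1), 0, 1)

print(x)

# Both variants should produce the same column.
stopifnot(identical(x$c2, c2_ifelse))
```

Both versions make a single vectorised pass over the column, whereas the loop in the question rescans the first i elements on every iteration, which is why it becomes so slow at a million rows.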
