James James - 3 months ago 8
R Question

Remove duplicate column values

I've looked at a few posts, but can't find a clear explanation of how to solve this problem in R:

df looks as follows:

> df
one two three
1 EC1 EC1 EC2
2 EC2 EC2 EC3
3 EC1 EC1 EC1


I want a new column, four, that contains the unique values within each row, as shown below.
Note that the values in four will vary in length.

one two three four
1 EC1 EC1 EC2 EC1 EC2
2 EC2 EC2 EC3 EC2 EC3
3 EC1 EC1 EC1 EC1


From reading other threads, it seems that lapply is needed. I am also thinking that a first step would be to paste all the values in a row into a single value in a new column.

Answer

We can use apply with MARGIN = 1 to loop over the rows, get the unique elements and paste them together.

df$four <- apply(df, 1, FUN = function(x) paste(unique(x), collapse=" "))
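
For a reproducible check, here is a minimal sketch that rebuilds the data from the question and applies the same idea. The `cols` vector is my own addition (not in the question); it restricts apply to the original columns so that re-running the line does not pick up an existing four column.

```r
# Rebuild the example data from the question as plain character columns.
df <- data.frame(
  one   = c("EC1", "EC2", "EC1"),
  two   = c("EC1", "EC2", "EC1"),
  three = c("EC2", "EC3", "EC1"),
  stringsAsFactors = FALSE
)

# `cols` is a helper introduced here for illustration: only these columns
# are scanned, so the result stays the same if the code is run twice.
cols <- c("one", "two", "three")

# For each row (MARGIN = 1): keep the first occurrence of every value,
# then collapse the survivors into one space-separated string.
df$four <- apply(df[cols], 1, function(x) paste(unique(x), collapse = " "))

df
```

Because unique keeps the first occurrence of each value, the values in four follow the left-to-right column order of the row.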

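We could also use a regex with paste to do this: paste each row into a single string, then remove every word that appears again later in that string.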

trimws(gsub("(\\b\\S+\\b)(?=.*\\1)", "", do.call(paste, df), perl = TRUE))
#[1] "EC1 EC2" "EC2 EC3" "EC1"
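
The two approaches agree on the data in the question, but they are not equivalent in general. The sketch below uses a hypothetical data frame, `df2` (made up for illustration, not from the question), to show where they differ: the regex keeps the last occurrence of a value rather than the first, can leave a double space in the middle of the string, and treats EC1 as a duplicate of EC10 because the backreference has no word boundaries. The `via_tight` variant is one possible adjustment, not a definitive fix.

```r
# Hypothetical rows chosen to expose the differences (not from the question).
df2 <- data.frame(
  one   = c("EC1", "EC1", "EC1"),
  two   = c("EC2", "EC2", "EC10"),
  three = c("EC1", "EC2", "EC3"),
  stringsAsFactors = FALSE
)

# One space-separated string per row.
pasted <- do.call(paste, df2)

# Regex from the answer: drop a word if the same text occurs later on.
via_regex <- trimws(gsub("(\\b\\S+\\b)(?=.*\\1)", "", pasted, perl = TRUE))

# apply/unique approach: keeps the first occurrence of each value.
via_unique <- unname(apply(df2, 1, function(x) paste(unique(x), collapse = " ")))

# Possible adjustment (an assumption, not part of the original answer):
# require word boundaries around the backreference, then squeeze the
# runs of spaces left behind by the removed words.
via_tight <- gsub("\\s+", " ",
                  trimws(gsub("(\\b\\S+\\b)(?=.*\\b\\1\\b)", "", pasted, perl = TRUE)))

via_regex   # "EC2 EC1"  "EC1  EC2"  "EC10 EC3"
via_unique  # "EC1 EC2"  "EC1 EC2"   "EC1 EC10 EC3"
via_tight   # "EC2 EC1"  "EC1 EC2"   "EC1 EC10 EC3"
```

If the order of first appearance matters, or if some values are prefixes of others, the apply version is the safer choice.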