Hamid Oskorouchi - 1 month ago
R Question

R: Identify Duplicates in a column according to condition in the same column

I need to identify duplicates in a dataframe in a specific column.
However, I do not want to eliminate all the duplicate values, only those whose string in that column begins with "http".

Normally I would identify the duplicates with the line of code below:

Dup <- data[duplicated(data["var1"]), ]


Thanks in advance.

Answer

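We need an additional condition using grepl so that only rows whose string both begins with "http" and is a duplicate are removed from the dataset. Note that duplicated() flags only the second and later occurrences of a value, so the first occurrence is kept.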

data[!(grepl("^http", data$var1) & duplicated(data$var1)),] 
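To see what that line does, here is a minimal sketch on a made-up data frame. Only the column name var1 comes from the question; the sample values, the id column, and the names is_http, is_dup, dup_http and cleaned are assumptions added for illustration.

```r
# Hypothetical sample data; only the column name var1 is from the question.
data <- data.frame(
  var1 = c("http://a.com", "foo", "http://a.com", "foo", "http://b.com", "bar"),
  id = 1:6,
  stringsAsFactors = FALSE
)

# TRUE where the string starts with "http"
is_http <- grepl("^http", data$var1)

# TRUE for the second and later occurrences of a value
is_dup <- duplicated(data$var1)

# Rows that are both "http" strings and duplicates (to inspect them)
dup_http <- data[is_http & is_dup, ]

# Same data with only those rows dropped; the duplicate "foo" stays
cleaned <- data[!(is_http & is_dup), ]

print(dup_http)
print(cleaned)
```

With this sample, only row 3 (the repeated "http://a.com") is dropped, while the repeated "foo" in row 4 is kept because it does not start with "http". Storing the two logical vectors separately also makes it easy to inspect the flagged rows before removing them.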