I need to identify duplicates in a dataframe in a specific column.
However, I do not want to eliminate all the duplicate values, only those where the string in that column begins with "http".
Normally I would identify the duplicates with the line of code below:
We need an additional condition using grepl so that a row is removed from the dataset only when its string both begins with "http" and is a duplicate.
data[!(grepl("^http", data$var1) & duplicated(data$var1)),]
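To see what that line does, here is a minimal sketch on a made-up data frame. The values and the id column are assumptions for illustration only; var1 is the column name used above. Note that duplicated() flags only the second and later occurrences of a value, so the first "http" copy is kept.

```r
# Hypothetical toy data, not from the original question.
data <- data.frame(
  id   = 1:6,
  var1 = c("http://a.com", "foo", "http://a.com",
           "foo", "http://b.com", "http://a.com"),
  stringsAsFactors = FALSE
)

# TRUE where the string starts with "http"
is_http <- grepl("^http", data$var1)

# TRUE for the second and later occurrences of a value (the first is FALSE)
is_dup <- duplicated(data$var1)

# Drop only the rows that are both "http" strings and repeats
result <- data[!(is_http & is_dup), ]
print(result)
```

The repeated "foo" rows survive because they do not match "^http". If every copy of a duplicated "http" string should go, including the first, one option is to replace is_dup with duplicated(data$var1) | duplicated(data$var1, fromLast = TRUE).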