Jain Jain - 3 months ago 16
R Question

Ignore case while using duplicated

I am using the duplicated function in R to remove the duplicate rows in my data frame.

df:

Name Rank
A 1
a 1
B 2


df[!duplicated(df),]

Name Rank
A 1
a 1
B 2


The second row is the same as the first, but it doesn't get deleted because duplicated treats "A" and "a" as different values. What is the workaround for this? Thanks.

Answer
# If it's okay to change the case
df.lower      <- df
df.lower$Name <- tolower(df$Name)

df.lower[!duplicated(df.lower$Name),]

# If you don't want to change the case
df[!duplicated(df.lower$Name),]

or simply

df[!duplicated(tolower(df$Name)),]
  Name Rank
1    A    1
3    B    2
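
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the df from the question and applies the one-liner. The data.frame() call and stringsAsFactors = FALSE are my assumptions about how df was created; the names dup and deduped are just for illustration.

```r
# Rebuild the data frame from the question
# (construction details are an assumption, the question only shows the printed table)
df <- data.frame(Name = c("A", "a", "B"),
                 Rank = c(1, 1, 2),
                 stringsAsFactors = FALSE)

# TRUE for every row whose lower-cased Name has already appeared earlier
dup <- duplicated(tolower(df$Name))

# Keep the first occurrence of each name; the original case is left untouched
deduped <- df[!dup, ]
print(deduped)
```

Because only the key passed to duplicated is lower-cased, the rows that survive keep whatever case they had in df.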

That's for deduping based on Name. For the entire row you could do:

df.lower[!duplicated(df.lower),] # changes the case

or

df[!duplicated(cbind(tolower(df$Name),df$Rank)),] # does not change case
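
One thing to keep in mind with the cbind version: cbind on a character and a numeric vector produces a character matrix, so Rank is compared as text. That is fine for the data here. If you have several text columns or would rather compare each column in its own type, a rough sketch is to build a lower-cased copy as the key and index the original with it. The helper name dedupe_ignore_case is hypothetical, not a base R function, and the extra "b" row is made up to show that rows differing in Rank are kept.

```r
# Hypothetical helper (not part of base R): drop rows that are duplicates
# once every character or factor column has been lower-cased.
dedupe_ignore_case <- function(d) {
  key <- d
  # Find the columns that hold text
  is_text <- vapply(key, function(col) is.character(col) || is.factor(col),
                    logical(1))
  # Lower-case only those columns in the key; d itself is not modified
  if (any(is_text)) {
    key[is_text] <- lapply(key[is_text], function(col) tolower(as.character(col)))
  }
  # duplicated() on a data frame compares whole rows
  d[!duplicated(key), , drop = FALSE]
}

# Example data (assumed; extends the question's df with one extra row)
df <- data.frame(Name = c("A", "a", "B", "b"),
                 Rank = c(1, 1, 2, 3),
                 stringsAsFactors = FALSE)

result <- dedupe_ignore_case(df)
print(result)
```

Here "a 1" is dropped as a duplicate of "A 1", while "b 3" stays because its Rank differs from "B 2".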