Canovice - 2 months ago
R Question

R: remove rows with same elements, but in different columns

Apologies if this is a duplicate question. It seems simple enough that it may have been asked already, but a quick search didn't bring up an exact match for my particular issue. If one exists, I'd appreciate it if you shared the link.

Dataframe for reference (I made the example dataframe by hand, so I don't have dput() output for now, but I could provide it):


> head(data[, 1:8], n = 4)
A B C D E F
1 Donald Will Joe Chris Greg Isaiah
2 Donald Will Jeff Chris Greg Isaiah
3 Donald Will Jeff Steve Greg Isaiah
4 Donald Will Jeff Steve Isaiah Greg



In this (small example of my larger) dataframe, I need to remove any duplicate rows, where a row counts as a duplicate if it contains all of the same names as another row, regardless of which columns the names are in. So in this case, row 4 would be considered a duplicate of row 3, and I would want to remove either one of the two.

Of note, the order of the columns is very important in my dataframe, and so I cannot simply sort each row alphabetically and then remove exact duplicates.

Thanks for any help!!

Answer
df <- read.table(header=TRUE,stringsAsFactors=FALSE,text="
             A         B        C         D         E         F       
1       Donald      Will      Joe     Chris      Greg     Isaiah  
2       Donald      Will     Jeff     Chris      Greg     Isaiah
3       Donald      Will     Jeff     Steve      Greg     Isaiah
4       Donald      Will     Jeff     Steve    Isaiah       Greg")

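# Sort the names within each row, then drop rows whose sorted contents
# have already appeared. The original column order in df is left untouched.
df <- df[!duplicated(t(apply(df, 1, sort))), ]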
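To see why the one-liner works without disturbing the column order, here is the same idea unpacked into steps. This is only a sketch on the example data from the question; the intermediate names `sorted_rows`, `is_dup` and `result` are made up for illustration. The sorting happens on a temporary copy that is used only to detect duplicates, so the rows that are kept come straight from the original `df`.

```r
# Example data from the question, typed in by hand
df <- data.frame(
  A = c("Donald", "Donald", "Donald", "Donald"),
  B = c("Will", "Will", "Will", "Will"),
  C = c("Joe", "Jeff", "Jeff", "Jeff"),
  D = c("Chris", "Chris", "Steve", "Steve"),
  E = c("Greg", "Greg", "Greg", "Isaiah"),
  F = c("Isaiah", "Isaiah", "Isaiah", "Greg"),
  stringsAsFactors = FALSE
)

# Step 1: sort the names within each row. apply() over rows returns one
# column per input row, so t() turns the result back into one row per row.
sorted_rows <- t(apply(df, 1, sort))

# Step 2: flag every row whose sorted contents already appeared earlier.
is_dup <- duplicated(sorted_rows)

# Step 3: index the ORIGINAL data frame, so column order is preserved.
result <- df[!is_dup, ]
```

On this data only row 4 is flagged, so rows 1 to 3 are kept as they were. Since you said either row could go, `duplicated(sorted_rows, fromLast = TRUE)` would keep row 4 and drop row 3 instead. One caveat, assuming your real data might contain missing values: `sort()` drops `NA` by default, so rows with `NA` in them would need extra handling.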