Agarp Agarp - 1 year ago 158
R Question

Remove duplicate outcomes when outcomes are strings and not in the same order

I want to create a data frame with all possible outcomes of rolling two dice. The point is to run a simulation separately and then populate the data frame with how often each outcome occurs. I wrote the following code to create the data frame:

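dice1 <- sort(rep(1:6, 6))   # 1 1 1 1 1 1 2 ... 6 6: each face repeated six times in a row
dice2 <- rep(1:6, 6)         # 1 2 3 4 5 6 1 ... 5 6: the six faces cycled six times
dicesum <- dice1 + dice2     # total of the two dice for each of the 36 combinations

df <- data.frame(dice1, dice2, dicesum)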

> str(df)
'data.frame': 36 obs. of 3 variables:
$ dice1 : int 1 1 1 1 1 1 2 2 2 2 ...
$ dice2 : int 1 2 3 4 5 6 1 2 3 4 ...
$ dicesum: int 2 3 4 5 6 7 3 4 5 6 ...

> head(df)
  dice1 dice2 dicesum
1     1     1       2
2     1     2       3
3     1     3       4
4     1     4       5
5     1     5       6
6     1     6       7
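For comparison, the same 36-row frame can also be built with base R's expand.grid(). This is only a sketch, not the code from the question: expand.grid() varies its first argument fastest, so dice2 is passed first to reproduce the row order shown above. The name df2 is just for illustration.

```r
# Original construction from the question
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)

# Alternative sketch: expand.grid() varies its FIRST argument fastest,
# so dice2 goes first to get dice1 = 1 for rows 1-6, dice1 = 2 for rows 7-12, ...
grid <- expand.grid(dice2 = 1:6, dice1 = 1:6)
df2 <- data.frame(dice1 = grid$dice1, dice2 = grid$dice2)
df2$dicesum <- df2$dice1 + df2$dice2

identical(df, df2)  # TRUE: same columns, types, values and row order
```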


I first considered creating pairs such as (1,6), ... , (6,6) and removing rows where (dice1, dice2) == (dice2, dice1). However, the result is not what I want: both instances of a mirrored pair are removed (e.g. both (1,6) and (6,1)), and doubles such as (2,2) and (6,6) are removed as well.
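To see why matching pairs against their mirrors throws away too much, here is a hypothetical reconstruction (the question does not show the actual code): each row gets a key for (dice1, dice2) and one for the reversed pair (dice2, dice1), and rows whose key appears among the reversed keys are dropped. The full 6 x 6 grid contains every pair's mirror, and a double is its own mirror, so no row survives.

```r
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)

# Hypothetical "mirror matching": string keys for each pair and its reverse
key    <- paste(df$dice1, df$dice2)
mirror <- paste(df$dice2, df$dice1)

# Every key also occurs as some row's mirror, so this drops every row,
# including both halves of (1,6)/(6,1) and all the doubles
naive <- df[!(key %in% mirror), ]
nrow(naive)  # 0
```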

Note: I consider (1,6) and (6,1) to be the same (duplicate) outcome.
Question: What is the best way to remove duplicate outcomes from my data frame?

Answer

With the data structure in the edited question, I believe the following does it:

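# Sort each row so (1,6) and (6,1) map to the same values; t() restores one row per outcome
inx <- duplicated(t(apply(df, 1, sort)))
df[!inx, ]   # keep only the first occurrence of each unordered pair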
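A quick self-contained check of this approach (the name res and the comparison are added here for illustration). Sorting includes dicesum, but that value is the same for (a,b) and (b,a), so it does not affect which rows match. With the row order built in the question, the first occurrence of each unordered pair is always the row with dice1 <= dice2, so 6 doubles plus 15 mixed pairs should remain.

```r
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)

# apply() sorts each row's values, t() gives one row per outcome again,
# and duplicated() on the resulting matrix flags repeated rows
inx <- duplicated(t(apply(df, 1, sort)))
res <- df[!inx, ]

nrow(res)                                    # 21 = 6 doubles + 15 mixed pairs
identical(res, df[df$dice1 <= df$dice2, ])   # TRUE for this row order
```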

The same thing as a one-liner: df[!duplicated(t(apply(df, 1, sort))), ]
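An alternative, not taken from the answer above, is to put each pair into a canonical order with the vectorised pmin() and pmax() and then call duplicated() on just those two columns. The helper names lo and hi are illustrative.

```r
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)

# Canonical form of each unordered pair: smaller die first, larger die second
lo <- pmin(df$dice1, df$dice2)
hi <- pmax(df$dice1, df$dice2)

# Deduplicate on the two dice only; other columns are ignored
res2 <- df[!duplicated(data.frame(lo, hi)), ]
nrow(res2)  # 21
```

Keying on the dice columns alone may matter once a count column is added for the simulation: apply(df, 1, sort) would then sort the counts into the key as well, so (1,6) and (6,1) rows with different counts would no longer be flagged as duplicates.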
