Agarp - 1 year ago
R Question

R: populating a column of a data frame based on results of simulation

Question continued:
Remove duplicate outcomes, when outcomes are strings and not in the same order

I want to create a data frame with the possible outcomes of rolling two dice. The goal is to run a simulation separately and then populate the data frame with the number of times each outcome occurred. I wrote the following code to create the data frame:

# Create variables in data frame
dice1 <- sort(rep(1:6,6))
dice2 <- rep(1:6,6)
dicesum <- dice1 + dice2

# Assign variables to data frame
df <- data.frame(dice1, dice2, dicesum)

# Remove duplicates
inx <- duplicated(t(apply(df, 1, sort)))
df <- df[!inx, ]
rownames(df) <- 1:nrow(df)

# initiate a column that holds the simulation outcome count
df["count"] <- numeric(nrow(df))

> str(df)
'data.frame': 21 obs. of 4 variables:
$ dice1 : int 1 1 1 1 1 1 2 2 2 2 ...
$ dice2 : int 1 2 3 4 5 6 2 3 4 5 ...
$ dicesum: int 2 3 4 5 6 7 4 5 6 7 ...
$ count : num 0 0 0 0 0 0 0 0 0 0 ...

> head(df)
dice1 dice2 dicesum count
1 1 1 2 0
2 1 2 3 0
3 1 3 4 0
4 1 4 5 0
5 1 5 6 0
6 1 6 7 0


# Simulate dice rolls
sim_dice1 <- sample(1:6, 100, replace = TRUE)
sim_dice2 <- sample(1:6, 100, replace = TRUE)

# Data frame with simulations
rolls <- data.frame(sim_dice1, sim_dice2)

> str(rolls)
'data.frame': 100 obs. of 2 variables:
$ sim_dice1: int 2 1 5 2 4 2 1 4 6 1 ...
$ sim_dice2: int 6 5 4 1 4 5 4 5 6 2 ...

> head(rolls)
sim_dice1 sim_dice2
1 2 6
2 1 5
3 5 4
4 2 1
5 4 4
6 2 5


What is the best way to populate the "count" column in df with the outcomes of the simulation? Note that the simulation data frame has duplicate outcomes: I consider (1,6) and (6,1) to be the same outcome.

ycw
Answer

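We can use the dplyr package for this: build an order-independent key for each roll, count how often each key occurs, and join those counts onto df. Here df is the one built in the DATA section below, which does not yet have a count column.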

library(dplyr)

# Build a sorted key (Group) for each simulated roll and count each key
rolls2 <- rolls %>%
  rowwise() %>%
  mutate(Group = toString(sort(c(sim_dice1, sim_dice2)))) %>%
  ungroup() %>%
  count(Group)

# Add the same sorted key (Group) to each row of df
df2 <- df %>%
  rowwise() %>%
  mutate(Group = toString(sort(c(dice1, dice2))))

# Perform merge between df2 and rolls2
df3 <- df2 %>%
  left_join(rolls2, by = "Group") %>%
  select(-Group) %>%
  rename(count = n) %>%
  replace(is.na(.), 0)

df3
Source: local data frame [21 x 4]
Groups: <by row>

# A tibble: 21 x 4
   dice1 dice2 dicesum count
   <int> <int>   <int> <dbl>
 1     1     1       2     0
 2     1     2       3     5
 3     1     3       4     5
 4     1     4       5     8
 5     1     5       6     4
 6     1     6       7     5
 7     2     2       4     2
 8     2     3       5     8
 9     2     4       6     7
10     2     5       7     7
# ... with 11 more rows
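
The same idea (an order-independent key, then a count per key) can also be sketched in base R without dplyr. This is only an illustration under stated assumptions: make_key is a hypothetical helper that is not part of the original answer, and the outcome table and rolls are rebuilt here so the snippet runs on its own.

```r
# Outcome table, built as in the question (21 unordered pairs).
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)
df <- df[!duplicated(t(apply(df, 1, sort))), ]
rownames(df) <- 1:nrow(df)

# Simulated rolls; the seed is arbitrary and only makes the run repeatable.
set.seed(123)
rolls <- data.frame(sim_dice1 = sample(1:6, 100, replace = TRUE),
                    sim_dice2 = sample(1:6, 100, replace = TRUE))

# Hypothetical helper: smaller face first, so (6,1) and (1,6) share a key.
make_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = ", ")

# Use the 21 keys of df as factor levels, so table() returns one count
# per row of df, in row order, with zeros for outcomes that never occurred.
keys <- make_key(df$dice1, df$dice2)
sim_keys <- factor(make_key(rolls$sim_dice1, rolls$sim_dice2), levels = keys)
df$count <- as.numeric(table(sim_keys))
```

Because pmin() and pmax() are vectorized, this avoids the row-by-row sort, and fixing the factor levels removes the need for a join and an NA replacement step.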

DATA

# Create variables in data frame
dice1 <- sort(rep(1:6,6))
dice2 <- rep(1:6,6)
dicesum <- dice1 + dice2

# Assign variables to data frame
df <- data.frame(dice1, dice2, dicesum)

# Remove duplicates
inx <- duplicated(t(apply(df, 1, sort)))
df <- df[!inx, ]
rownames(df) <- 1:nrow(df)

# Set seed for reproducibility
set.seed(123)

# Simulate dice rolls
sim_dice1 <- sample(1:6, 100, replace = TRUE)
sim_dice2 <- sample(1:6, 100, replace = TRUE)

# Data frame with simulations
rolls <- data.frame(sim_dice1, sim_dice2)
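
As a side note on the data setup, the "Remove duplicates" step can probably be skipped by keeping only the pairs with dice1 <= dice2 from the start. The sketch below is an assumption about an equivalent construction, not something from the original post; alt is a name introduced here for illustration, and the original construction is repeated so the two can be compared.

```r
# Original construction: all 36 ordered pairs, then drop rows that repeat
# an earlier row once the values in the row are sorted.
dice1 <- sort(rep(1:6, 6))
dice2 <- rep(1:6, 6)
df <- data.frame(dice1, dice2, dicesum = dice1 + dice2)
df <- df[!duplicated(t(apply(df, 1, sort))), ]
rownames(df) <- 1:nrow(df)

# Alternative (illustrative): keep only pairs whose first face is not
# larger than the second, which leaves one row per unordered outcome.
d1 <- rep(1:6, each = 6)
d2 <- rep(1:6, times = 6)
keep <- d1 <= d2
alt <- data.frame(dice1 = d1[keep], dice2 = d2[keep])
alt$dicesum <- alt$dice1 + alt$dice2

# Compare the two tables column by column.
same_rows <- nrow(df) == nrow(alt) &&
  all(df$dice1 == alt$dice1) &&
  all(df$dice2 == alt$dice2) &&
  all(df$dicesum == alt$dicesum)
```

If same_rows is TRUE, either table can be used as df in the answer above.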