Asked by Lyngbakr, 4 years ago

R: Randomly sampling (with replacement) each column of a data frame independently

I am trying to create a new data frame by randomly sampling an existing data frame. Specifically, I want to create a data frame that is the same size as the original, but where each column is a random sample (with replacement) of the corresponding column in the original data frame. My first attempt looked like this:

# Load dplyr for %>% and sample_n()
library(dplyr)

# Create toy data set
data.set <- as.data.frame(matrix(1:50, ncol = 5))

# Change names
colnames(data.set) <- c("Stuff", "Things", "Foo", "Bar", "Guff")

# Try to create randomly sampled data frame
data.set %>% sample_n(replace = TRUE, size = nrow(data.set))


The problem is that this samples whole rows rather than sampling the elements of each column independently. For example, here is some output:

    Stuff Things Foo Bar Guff
2       2     12  22  32   42
10     10     20  30  40   50
2.1     2     12  22  32   42
3       3     13  23  33   43
5       5     15  25  35   45
3.1     3     13  23  33   43
8       8     18  28  38   48
9       9     19  29  39   49
1       1     11  21  31   41
6       6     16  26  36   46


Notice that the first and third rows are exactly the same, as are the fourth and sixth rows. What I would like is for each and every column to be randomly sampled independently. So, I tried this.

apply(data.set, MARGIN = 2, sample_n, replace = TRUE, size = nrow(data.set))


which produced the following error:

Error: Don't know how to sample from objects of class integer


However, I don't see what I did wrong. Can anyone suggest a concise way to achieve this?

Answer

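The error occurs because apply() passes each column to the function as a plain vector (here an integer vector), and sample_n() only knows how to sample rows of a data frame. Use base R's sample() inside an anonymous function instead; MARGIN = 2 tells apply() to work column by column.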

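# Use data.set (df is base R's F density function); apply() returns a matrix, so convert back
as.data.frame(apply(data.set, MARGIN = 2, function(x) sample(x, replace = TRUE, size = length(x))))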
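A possible alternative, sketched here as an assumption rather than as the answerer's method: apply() first converts the data frame to a matrix, so if the columns had mixed types they would all be coerced to one type (for example, character). Resampling with lapply() and assigning into `resampled[]` keeps the data.frame class and each column's type. The name `resampled` is hypothetical. Indexing with sample.int() also sidesteps sample()'s special case where a length-one numeric `x` is treated as `1:x`.

```r
# Toy data from the question
data.set <- as.data.frame(matrix(1:50, ncol = 5))
colnames(data.set) <- c("Stuff", "Things", "Foo", "Bar", "Guff")

# Hypothetical copy that will hold the resampled columns
resampled <- data.set

# Resample each column independently, with replacement.
# Assigning into resampled[] keeps the data.frame structure and column names;
# x[sample.int(length(x), replace = TRUE)] draws length(x) random row positions per column.
resampled[] <- lapply(data.set, function(x) x[sample.int(length(x), replace = TRUE)])

print(resampled)
```

If you are already using dplyr 1.0.0 or later, `data.set %>% mutate(across(everything(), ~ sample(.x, replace = TRUE)))` should give the same kind of result, although that variant is untested here.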