NDD - 1 year ago
R Question

Randomize data between two columns in R

I have searched for an answer or a solution to this task with no success as of yet, so I do apologize if this is redundant.

I want to randomize the data between two columns. This is to simulate species misidentification in vegetation field data, so I want to assign some sort of probability of misidentification between the two columns as well. I would imagine that there is some way to do this using

or the "permute" package.

I will select some readily available data for an example.

library(vegan)
data(dune)

If you type head(dune), you can see that this is a data frame with sites as rows and species as columns. For convenience's sake, we can presume some field tech has the potential to misidentify Poa pratensis and Poa trivialis.

poa = data.frame(Poaprat = dune$Poaprat, Poatriv = dune$Poatriv)
head(poa)
  Poaprat Poatriv
1       4       2
2       4       7
3       5       6
4       4       5
5       2       6
6       3       4

What would be the best way to randomize the values between these two columns (transferring values from one to the other and/or adding to one when both are present)? The resulting data might look like:

  Poaprat Poatriv
1       6       0
2       4       7
3       5       6
4       5       4
5       0       7
6       4       3


For the cringing ecologists out there: please realize that I made this example in the interest of time, and I know that relative cover values are not additive. I apologize for needing to do that.

*** Edit: For more clarity, the type of data being randomized would be percent cover estimates (so values between 0% and 100%). The data in this quick example are relative cover estimates, not counts.

Answer

You'll still need to replace the actual columns with the new ones, and there may be a more elegant way to do this (it's late in EDT land). You'll also have to decide what you want to use in place of the equal-probability draw (i.e. how you'll replace sample()), but you get your swaps and adds with:



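library(purrr)
# first six rows shown in the question
poa <- data.frame(Poaprat = c(4, 4, 5, 4, 2, 3),
                  Poatriv = c(2, 7, 6, 5, 6, 4))

# map2_df() calls the function once per pair of values
map2_df(poa$Poaprat, poa$Poatriv, function(x, y) {
  for (i in seq_along(x)) {
    what <- sample(c("left", "right", "swap"), 1)
    switch(what,
      left = {   # move everything into Poaprat
        x[i] <- x[i] + y[i]
        y[i] <- 0
      },
      right = {  # move everything into Poatriv
        y[i] <- x[i] + y[i]
        x[i] <- 0
      },
      swap = {   # exchange the two values
        tmp <- y[i]
        y[i] <- x[i]
        x[i] <- tmp
      })
  }
  data.frame(Poaprat = x, Poatriv = y)
})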
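
To put a probability on misidentification, as the question asks, one option is to add a "none" action and pass weights to sample() through its prob argument. Below is a minimal base R sketch under stated assumptions, not a definitive implementation: misidentify() and p_misid are hypothetical names, and splitting the error probability equally among the three actions is an arbitrary choice you would want to revisit.

```r
# Hypothetical helper: misidentify() and p_misid are illustrative names,
# not part of the question or of any package.
misidentify <- function(df, p_misid = 0.2) {
  actions <- c("none", "left", "right", "swap")
  # Assumption: each error type gets an equal share of p_misid.
  probs <- c(1 - p_misid, rep(p_misid / 3, 3))
  # Draw one action per row, weighted by probs.
  what <- sample(actions, nrow(df), replace = TRUE, prob = probs)

  x <- df$Poaprat
  y <- df$Poatriv
  total <- x + y
  new_x <- x
  new_y <- y

  left  <- what == "left"   # everything recorded as Poaprat
  right <- what == "right"  # everything recorded as Poatriv
  swap  <- what == "swap"   # the two values exchanged

  new_x[left]  <- total[left]
  new_y[left]  <- 0
  new_x[right] <- 0
  new_y[right] <- total[right]
  new_x[swap]  <- y[swap]
  new_y[swap]  <- x[swap]

  data.frame(Poaprat = new_x, Poatriv = new_y)
}

# First six rows shown in the question.
poa <- data.frame(Poaprat = c(4, 4, 5, 4, 2, 3),
                  Poatriv = c(2, 7, 6, 5, 6, 4))

set.seed(42)  # only so the draw is reproducible
poa_misid <- misidentify(poa, p_misid = 0.3)
```

Every action keeps the row total unchanged, so with real percent cover data you would still need to decide whether "left" and "right" should add the two values at all.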