NDD - 1 year ago 87
R Question

# Randomize data between two columns in R

I have searched for an answer or a solution to this task with no success as of yet, so I do apologize if this is redundant.

I want to randomize the data between two columns. This is to simulate species misidentification in vegetation field data, so I want to assign some sort of probability of misidentification between the two columns as well. I would imagine that there is some way to do this using `sample` or the "permute" package.

I will select some readily available data for an example.

```r
library(vegan)
data(dune)
```

If you type `head(dune)`, you can see that this is a data frame with sites as rows and species as columns. For convenience, we can presume some field tech has the potential to misidentify Poa pratensis and Poa trivialis.

```r
poa = data.frame(Poaprat=dune$Poaprat, Poatriv=dune$Poatriv)
head(poa)
#   Poaprat Poatriv
# 1       4       2
# 2       4       7
# 3       5       6
# 4       4       5
# 5       2       6
# 6       3       4
```

What would be the best way to randomize the values between these two columns (transferring values from one to the other and/or adding one to the other when both are present)? The resulting data may look like:

```
  Poaprat Poatriv
1       6       0
2       4       7
3       5       6
4       5       4
5       0       7
6       4       3
```

P.S.

For the cringing ecologist out there: please realize, I have made this example in the interest of time and that I know relative cover values are not additive. I apologize for needing to do that.

*** Edit: For more clarity, the type of data being randomized would be percent cover estimates (so values between 0% and 100%). The data in this quick example are relative cover estimates, not counts.

You'll still need to replace the actual columns with the new ones, and there may be a more elegant way to do this (it's late in EDT land). You'll also have to decide what you want instead of the equal-probability draw that `sample()` makes by default (i.e. how you'll replace `sample()`), but you get your swaps and adds with:

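```r
library(vegan)
library(purrr)

data(dune)

poa <- data.frame(
  Poaprat = dune$Poaprat,
  Poatriv = dune$Poatriv
)

# map2_df() calls the function once per site (row) and row-binds the
# one-row data frames it returns; it needs dplyr to be installed.
map2_df(poa$Poaprat, poa$Poatriv, function(x, y) {
  # x and y are the two cover values for a single site
  what <- sample(c("left", "right", "swap"), 1)
  switch(
    what,
    # everything recorded as Poaprat
    left  = data.frame(Poaprat = x + y, Poatriv = 0),
    # everything recorded as Poatriv
    right = data.frame(Poaprat = 0, Poatriv = x + y),
    # the two species confused with each other
    swap  = data.frame(Poaprat = y, Poatriv = x)
  )
})
```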
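The question also asks for a probability of misidentification, and the answer leaves open how to weight the draw and how to write the result back. A minimal base-R sketch of one way to do both is below. It is only an illustration: `misidentify()`, `p_misid`, and `outcomes` are hypothetical names invented here (not part of vegan or purrr), the 0.3 error rate is arbitrary, and the six rows are copied from the question so that the block runs without loading `dune`.

```r
# Hypothetical helper, not from any package: each row is left untouched with
# probability 1 - p_misid; otherwise one of the three error types is drawn
# using the relative weights in `outcomes`.
misidentify <- function(a, b, p_misid = 0.2,
                        outcomes = c(left = 1, right = 1, swap = 1)) {
  stopifnot(length(a) == length(b), p_misid >= 0, p_misid <= 1)
  n <- length(a)
  # Decide row by row whether a misidentification happens at all.
  hit <- runif(n) < p_misid
  # Draw an error type for every row; sample() treats `prob` as relative weights.
  what <- sample(names(outcomes), n, replace = TRUE, prob = outcomes)
  what[!hit] <- "none"
  total <- a + b
  new_a <- ifelse(what == "left", total,
           ifelse(what == "right", 0,
           ifelse(what == "swap", b, a)))
  new_b <- ifelse(what == "right", total,
           ifelse(what == "left", 0,
           ifelse(what == "swap", a, b)))
  data.frame(a = new_a, b = new_b, outcome = what)
}

# First six rows of the two Poa columns, as shown in the question.
poa <- data.frame(
  Poaprat = c(4, 4, 5, 4, 2, 3),
  Poatriv = c(2, 7, 6, 5, 6, 4)
)

set.seed(42)  # only so that the sketch is reproducible
res <- misidentify(poa$Poaprat, poa$Poatriv, p_misid = 0.3)

# Write the randomized values back over the original columns.
poa_new <- poa
poa_new$Poaprat <- res$a
poa_new$Poatriv <- res$b
poa_new
```

With the full data set the same two assignments would target `dune$Poaprat` and `dune$Poatriv`. Because "left" and "right" add the two values, real percent cover could exceed 100; capping the total (for example with `pmin(total, 100)`) is one possible workaround, but whether that is ecologically sensible is a separate decision.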