KingDan - 3 months ago
R Question

Generate some simple dummy data in R

I just want some random data to experiment with different prediction models.

My code:

x <- 0

for (i in 1:200) {
  num <- runif(1, 0, 500)
  neg <- round(runif(5, -1, 0))
  percent <- ((0.01 * runif(1, 1, 10)) * num)

  x[i] = num + (neg * percent)
}

The idea is that this should generate 200 points.

num is a random number between 0 and 500

neg is either -1 or 1, just to add some flexibility to the random offset (negative or positive offset of a randomly generated point)

percent is just a random percentage between 1% and 10% of the originally generated random number, to be either added or subtracted

Very similar code that I've made in my main language, C#, works very well and generates proper plots. I'm more-or-less trying to port that code.

Whenever I run the above, I get the following errors (a lot of them):

number of items to replace is not a multiple of replacement length

It's triggered on the last line of code in the for loop.

I'd love to be able to fix this. Any help is appreciated. Thank you!


Chrisss has already pointed out your problem in his comment. However, you're doing a lot of bad things from an R programming perspective. The following approach is better:

N <- 200

d <- data.frame(x = rep(NA, N))

num <- runif(N, 0, 500)
neg <- sample(c(1, -1), N, replace = TRUE) # jrdnmdhl pointed this out in his post
percent <- ((0.01 * runif(N, 1, 10)) * num)
d$x <- num + (neg * percent)
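For readers without the comment thread, here is a minimal sketch of what goes wrong in the original loop. The names x_demo, neg_bad, neg_ok and warned are illustrative only, and set.seed is added just so the sketch is repeatable. runif(5, -1, 0) returns five values, so the right-hand side has length 5 while x[i] is a single slot. On top of that, rounding a value between -1 and 0 can only give -1 or 0, never +1.

```r
set.seed(1)  # assumption: fixed seed, only for repeatability

# Five draws, rounded: every value is -1 or 0, never +1
neg_bad <- round(runif(5, -1, 0))
length(neg_bad)  # 5

# Assigning five values into one slot raises the warning from the question;
# tryCatch turns that warning into a TRUE/FALSE flag
x_demo <- 0
warned <- tryCatch({
  x_demo[1] <- 100 + neg_bad * 5
  FALSE
}, warning = function(w) TRUE)

# sample() draws an actual sign, one value per point
neg_ok <- sample(c(1, -1), 1)
x_demo[1] <- 100 + neg_ok * 5  # length-1 replacement, no warning
```

With the sign drawn this way, the vectorized version above runs without warnings.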

Why is this better? Two reasons. First, we are avoiding a for loop: R is a high-level, interpreted language, so explicit loops are slow compared with vectorized calls such as runif(N, 0, 500). Second, your loop does not preallocate memory. x starts out as a single value and grows by one element on every iteration, so R has to go find more memory each time, which slows things down as well.
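To make the second point concrete, here is a rough sketch of the loop with the result vector allocated up front, next to the vectorized form. x_loop and x_vec are names introduced here for illustration, the seed is an assumption added for repeatability, and the sign is drawn with sample() rather than round(runif(...)).

```r
N <- 200

# Loop version: allocate all N slots once instead of growing the vector
set.seed(42)
x_loop <- numeric(N)
for (i in seq_len(N)) {
  num     <- runif(1, 0, 500)
  neg     <- sample(c(1, -1), 1)
  percent <- 0.01 * runif(1, 1, 10) * num
  x_loop[i] <- num + neg * percent
}

# Vectorized version: each step handles all N points in one call
set.seed(42)
num     <- runif(N, 0, 500)
neg     <- sample(c(1, -1), N, replace = TRUE)
percent <- 0.01 * runif(N, 1, 10) * num
x_vec   <- num + neg * percent
```

The two vectors will not match element by element, because the random draws happen in a different order, but both have length N and every value lies between 0 and 550 (a number between 0 and 500, shifted by at most 10%).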

A great resource is Hadley Wickham's Advanced R, which covers both the first and the second reason in more detail.