Amit Kohli - 6 months ago 34

R Question

I am trying to create a fake dataset for training purposes and would like a function that creates a vector of dates following a certain probability distribution, i.e., dates from one range should be selected more often than dates from another.

I know that to select a range of dates, I can do this:

`seq(as.Date("1940-12-30"), as.Date("2005-01-04"), by="days")`

And to assign to a population, I can do this:

`dchisq(x = 1:500, df = 100)`

`rlnorm(500, 1, .6)`

But I'm drawing a blank on how to make the dates produced by `seq()` follow a distribution like these.

Answer

If you can describe the probability you want for each date, you can do this with `sample()` and its `prob` argument. Here is an example that samples from the days of 2005 using a Gaussian distribution centered at mid-year.

```
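Y05 = seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "days")
# Map the day indices onto roughly [-2, 2] and use the normal density as weights
Prob = dnorm(seq_along(Y05) * 4 / length(Y05) - 2)
# Draw 10 dates with replacement; dates near mid-year are the most likely
sample(Y05, 10, replace = TRUE, prob = Prob)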
```
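Because the question asks for a function, the same idea can be wrapped in a small helper. This is a hedged sketch, not part of the original answer. `sample_dates()`, `weight_fun` and `grid` are hypothetical names, and stretching a density over an arbitrary `grid` is only one assumed way to map it onto the date range. `dchisq()` and `dlnorm()` return density values rather than random draws, and that is what the `prob` argument of `sample()` expects. The weights do not need to sum to 1.

```r
# Hypothetical helper: draw n dates between `from` and `to`, weighting each day
# by weight_fun evaluated on an evenly spaced grid spanning `grid`
sample_dates <- function(n, from, to, weight_fun, grid = c(0, 1)) {
  days <- seq(as.Date(from), as.Date(to), by = "days")
  x <- seq(grid[1], grid[2], length.out = length(days))
  sample(days, n, replace = TRUE, prob = weight_fun(x))
}

set.seed(42)
# Chi-squared shape with df = 100, as in the question (assumed grid 0..500)
chisq_dates <- sample_dates(500, "1940-12-30", "2005-01-04",
                            function(x) dchisq(x, df = 100), grid = c(0, 500))
# Log-normal shape with the question's parameters (assumed grid 0..10)
lnorm_dates <- sample_dates(500, "1940-12-30", "2005-01-04",
                            function(x) dlnorm(x, meanlog = 1, sdlog = 0.6),
                            grid = c(0, 10))
summary(chisq_dates)
```

Changing `grid` shifts where the dates cluster. With `grid = c(0, 500)`, the chi-squared mode at 98 falls roughly a fifth of the way into the range. You can inspect the result quickly with `table(format(chisq_dates, "%Y"))`.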

Source (Stackoverflow)