Amit Kohli - 1 year ago

R Question

I am trying to create a fake dataset for training purposes and would like a function that creates a vector of dates following a certain probability distribution, i.e., dates from some ranges should be selected more often than dates from others.

I know that to select a range of dates, I can do this:

`seq(as.Date("1940-12-30"), as.Date("2005-01-04"), by="days")`

And to get values from a probability distribution, I can do this:

`dchisq(x=1:500,df = 100)`

`rlnorm(500,1,.6)`

But I'm drawing a blank on how to make the `seq()` output follow one of these distributions.

Answer

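If you can describe what probability you want for each date, you can do this with `sample()`: pass the vector of dates as the values to draw from and the per-date weights as its `prob` argument. Here is an example that samples from the days of 2005 using a Gaussian distribution centered at mid-year.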

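```
Y05 = seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by="days")
# Weight each day by a standard normal density: day indices are mapped onto
# roughly [-2, 2], so mid-year days get the highest weight, year ends the lowest
Prob = dnorm(seq_along(Y05)*4/length(Y05) - 2)
# Draw 10 dates with replacement; prob is normalized internally by sample()
sample(Y05, 10, replace=TRUE, prob=Prob)
```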
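Since the question asks for a function, the same `sample()`/`prob` idea can be wrapped in a small helper. This is only a sketch: `sample_dates` and its `weight_fun` argument are hypothetical names, not part of base R or the original answer. It is applied here to the date range from the question, with a chi-squared density as the weighting; the rescaling factor `20` and `df = 4` are arbitrary illustrative choices that skew the draws toward the early part of the range.

```r
# Hypothetical helper (not from the original answer): sample n dates between
# `from` and `to`, weighting each day by weight_fun() evaluated on the day's
# relative position in the range, rescaled to (0, 1].
sample_dates <- function(from, to, n, weight_fun) {
  days <- seq(as.Date(from), as.Date(to), by = "days")
  # Relative position of each day, so weight_fun does not depend on range length
  pos <- seq_along(days) / length(days)
  w <- weight_fun(pos)
  # Weights must be one per day, non-negative, and not all zero
  stopifnot(length(w) == length(days), all(w >= 0), any(w > 0))
  sample(days, n, replace = TRUE, prob = w)
}

set.seed(1)
# Assumed example weighting: chi-squared density with df = 4 over x in (0, 20],
# which peaks near x = 2, i.e. roughly the first tenth of the date range
d <- sample_dates("1940-12-30", "2005-01-04", 1000,
                  function(p) dchisq(p * 20, df = 4))
summary(d)
```

Passing the weighting as a function keeps the date range and the shape of the distribution independent, so swapping in `dnorm`, `dlnorm`, or any other non-negative function only changes the last argument.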