j_simskii - 6 months ago 19

R Question

I have 2 datasets, one of modeled (artificial) data and another with observed data. They have slightly different statistical distributions and I want to force the modeled data to match the observed data distribution in the spread of the data. In other words, I need the modeled data to better represent the tails of the observed data. Here's an example.

```
model <- c(37.50,46.79,48.30,46.04,43.40,39.25,38.49,49.51,40.38,36.98,40.00,
           38.49,37.74,47.92,44.53,44.91,44.91,40.00,41.51,47.92,36.98,43.40,
           42.26,41.89,38.87,43.02,39.25,40.38,42.64,36.98,44.15,44.91,43.40,
           49.81,38.87,40.00,52.45,53.13,47.92,52.45,44.91,29.54,27.13,35.60,
           45.34,43.37,54.15,42.77,42.88,44.26,27.14,39.31,24.80,16.62,30.30,
           36.39,28.60,28.53,35.84,31.10,34.55,52.65,48.81,43.42,52.49,38.00,
           38.65,34.54,37.70,38.11,43.05,29.95,32.48,24.63,35.33,41.34)

observed <- c(39.50,44.79,58.28,56.04,53.40,59.25,48.49,54.51,35.38,39.98,28.00,
              28.49,27.74,51.92,42.53,44.91,44.91,40.00,41.51,47.92,36.98,53.40,
              42.26,42.89,43.87,43.02,39.25,40.38,42.64,36.98,44.15,44.91,43.40,
              52.81,36.87,47.00,52.45,53.13,47.92,52.45,44.91,29.54,27.13,35.60,
              51.34,43.37,51.15,42.77,42.88,44.26,27.14,39.31,24.80,12.62,30.30,
              34.39,25.60,38.53,35.84,31.10,34.55,52.65,48.81,43.42,52.49,38.00,
              34.65,39.54,47.70,38.11,43.05,29.95,22.48,24.63,35.33,41.34)
```

```
summary(model)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  16.62   36.98   40.38   40.28   44.91   54.15

summary(observed)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  12.62   35.54   42.58   41.10   47.76   59.25
```

How can I force the model data to have the variability that the observed has in R?

Answer

Are you just modeling the distribution of `observed`? If so, you could generate a kernel density estimate from the observations and then resample from that estimated density. For example:

```
library(ggplot2)
```

Density estimate for observed values. This is our model of the distribution of the observed values:

```
dens.obs = density(observed)
```
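As a rough illustration of what `density()` returns, the sketch below inspects the object on a short stand-in vector (the first eleven values of `observed` from the question, used only to keep the example self-contained). With default arguments, `density()` evaluates a Gaussian kernel estimate on a grid of 512 points that extends three bandwidths past each end of the data, which is why resampled values can fall slightly outside the observed range.

```r
# Stand-in data: the first eleven values of `observed` from the question
observed <- c(39.50, 44.79, 58.28, 56.04, 53.40, 59.25,
              48.49, 54.51, 35.38, 39.98, 28.00)

dens.obs <- density(observed)

length(dens.obs$x)   # number of grid points (default n = 512)
dens.obs$bw          # bandwidth chosen by the default rule (bw = "nrd0")
range(observed)      # range of the data
range(dens.obs$x)    # grid range, widened by cut * bw (default cut = 3)
```

Passing a different `bw` or `adjust` to `density()` changes how heavily the tails are smoothed, so it is worth checking the estimate against the data before resampling from it.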

Resample from the density estimate to get modeled values. We set `prob=dens.obs$y` so that the probability of a value in `dens.obs$x` being chosen is proportional to its modeled density.

```
set.seed(439)
resample.obs = sample(dens.obs$x, 1000, replace=TRUE, prob=dens.obs$y)
```
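Sampling from `dens.obs$x` can only ever return one of the grid points. A possible variant, sketched here as an assumption rather than as part of the original answer, is the smoothed bootstrap: pick observed values with replacement and add Gaussian noise whose standard deviation is the kernel bandwidth. For the default Gaussian kernel this draws from the same kernel density estimate without going through the grid. The short `observed` vector is a stand-in (the first eleven values from the question), and `smooth.obs` is a name introduced here for illustration.

```r
# Stand-in data: the first eleven values of `observed` from the question
observed <- c(39.50, 44.79, 58.28, 56.04, 53.40, 59.25,
              48.49, 54.51, 35.38, 39.98, 28.00)

set.seed(439)
n.draw <- 1000

# Bandwidth of the default Gaussian kernel density estimate
bw.obs <- density(observed)$bw

# Smoothed bootstrap: choose a data point at random, then jitter it
# with Gaussian noise of standard deviation bw.obs
smooth.obs <- sample(observed, n.draw, replace = TRUE) +
  rnorm(n.draw, mean = 0, sd = bw.obs)

summary(smooth.obs)
```

Because the noise is continuous, the draws are not restricted to 512 distinct values, at the cost of relying on the kernel being Gaussian.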

Put observed and modeled values in a data frame and plot their densities:

```
dat = data.frame(value=c(observed,resample.obs),
                 group=rep(c("Observed","Modeled"), c(length(observed),length(resample.obs))))

ggplot(dat, aes(value, fill=group, colour=group)) +
  geom_density(alpha=0.4) +
  theme_bw()
```
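If the goal is to adjust the existing `model` values rather than draw new ones, one alternative (a different technique from the resampling above, offered only as a sketch) is empirical quantile mapping: each modeled value is replaced by the observed value at the same cumulative probability. The vectors below are stand-ins (the first eleven values of each series from the question), and `model.mapped` is a name introduced here for illustration.

```r
# Stand-in data: the first eleven values of each series from the question
model    <- c(37.50, 46.79, 48.30, 46.04, 43.40, 39.25,
              38.49, 49.51, 40.38, 36.98, 40.00)
observed <- c(39.50, 44.79, 58.28, 56.04, 53.40, 59.25,
              48.49, 54.51, 35.38, 39.98, 28.00)

# Cumulative probability of each modeled value within the model sample
p <- ecdf(model)(model)

# Observed value at the same cumulative probability
# (type = 1 is the inverse of the empirical CDF, so no interpolation)
model.mapped <- as.numeric(quantile(observed, probs = p, type = 1))

# The mapped series keeps the ordering of `model` but takes on
# the spread of `observed`
rbind(model = range(model), mapped = range(model.mapped),
      observed = range(observed))
```

The mapping preserves the rank of every modeled value, so any time or spatial ordering in `model` is kept. Tied values in `model` all map to the same observed quantile, so with the full data from the question the mapped series may not use every observed value.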