Eric - 1 month ago 21

R Question

Answer

It is not a difficult job. Suppose we have some observed data `x`

(your `TMESAL$learningTime`

), and as a reproducible example I simply generate 1000 standardized normal random samples:

```
set.seed(0)
x <- rnorm(1000)
```

Now we perform density estimation, with some customization:

```
d <- density.default(x, n = 512, cut = 3)
str(d)
# List of 7
# $ x : num [1:512] -3.91 -3.9 -3.88 -3.87 -3.85 ...
# $ y : num [1:512] 2.23e-05 2.74e-05 3.35e-05 4.07e-05 4.93e-05 ...
# ... truncated ...
```

We take out `d$x`

and `d$y`

:

```
xx <- d$x ## 512 evenly spaced points on [min(x) - 3 * d$bw, max(x) + 3 * d$bw]
dx <- xx[2L] - xx[1L] ## spacing / bin size
yy <- d$y ## 512 density values for `xx`
plot(xx, yy, type = "l") ## plot density curve (or use `plot(d)`)
```

Integration can be performed by *Riemann Sum*. For example, the area under the density curve is:

```
C <- sum(yy) * dx ## sum(yy * dx)
# [1] 1.000976
```

Since *Riemann Sum* is only an approximation, this deviates from 1 a little bit. We call this value "normalizing constant".

Now, suppose we want to find area under then curve, from `x0 = 1`

to the end of the curve, i.e., numerical integration on `[x0, Inf]`

, we can approximate it by

```
p.unscaled <- sum(yy[xx >= x0]) * dx
# [1] 0.1691366
```

The above is unscaled estimation, we can scale it by `C`

:

```
p.scaled <- p.unscaled / C
# [1] 0.1689718
```

Since the true density of our simulated `x`

is know, we can compare this estimate with true value:

```
pnorm(x0, lower.tail = FALSE)
# [1] 0.1586553
```

which is fairly close.