It is not a difficult job. Suppose we have some observed data
TMESAL$learningTime), and as a reproducible example I simply generate 1000 standardized normal random samples:
set.seed(0) x <- rnorm(1000)
Now we perform density estimation, with some customization:
d <- density.default(x, n = 512, cut = 3) str(d) # List of 7 # $ x : num [1:512] -3.91 -3.9 -3.88 -3.87 -3.85 ... # $ y : num [1:512] 2.23e-05 2.74e-05 3.35e-05 4.07e-05 4.93e-05 ... # ... truncated ...
We take out
xx <- d$x ## 512 evenly spaced points on [min(x) - 3 * d$bw, max(x) + 3 * d$bw] dx <- xx[2L] - xx[1L] ## spacing / bin size yy <- d$y ## 512 density values for `xx` plot(xx, yy, type = "l") ## plot density curve (or use `plot(d)`)
Integration can be performed by Riemann Sum. For example, the area under the density curve is:
C <- sum(yy) * dx ## sum(yy * dx) #  1.000976
Since Riemann Sum is only an approximation, this deviates from 1 a little bit. We call this value "normalizing constant".
Now, suppose we want to find area under then curve, from
x0 = 1 to the end of the curve, i.e., numerical integration on
[x0, Inf], we can approximate it by
p.unscaled <- sum(yy[xx >= x0]) * dx #  0.1691366
The above is unscaled estimation, we can scale it by
p.scaled <- p.unscaled / C #  0.1689718
Since the true density of our simulated
x is know, we can compare this estimate with true value:
pnorm(x0, lower.tail = FALSE) #  0.1586553
which is fairly close.