Eric - 8 months ago 50
R Question

Compute area under density estimation curve, i.e., probability

I have a density estimate (using

`density`
function) for my data
`learningTime`
(see figure below), and I need to find probability
`Pr(learningTime > c)`
, i.e., the the area under density curve from a given number
`c`
(the red vertical line) to the end of curve. Any idea?

It is not a difficult job. Suppose we have some observed data `x` (your `TMESAL\$learningTime`), and as a reproducible example I simply generate 1000 standardized normal random samples:

``````set.seed(0)
x <- rnorm(1000)
``````

Now we perform density estimation, with some customization:

``````d <- density.default(x, n = 512, cut = 3)
str(d)
#    List of 7
# \$ x        : num [1:512] -3.91 -3.9 -3.88 -3.87 -3.85 ...
# \$ y        : num [1:512] 2.23e-05 2.74e-05 3.35e-05 4.07e-05 4.93e-05 ...
# ... truncated ...
``````

We take out `d\$x` and `d\$y`:

``````xx <- d\$x  ## 512 evenly spaced points on [min(x) - 3 * d\$bw, max(x) + 3 * d\$bw]
dx <- xx[2L] - xx[1L]  ## spacing / bin size
yy <- d\$y  ## 512 density values for `xx`
plot(xx, yy, type = "l")  ## plot density curve (or use `plot(d)`)
``````

Integration can be performed by Riemann Sum. For example, the area under the density curve is:

``````C <- sum(yy) * dx  ## sum(yy * dx)
# [1] 1.000976
``````

Since Riemann Sum is only an approximation, this deviates from 1 a little bit. We call this value "normalizing constant".

Now, suppose we want to find area under then curve, from `x0 = 1` to the end of the curve, i.e., numerical integration on `[x0, Inf]`, we can approximate it by

``````p.unscaled <- sum(yy[xx >= x0]) * dx
# [1] 0.1691366
``````

The above is unscaled estimation, we can scale it by `C`:

``````p.scaled <- p.unscaled / C
# [1] 0.1689718
``````

Since the true density of our simulated `x` is know, we can compare this estimate with true value:

``````pnorm(x0, lower.tail = FALSE)
# [1] 0.1586553
``````

which is fairly close.