JonMinton - 1 year ago
R Question

How to plot a contour line showing where 95% of values fall within, in R and in ggplot2

Say we have:

x <- rnorm(1000)
y <- rnorm(1000)

How do I use ggplot2 to produce a plot containing the two following geoms:

  1. The bivariate expectation of the two series of values

  2. A contour line showing where 95% of the estimates fall within?

I know how to do the first part:

df <- data.frame(x=x, y=y)
p <- ggplot(df, aes(x=x, y=y))
p <- p + xlim(-10, 10) + ylim(-10, 10) # say
p <- p + geom_point(x=mean(x), y=mean(y))

I also know about the stat_contour() and stat_density2d() functions in ggplot2, and that stat_contour has a 'bins' option.

However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.

I have also seen a solution within the graphics package. However, I would like to do this within ggplot.

Help much appreciated,


Answer

This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.

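d <- data.frame(x=rnorm(1000),y=rnorm(1000))

# Density level whose contour encloses a fraction `prob` of the estimated mass
getLevel <- function(x,y,prob=0.95) {
    kk <- MASS::kde2d(x,y)
    dx <- diff(kk$x[1:2])              # grid spacing in x
    dy <- diff(kk$y[1:2])              # grid spacing in y
    sz <- sort(kk$z)                   # density values, lowest first
    c1 <- cumsum(sz) * dx * dy         # cumulative probability mass
    approx(c1, sz, xout = 1 - prob)$y  # density at which the low-density tail holds 1 - prob
}
L95 <- getLevel(d$x,d$y)

library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
   stat_density2d(geom="tile", aes(fill = ..density..),
                  contour = FALSE)+
   # the final layer was cut off in this copy; presumably the contour drawn at L95
   stat_density2d(colour="red", breaks=L95)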

(with help from

update: with a recent version of ggplot2 (2.1.0) it doesn't seem possible to pass breaks to stat_density2d (or at least I don't know how), but the method below with geom_contour still seems to work ...

You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:

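library(reshape2)  # provides melt()
kk <- with(d,MASS::kde2d(x,y))  # `d` is the data frame defined above
dimnames(kk$z) <- list(kk$x,kk$y)
dc <- melt(kk$z)  # long format: Var1 (x), Var2 (y), value (density)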
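
Put together, the single-grid approach described above might look like the following sketch. It is an assumption about how the pieces fit, not the answer's exact code: the seed, the labs() call, and the layer for the mean (which the question asked for) are additions, and getLevel is repeated from the answer only so the block runs on its own.

```r
library(ggplot2)
library(reshape2)   # for melt()

set.seed(1001)      # arbitrary seed, only for reproducibility
d <- data.frame(x = rnorm(1000), y = rnorm(1000))

# getLevel as defined in the answer, repeated so this block is self-contained
getLevel <- function(x, y, prob = 0.95) {
    kk <- MASS::kde2d(x, y)
    sz <- sort(kk$z)
    c1 <- cumsum(sz) * diff(kk$x[1:2]) * diff(kk$y[1:2])
    approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x, d$y)

# One density grid, reshaped to long format for ggplot2
kk <- MASS::kde2d(d$x, d$y)
dimnames(kk$z) <- list(kk$x, kk$y)
dc <- melt(kk$z)    # columns: Var1 (x), Var2 (y), value (density)

p <- ggplot(dc, aes(x = Var1, y = Var2)) +
    geom_tile(aes(fill = value)) +                                      # density surface
    geom_contour(aes(z = value), breaks = L95, colour = "red") +        # 95% contour
    annotate("point", x = mean(d$x), y = mean(d$y), colour = "red") +   # bivariate mean
    labs(x = "x", y = "y", fill = "density")
print(p)
```

Because the tiles and the contour both read from dc, the two layers are guaranteed to describe the same density surface.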
  • doing the 95% level computation from the kk grid (to reduce the number of kernel computations to 1) is left as an exercise
  • I'm not sure why stat_density2d(geom="tile") and geom_tile give slightly different results (the former is smoothed)
  • I haven't added the bivariate mean, but something like annotate("point",x=mean(d$x),y=mean(d$y),colour="red") should work.
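
One way to approach the exercise in the first bullet is sketched below. levelFromGrid is a hypothetical helper, not part of the original answer: it applies the same arithmetic as getLevel to an existing kde2d result, so the kernel density estimate is computed only once. The seed and the sanity check at the end are also additions.

```r
# Hypothetical helper (not from the original answer): same calculation as
# getLevel, but it reuses a kde2d grid instead of computing a new one.
levelFromGrid <- function(kk, prob = 0.95) {
    cell <- diff(kk$x[1:2]) * diff(kk$y[1:2])  # area of one grid cell
    sz <- sort(kk$z)                           # density values, lowest first
    c1 <- cumsum(sz) * cell                    # mass accumulated from the low-density tail
    approx(c1, sz, xout = 1 - prob)$y          # density where that tail holds 1 - prob
}

set.seed(1001)   # arbitrary seed, only for reproducibility
d <- data.frame(x = rnorm(1000), y = rnorm(1000))
kk <- MASS::kde2d(d$x, d$y)   # the only kernel density computation
L95 <- levelFromGrid(kk)

# Sanity check: share of the grid's mass lying at or above the level
cell <- diff(kk$x[1:2]) * diff(kk$y[1:2])
inside <- sum(kk$z[kk$z >= L95]) * cell
total <- sum(kk$z) * cell
cat("level:", L95, " mass inside:", inside, " total grid mass:", total, "\n")
```

The resulting L95 can be passed as breaks to geom_contour in the same way as before. Note that kde2d evaluates the density on a 25 x 25 grid by default (its n argument); a finer grid such as n = 100 may give a smoother tile layer and a slightly different level.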