JonMinton JonMinton - 2 months ago 15
R Question

How to plot a contour line showing where 95% of values fall within, in R and in ggplot2

Say we have:

x <- rnorm(1000)
y <- rnorm(1000)


How do I use ggplot2 to produce a plot containing the two following geoms:


  1. The bivariate expectation of the two series of values

  2. A contour line showing where 95% of the estimates fall within?



I know how to do the first part:

df <- data.frame(x=x, y=y)
p <- ggplot(df, aes(x=x, y=y))
p <- p + xlim(-10, 10) + ylim(-10, 10) # say
p <- p + geom_point(x=mean(x), y=mean(y))


And I also know about the stat_contour() and stat_density2d() functions within ggplot2.

And I also know that there are 'bins' options within stat_contour.

However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.

I have also seen a solution within the graphics package. However, I would like to do this within ggplot.

Help much appreciated,

Jon

Answer

This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.

set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
    kk <- MASS::kde2d(x,y)
    dx <- diff(kk$x[1:2])
    dy <- diff(kk$y[1:2])
    sz <- sort(kk$z)
    c1 <- cumsum(sz) * dx * dy
    approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x,d$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
   stat_density2d(geom="tile", aes(fill = ..density..),
                  contour = FALSE)+
   stat_density2d(colour="red",breaks=L95)

(with help from http://comments.gmane.org/gmane.comp.lang.r.ggplot2/303)

update: with a recent version of ggplot2 (2.1.0) it doesn't seem possible to pass breaks to stat_density2d (or at least I don't know how), but the method below with geom_contour still seems to work ...

You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:

kk <- with(dd,MASS::kde2d(x,y))
library(reshape2)
dimnames(kk$z) <- list(kk$x,kk$y)
dc <- melt(kk$z)
ggplot(dc,aes(x=Var1,y=Var2))+
   geom_tile(aes(fill=value))+
   geom_contour(aes(z=value),breaks=L95,colour="red")
  • doing the 95% level computation from the kk grid (to reduce the number of kernel computations to 1) is left as an exercise
  • I'm not sure why stat_density2d(geom="tile") and geom_tile give slightly different results (the former is smoothed)
  • I haven't added the bivariate mean, but something like annotate("point",x=mean(d$x),y=mean(d$y),colour="red") should work.
Comments