JonMinton - 2 months ago 15
R Question

# How to plot a contour line showing where 95% of values fall within, in R and in ggplot2

Say we have:

x <- rnorm(1000)
y <- rnorm(1000)

How do I use ggplot2 to produce a plot containing the two following geoms:

1. The bivariate expectation of the two series of values

2. A contour line showing where 95% of the estimates fall within?

I know how to do the first part:

df <- data.frame(x=x, y=y)
p <- ggplot(df, aes(x=x, y=y))
p <- p + xlim(-10, 10) + ylim(-10, 10) # say
p <- p + geom_point(x=mean(x), y=mean(y))

And I also know about the stat_contour() and stat_density2d() functions within ggplot2.

And I also know that there are 'bins' options within stat_contour.

However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.

I have also seen a solution within the graphics package. However, I would like to do this within ggplot.

Help much appreciated,

Jon

This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.

set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
kk <- MASS::kde2d(x,y)
dx <- diff(kk\$x[1:2])
dy <- diff(kk\$y[1:2])
sz <- sort(kk\$z)
c1 <- cumsum(sz) * dx * dy
approx(c1, sz, xout = 1 - prob)\$y
}
L95 <- getLevel(d\$x,d\$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
stat_density2d(geom="tile", aes(fill = ..density..),
contour = FALSE)+
stat_density2d(colour="red",breaks=L95)

update: with a recent version of ggplot2 (2.1.0) it doesn't seem possible to pass breaks to stat_density2d (or at least I don't know how), but the method below with geom_contour still seems to work ...

You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:

kk <- with(dd,MASS::kde2d(x,y))
library(reshape2)
dimnames(kk\$z) <- list(kk\$x,kk\$y)
dc <- melt(kk\$z)
ggplot(dc,aes(x=Var1,y=Var2))+
geom_tile(aes(fill=value))+
geom_contour(aes(z=value),breaks=L95,colour="red")
• doing the 95% level computation from the kk grid (to reduce the number of kernel computations to 1) is left as an exercise
• I'm not sure why stat_density2d(geom="tile") and geom_tile give slightly different results (the former is smoothed)
• I haven't added the bivariate mean, but something like annotate("point",x=mean(d\$x),y=mean(d\$y),colour="red") should work.