JonMinton - 6 months ago 27

R Question

Say we have:

`x <- rnorm(1000)`

y <- rnorm(1000)

How do I use ggplot2 to produce a plot containing the two following geoms:

- The bivariate expectation of the two series of values
- A contour line showing where 95% of the estimates fall within?

I know how to do the first part:

`df <- data.frame(x=x, y=y)`

p <- ggplot(df, aes(x=x, y=y))

p <- p + xlim(-10, 10) + ylim(-10, 10) # say

p <- p + geom_point(x=mean(x), y=mean(y))

And I also know about the stat_contour() and stat_density2d() functions within ggplot2.

And I also know that there are 'bins' options within stat_contour.

However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.

I have also seen a solution within the graphics package. However, I would like to do this within ggplot.

Help much appreciated,

Jon

Answer

This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.

```
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
kk <- MASS::kde2d(x,y)
dx <- diff(kk$x[1:2])
dy <- diff(kk$y[1:2])
sz <- sort(kk$z)
c1 <- cumsum(sz) * dx * dy
approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x,d$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
stat_density2d(geom="tile", aes(fill = ..density..),
contour = FALSE)+
stat_density2d(colour="red",breaks=L95)
```

(with help from http://comments.gmane.org/gmane.comp.lang.r.ggplot2/303)

**update**: with a recent version of `ggplot2`

(2.1.0) it doesn't seem possible to pass `breaks`

to `stat_density2d`

(or at least I don't know how), but the method below with `geom_contour`

still seems to work ...

You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:

```
kk <- with(dd,MASS::kde2d(x,y))
library(reshape2)
dimnames(kk$z) <- list(kk$x,kk$y)
dc <- melt(kk$z)
ggplot(dc,aes(x=Var1,y=Var2))+
geom_tile(aes(fill=value))+
geom_contour(aes(z=value),breaks=L95,colour="red")
```

- doing the 95% level computation from the
`kk`

grid (to reduce the number of kernel computations to 1) is left as an exercise - I'm not sure why
`stat_density2d(geom="tile")`

and`geom_tile`

give slightly different results (the former is smoothed) - I haven't added the bivariate mean, but something like
`annotate("point",x=mean(d$x),y=mean(d$y),colour="red")`

should work.