JonMinton - 11 months ago 41

R Question

Say we have:

`x <- rnorm(1000)`

y <- rnorm(1000)

How do I use ggplot2 to produce a plot containing the two following geoms:

- The bivariate expectation of the two series of values
- A contour line showing where 95% of the estimates fall within?

I know how to do the first part:

`df <- data.frame(x=x, y=y)`

p <- ggplot(df, aes(x=x, y=y))

p <- p + xlim(-10, 10) + ylim(-10, 10) # say

p <- p + geom_point(x=mean(x), y=mean(y))

And I also know about the stat_contour() and stat_density2d() functions within ggplot2.

And I also know that there are 'bins' options within stat_contour.

However, I guess what I need is something like the probs argument within quantile, but over two dimensions rather than one.

I have also seen a solution within the graphics package. However, I would like to do this within ggplot.

Help much appreciated,

Jon

Answer Source

This works, but is quite inefficient because you actually have to compute the kernel density estimate three times.

```
set.seed(1001)
d <- data.frame(x=rnorm(1000),y=rnorm(1000))
getLevel <- function(x,y,prob=0.95) {
kk <- MASS::kde2d(x,y)
dx <- diff(kk$x[1:2])
dy <- diff(kk$y[1:2])
sz <- sort(kk$z)
c1 <- cumsum(sz) * dx * dy
approx(c1, sz, xout = 1 - prob)$y
}
L95 <- getLevel(d$x,d$y)
library(ggplot2); theme_set(theme_bw())
ggplot(d,aes(x,y)) +
stat_density2d(geom="tile", aes(fill = ..density..),
contour = FALSE)+
stat_density2d(colour="red",breaks=L95)
```

(with help from http://comments.gmane.org/gmane.comp.lang.r.ggplot2/303)

**update**: with a recent version of `ggplot2`

(2.1.0) it doesn't seem possible to pass `breaks`

to `stat_density2d`

(or at least I don't know how), but the method below with `geom_contour`

still seems to work ...

You can make things a little more efficient by computing the kernel density estimate once and plotting the tiles and contours from the same grid:

```
kk <- with(dd,MASS::kde2d(x,y))
library(reshape2)
dimnames(kk$z) <- list(kk$x,kk$y)
dc <- melt(kk$z)
ggplot(dc,aes(x=Var1,y=Var2))+
geom_tile(aes(fill=value))+
geom_contour(aes(z=value),breaks=L95,colour="red")
```

- doing the 95% level computation from the
`kk`

grid (to reduce the number of kernel computations to 1) is left as an exercise - I'm not sure why
`stat_density2d(geom="tile")`

and`geom_tile`

give slightly different results (the former is smoothed) - I haven't added the bivariate mean, but something like
`annotate("point",x=mean(d$x),y=mean(d$y),colour="red")`

should work.