gamemastersr - 6 months ago 24

R Question

I have data of this format and want to make a contour plot. When I try to use the

`density(z)`

`"x must be numeric"`

`z <- c(`

c(8.83,8.89),

c(8.89,8.94),

c(8.84,8.9),

c(8.79,8.852),

c(8.79,8.88),

c(8.8,8.82),

c(8.75,8.78),

c(8.8,8.8),

c(8.74,8.81),

c(8.89,8.99),

c(8.97,8.97),

c(9.04,9.08),

c(9,9.01),

c(8.99,8.99),

c(8.93,8.97)

)

z <- matrix(z, ncol = 2, byrow = TRUE)

Answer

`density()`

is used for **univariate** density estimation. Since you have two independent variables: `long`

and `lat`

, you should use `kde2d()`

from R's default package `MASS`

.

```
library(MASS)
fit <- kde2d(z[,1], z[,2])
contour(fit$x, fit$y, fit$z)
## show original data locations
points(z, pch = 19, col = 4)
```

**Follow-up**

If you look at `?kde2d`

:

```
Usage:
kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))
```

The default number of cells along each of `x`

and `y`

are `n = 25`

, which gives you a 25 * 25 grid. Density estimation is done on this grids. Perhaps you are wondering why estimation is done on a regular grid. Because such grid is like pixels of a digital photo. Grid/raster like object is convenient for visualization. Actually, if you want computer to proceed 3D graph, you have to give it a raster like object.

In practice, you should choose `n`

according to how many data you have. Note that a 25 * 25 grid has 625 cells, this is quite fair when you have 1000 data points. You can also try `n = 50`

. Setting `n`

is very similar to setting number of bins when you produce a histogram. As `n`

increases, your resulting estimation is more jagged. Consider the histogram example if you are unclear:

```
x <- rnorm(200)
hist(x, breaks = 10)
hist(x, breaks = 20)
```

Precisely, density estimation is different to histogram; the former is a kernel smoother, while the latter is a primitive bin smoother. But the choice of `n`

(refinement) does has equal effect.