Kate2808 - 6 months ago 33

R Question

I am trying to calculate the Gaussian kernel density, and to test of my knowledge of the

`density()`

I start with an existing dataset

`xi <- mtcars$mpg`

and can plot the kernel density of this data, as follows

`plot(density(xi, kernel = "gaussian"))`

which provides this...

I then grab some of the details from this calculation, so that my calculation is consistent.

`auto.dens <- density(xi, kernel = "gaussian")`

h <- auto.dens$bw # bandwidth for kernel

x0 <- auto.dens$x # points for prediction

I then calculate the gaussian kernel density myself, and I have

done this in a loop, just so it is clearer to read.

`fx0 <- NULL`

for (j in 1:length(x0)){

t <- abs(x0[j]-xi)/h

K <- (1/sqrt(2*pi))*exp(-(t^2)/2)

fx0 <- c(fx0,sum(K*t)/(length(t)*h))

}

The basic calculation has been constructed following the details in section 3.3.6 in Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks.

with the Gaussian kernel set as and t being

However, and here is my problem.

I then plot the two together...

`plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))`

lines(x=auto.dens$x, y=auto.dens$y, col="red")

The output from the density function (red), and my calculations (black), I get

Have I miss understood how the density function works? Why can't I manage to calcualte the same results from scratch? Why is my kernel estimator providing different results? Why are my results less smooth?

I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset, and only did this little example to make sure I was doing the same as the the automated functions, and really wasn't expecting the have this problem. I've tried all kinds of things, and just cannot see why I get a different result.

Thank you all in advance, for reading and any comments, little or big.

Answer

You don't need to `sum(K*t)`

, just `sum(K)`

.

```
xi <- mtcars$mpg
plot(density(xi, kernel = "gaussian"), lwd = 2)
auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction
fx0 <- NULL
for (j in 1:length(x0)) {
t <- abs(x0[j]-xi)/h
K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
fx0 <- c(fx0, sum(K)/(length(t)*h))
}
lines(x0, fx0, col = "red", lty = "dotted")
```

Source (Stackoverflow)