Kate2808 - 1 year ago 89
R Question

# Discrepancies in the density() kernel estimator compared to calculations by scratch

I am trying to calculate the Gaussian kernel density, and to test of my knowledge of the

`density()`
function, I decided to calculate it from scratch and compare the two results.

However, they do not provide the same answer.

``````xi <- mtcars\$mpg
``````

and can plot the kernel density of this data, as follows

``````plot(density(xi, kernel = "gaussian"))
``````

which provides this...

I then grab some of the details from this calculation, so that my calculation is consistent.

``````auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens\$bw # bandwidth for kernel
x0 <- auto.dens\$x # points for prediction
``````

I then calculate the gaussian kernel density myself, and I have
done this in a loop, just so it is clearer to read.

``````fx0 <- NULL

for (j in 1:length(x0)){

t <- abs(x0[j]-xi)/h

K <- (1/sqrt(2*pi))*exp(-(t^2)/2)

fx0 <- c(fx0,sum(K*t)/(length(t)*h))
}
``````

The basic calculation has been constructed following the details in section 3.3.6 in Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks.
with the Gaussian kernel set as and t being

However, and here is my problem.

I then plot the two together...

``````plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))
lines(x=auto.dens\$x, y=auto.dens\$y, col="red")
``````

The output from the density function (red), and my calculations (black), I get

!These two calculations are clearly different!

Have I miss understood how the density function works? Why can't I manage to calcualte the same results from scratch? Why is my kernel estimator providing different results? Why are my results less smooth?

I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset, and only did this little example to make sure I was doing the same as the the automated functions, and really wasn't expecting the have this problem. I've tried all kinds of things, and just cannot see why I get a different result.

Edit: 13:40 29/11/2016
Solution as detailed in answer below

You don't need to `sum(K*t)`, just `sum(K)`.

``````xi <- mtcars\$mpg
plot(density(xi, kernel = "gaussian"), lwd = 2)

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens\$bw # bandwidth for kernel
x0 <- auto.dens\$x # points for prediction

fx0 <- NULL
for (j in 1:length(x0)) {
t <- abs(x0[j]-xi)/h
K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
fx0 <- c(fx0, sum(K)/(length(t)*h))
}

lines(x0, fx0, col = "red", lty = "dotted")
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download