Kate2808 Kate2808 - 8 days ago 5
R Question

Discrepancies in the density() kernel estimator compared to calculations by scratch

I am trying to calculate the Gaussian kernel density, and to test of my knowledge of the

density()
function, I decided to calculate it from scratch and compare the two results.

However, they do not provide the same answer.

I start with an existing dataset

xi <- mtcars$mpg


and can plot the kernel density of this data, as follows

plot(density(xi, kernel = "gaussian"))


which provides this...

Automated gaussian kernel density

I then grab some of the details from this calculation, so that my calculation is consistent.

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction


I then calculate the gaussian kernel density myself, and I have
done this in a loop, just so it is clearer to read.

fx0 <- NULL

for (j in 1:length(x0)){

t <- abs(x0[j]-xi)/h

K <- (1/sqrt(2*pi))*exp(-(t^2)/2)

fx0 <- c(fx0,sum(K*t)/(length(t)*h))
}


The basic calculation has been constructed following the details in section 3.3.6 in Statistical Methods in the Atmospheric Sciences, 3rd Edition, by Daniel Wilks. Equation 3.13 from Wilks textbook
with the Gaussian kernel set as enter image description here and t being enter image description here

However, and here is my problem.

I then plot the two together...

plot(y=fx0,x=x0, type="l", ylim=c(0,0.07))
lines(x=auto.dens$x, y=auto.dens$y, col="red")


The output from the density function (red), and my calculations (black), I get
enter image description here

!These two calculations are clearly different!

Have I miss understood how the density function works? Why can't I manage to calcualte the same results from scratch? Why is my kernel estimator providing different results? Why are my results less smooth?

I need to construct and apply a kernel smoother (not just of density) to a much more complicated dataset, and only did this little example to make sure I was doing the same as the the automated functions, and really wasn't expecting the have this problem. I've tried all kinds of things, and just cannot see why I get a different result.

Thank you all in advance, for reading and any comments, little or big.

Edit: 13:40 29/11/2016
Solution as detailed in answer below
enter image description here

Answer

You don't need to sum(K*t), just sum(K).

xi <- mtcars$mpg
plot(density(xi, kernel = "gaussian"), lwd = 2)

auto.dens <- density(xi, kernel = "gaussian")
h <- auto.dens$bw # bandwidth for kernel
x0 <- auto.dens$x # points for prediction

fx0 <- NULL
for (j in 1:length(x0)) {
  t <- abs(x0[j]-xi)/h
  K <- (1/sqrt(2*pi))*exp(-(t^2)/2)
  fx0 <- c(fx0, sum(K)/(length(t)*h))
}

lines(x0, fx0, col = "red", lty = "dotted")