A.Chandy - 6 months ago 67

R Question

If I have a density function and I plot it with a particular bandwidth, I visually determine that there are 7 local maximums. I would just like to know how to plot separate distributions of the particular maximums on the same plot.

Also, if is possible to know exactly where the maximums occur by running some code? I can make ball-park estimates using the plot but is there an R function that I can use to get the exact points? I would like to know the mean and variance of the 7 densities that I have identified.

Specifically, I have the following:

`plot(density(stamp, bw=0.0013,kernel = "gaussian"))`

Answer

Determining which modes are real in a kernel density estimate is a matter of which bandwidth you chose to use. This is a complicated thing, and I don't advise choosing but a single bandwidth, as even different optimal rules of thumb can give you different answers. In general, the number of modes of a kde is less than the number of the underlying density in the oversmooothed case and more so in the undersmoothed case. There are many papers that cover this topic and give you many options to help determine the veracity of a mode. e.g., check out Silverman's mode test for Gaussian kernels, Friedman and Fisher's prim algorithm, Marron's siZer, and Minnotte and Scott's mode tree are good places to start.

A naive thing you can do, given a single KDE choice of bandwidth is check the run lengths.

In fact, with the bandwidth you have chosen, I find 9 modes. Just calculate the sign change of the difference in the series, and calculate the cumulative length of the runs in order to find the points. Every other point will be a mode or an antimode, depending on which came first. (You can check the sign to determine this)

```
library(BSDA)
dstamp <- density(Stamp$thickness, bw=0.0013, kernel = "gaussian")
chng <- cumsum(rle(sign(diff(dstamp$y)))$lengths)
plot(dstamp)
abline(v = dstamp$x[chng[seq(1,length(chng),2)]])
```