what what - 1 month ago 13
R Question

Calculate the modes in a multimodal distribution in R

I have measured the body heights of all my children. When I plot all heights along an axis of lengths, this is what the result looks like:

enter image description here

Every red (boys) or violet (girls) tick is one child. If two children have the same body height (in millimetres), the ticks get stacked. Currently there are seven children with the same body height. (The height and width of the ticks is meaningless. They have been scaled to be visible.)

As you can see, the different heights are not evenly distributed along the axis, but cluster around certain values.

A histrogram and density plot of the data looks like this (with the two density estimates plotted following this answer):

enter image description here

As you can see, this is a multimodal distribution.

How do I calculate the modes (in R)?




Here is the raw data for you to play with:

mm <- c(418, 527, 540, 553, 554, 558, 613, 630, 634, 636, 645, 648, 708, 714, 715, 725, 806, 807, 822, 823, 836, 837, 855, 903, 908, 910, 911, 913, 915, 923, 935, 945, 955, 957, 958, 1003, 1006, 1015, 1021, 1021, 1022, 1034, 1043, 1048, 1051, 1054, 1058, 1100, 1102, 1103, 1117, 1125, 1134, 1138, 1145, 1146, 1150, 1152, 1210, 1211, 1213, 1223, 1226, 1334)

Answer

I constructed something on my own using your mm data.

First let's plot the density of mm in order to visualise the modes:

plot(density(mm))

enter image description here

So, we can see there are 2 modes in this distribution. One around 600 and one around 1000. Let's see how to find them.

In order to find the mode indices I made this function:

find_modes<- function(x) {
  modes <- NULL
  for ( i in 2:(length(x)-1) ){
  if ( (x[i] > x[i-1]) & (x[i] > x[i+1]) ) {
    modes <- c(modes,i)
  }
  }
  if ( length(modes) == 0 ) {
    modes = 'This is a monotonic distribution'
  }
  return(modes)
}

Let's try it on our density:

mymodes_indices <- find_modes(density(mm)$y) #you need to try it on the y axis

Now mymodes_indices contains the indices of our modes i.e.:

> density(mm)$y[mymodes_indices]  #just to confirm that those are the correct
[1] 0.0008946929 0.0017766183

> density(mm)$x[mymodes_indices] #the actual modes
[1]  660.2941 1024.9067

Hope it helps!

Comments