runningbirds - 3 months ago
R Question

Calculating the mode or 2nd/3rd/4th most common value

Surely there has to be a function out there in some package for this?

I've searched and I've found this function to calculate the mode:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}


But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.

Ultimately I will apply this function to a large number of dplyr::group_by() groups.

Thank you for your help!

Answer

Maybe you could try

f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

This gives unique vector values sorted by decreasing frequency. The first will be the mode, the 2nd will be 2nd most common, etc.
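To pick out just the n-th most common value, the result of f can be indexed directly. The sketch below wraps that in a helper; the name nth_mode is made up for illustration and does not come from any package. Two behaviours worth knowing: sort() drops NA values by default, so missing values are ignored, and asking for a rank beyond the number of distinct values returns NA.

```r
# f() from the answer: unique values of x sorted by decreasing frequency
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# nth_mode() is a hypothetical helper name, not an existing package function.
# It returns the n-th most common value of x (n = 1 is the mode).
nth_mode <- function(x, n = 1L) f(x)[n]

x <- c(3, 1, 3, 2, 3, 2)   # 3 occurs three times, 2 twice, 1 once
nth_mode(x, 1)  # 3
nth_mode(x, 2)  # 2
nth_mode(x, 3)  # 1
nth_mode(x, 4)  # NA, since x has only three distinct values
```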

Another method is based on table():

g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))

But this is not recommended, because the input vector x is first coerced to a factor, which can be very slow for a large vector. On exit, we also have to extract the character names of the table and coerce them back to numeric, so g only works for numeric input.
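The round trip through character names can be seen on a small vector. This is only an illustration of the coercion described above, using made-up values:

```r
x <- c(2.5, 1, 2.5, 10, 10, 2.5)   # 2.5 occurs three times, 10 twice, 1 once

tab <- sort(table(x), decreasing = TRUE)

# The values survive only as the table's names, which are character strings
nm <- names(tab)
is.character(nm)   # TRUE

# Converting back recovers the numeric values in frequency order
vals <- as.numeric(nm)
vals               # 2.5 10.0  1.0
```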


Example

set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

Let's compare with the contingency table from table:

tab <- sort(table(x), decreasing = TRUE)
tab
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16 
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1

as.numeric(names(tab))
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

So the results are the same.
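Since the question mentions dplyr::group_by(), here is a minimal sketch of how f might be used per group with summarise(). The data frame and the column names grp and val are invented for this example, and it assumes the dplyr package is installed.

```r
library(dplyr)

# f() from the answer: unique values of x sorted by decreasing frequency
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Toy data; `grp` and `val` are hypothetical column names
df <- data.frame(
  grp = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  val = c(1, 1, 1, 2, 2, 5, 7, 7, 9, 9, 9)
)

res <- df %>%
  group_by(grp) %>%
  summarise(
    most_common   = f(val)[1],   # mode within each group
    second_common = f(val)[2]    # NA if a group has only one distinct value
  )

res
```

Here group a should give 1 and then 2, and group b should give 9 and then 7. Calling f twice per group repeats the sort; for many large groups it may be worth computing f(val) once per group and indexing the result.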