runningbirds - 1 year ago 66
R Question

# Calculating the mode or 2nd/3rd/4th most common value

Surely there has to be a function out there in some package for this?

I've searched and I've found this function to calculate the mode:

``````Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
``````

But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.

Ultimately I will apply this function to a large number of `dplyr::group_by()`s.

Maybe you could try:

``````f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
``````

This returns the unique values of `x` sorted by decreasing frequency. The first element is the mode, the second is the 2nd most common value, and so on.
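To pull out only the nth most common value, the ranked vector can simply be indexed. Here is a minimal sketch; the helper name `nth_mode()` is made up for illustration and does not come from any package:

```r
# Hypothetical helper (illustrative name): the n-th most common value of x.
nth_mode <- function(x, n = 1) {
  r <- rle(sort(x))                                        # sort() drops NA by default
  ranked <- r$values[order(r$lengths, decreasing = TRUE)]  # values by decreasing count
  ranked[n]                                                # NA if fewer than n distinct values
}

x <- c(3, 3, 3, 1, 1, 2)
nth_mode(x)      # 3  (the mode)
nth_mode(x, 2)   # 1  (2nd most common)
nth_mode(x, 5)   # NA (only three distinct values)
```

When several values share the same count, their relative order is whatever `order()` leaves, so the "nth most common" value is not unique in that case.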

Another method is based on `table()`:

``````g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))
``````

But this is not recommended, because the input vector `x` is first coerced to a factor, which can be very slow for a large vector. In addition, on exit we have to extract the character names of the table and coerce them back to numeric.
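The coercion also affects the type of the result, not only the speed. As a small sketch with made-up input vectors: `f()` keeps the type of its input, while `g()` always goes through character names and `as.numeric()`:

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
g <- function(x) as.numeric(names(sort(table(x), decreasing = TRUE)))

# Integer input: both give the same values, but g() returns doubles.
xi <- c(5L, 5L, 5L, 2L, 2L, 9L)
f(xi)            # integer vector: 5 2 9
g(xi)            # double vector:  5 2 9

# Character input: f() still works, g() cannot convert the names.
ch <- c("a", "b", "b", "c", "c", "c")
f(ch)                      # "c" "b" "a"
suppressWarnings(g(ch))    # NA NA NA
```

So `g()` is only usable when the values are numeric to begin with.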

Example

``````set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16
``````

Let's compare with the contingency table from `table`:

``````tab <- sort(table(x), decreasing = TRUE)
tab
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1

as.numeric(names(tab))
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16
``````

So the results are the same.
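Since the goal in the question is to apply this per group, here is a rough sketch of a grouped version. The data frame and its column names `grp` and `val` are invented for the example, and base R's `tapply()` is used so the snippet runs without extra packages:

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Made-up example data: a grouping column and a value column.
df <- data.frame(
  grp = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  val = c(1, 1, 1, 2, 2, 7, 7, 7, 8, 8, 9)
)

# Second most common value of val within each group.
second <- tapply(df$val, df$grp, function(v) f(v)[2])
second
# a b
# 2 8

# With dplyr attached, the equivalent would be roughly:
# df %>% group_by(grp) %>% summarise(second = f(val)[2])
```

Indexing with `[2]` returns `NA` for any group that has fewer than two distinct values, so small groups do not cause an error.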
