HNSKD HNSKD - 1 month ago 3x
R Question

How do you find a maximum character of a vector based on user-defined hierarchy in R?

How do you find a maximum character of a vector based on user-defined hierarchy in R?

I have a variable, say

which I would like to impose a hierarchy on so that MSP<1A<1B<1C<2A<2B<2C<...<7C. I want to apply a maximum function on a vector containing these elements. While there's no problem for those characters that are in the form [digit][letter], there is a problem for "MSP". In R, max("2A","MSP") gives "MSP", but I want it to be "2A" instead.

In other words, R will rank it this way 1A<...MSP. This is due to the fact that in R, the hierarchy is "integer < double < character".



# No Date Time Code
# 1 1 2013-09-05 14H 15M 0S MSP
# 2 1 2013-09-05 14H 15M 0S 3A
# 3 1 2013-05-20 9H 40M 0S 5B
# 4 1 2013-05-20 14H 30M 0S <NA>
# 5 2 2013-05-23 14H 30M 0S <NA>
# 6 2 2013-05-23 14H 30M 0S <NA>
# 7 2 2013-05-28 13H 10M 0S 7C
# 8 2 2013-05-28 13H 10M 0S 3B
# 9 3 2013-06-03 8H 45M 0S MSP
# 10 3 2013-06-03 8H 45M 0S MSP
# 11 3 2013-06-03 8H 45M 0S <NA>

My Code:
I would like to take the maximum on
(based on the user-defined rank) within each No and Date. So, the rows that share the same No and the same Date will have the same Code. Hence, I grouped it first.

I created a new variable called
using the

  1. The condition is such that if there is at least one "MSP" and not all elements are "MSP"

  2. Yes: Take max function by excluding "MSP"

  3. No: Take max function the usual way

    dfnew<-df %>%
    group_by(No,Date) %>%
    mutate(IndicatorMSP=(Code=="MSP" & ! %>%
    mutate(CodeNo=sum(! %>%
    mutate(CodeAnother=ifelse(sum(IndicatorMSP)>=1 & sum(IndicatorMSP)<CodeNo,
    max(Code[!(Code=="MSP") & !]),

I would like to know if there is a nicer way of achieving this using nicer code.


factors are, for once, your friend:

An example where I reorder letters:

factoringVariable <- sample(letters)

> factoringVariable
[1] "z" "k" "p" "s" "f" "v" "j" "b" "o" "l" "u" "m" "w" "c" "n" "t" "r" "x" "a" "i" "y" "q" "h" "d" "e" "g"

> sort(factor(letters,levels = factoringVariable))
[1] z k p s f v j b o l u m w c n t r x a i y q h d e g

so in your case:

factoringVariable <- c('MSP', sort(unlist(outer(1:7,LETTERS[1:3],paste0))))

> factoringVariable
 [1] "MSP" "1A"  "1B"  "1C"  "2A"  "2B"  "2C"  "3A"  "3B"  "3C"  "4A"  "4B"  "4C"  "5A"  "5B"  "5C"  "6A"  "6B"  "6C"  "7A" 
[21] "7B"  "7C" 

Now that I've set my order:

df$Code <- factor(df$Code, levels = factoringVariable)

and then you can just use top_n function in dplyr (with -1 to get the bottom 1)

dfnew<-df %>%
  group_by(No,Date) %>%


> dfnew
Source: local data frame [5 x 4]
Groups: No, Date [4]

     No       Date     Time   Code
  <dbl>      <chr>    <chr> <fctr>
1     1 05/09/2013 14:15:00    MSP
2     1 20/05/2013 09:40:00     5B
3     2 28/05/2013 13:10:00     3B
4     3 03/06/2013 08:45:00    MSP
5     3 03/06/2013 08:45:00    MSP

EDIT: I realize now you want to assign all the largest value, in which case, we can't use top_n


dfnew<-df %>%
  group_by(No,Date) %>%
  mutate(CodeAll = sort(Code, partial = 1)[1])

EDIT 2: you can actually speed it up more (if you need the speed) by using partial sorting, since you're only going to take the first one anyways