HNSKD - 1 year ago 60
R Question

# How do you find a maximum character of a vector based on user-defined hierarchy in R?

How do you find a maximum character of a vector based on user-defined hierarchy in R?

I have a variable, say

`Code`
which I would like to impose a hierarchy on so that MSP<1A<1B<1C<2A<2B<2C<...<7C. I want to apply a maximum function on a vector containing these elements. While there's no problem for those characters that are in the form [digit][letter], there is a problem for "MSP". In R, max("2A","MSP") gives "MSP", but I want it to be "2A" instead.

In other words, R will rank it this way 1A<...MSP. This is due to the fact that in R, the hierarchy is "integer < double < character".

Dataframe:

``````No<-rep(c(1,2,3),c(4,4,3))
Date<-rep(c("05/09/2013","20/05/2013","23/05/2013","28/05/2013","03/06/2013"),c(2,2,2,2,3))
Time<-rep(c("14:15:00","09:40:00","14:30:00","13:10:00","08:45:00"),c(2,1,3,2,3))
Code<-c("MSP","3A","5B",NA,NA,NA,"7C","3B","MSP","MSP",NA)
df<-data.frame(No,Date,Time,Code,stringsAsFactors=FALSE)
df\$Date<-dmy(df\$Date)
df\$Time<-hms(df\$Time)
df

#     No       Date       Time Code
# 1   1 2013-09-05 14H 15M 0S  MSP
# 2   1 2013-09-05 14H 15M 0S   3A
# 3   1 2013-05-20  9H 40M 0S   5B
# 4   1 2013-05-20 14H 30M 0S <NA>
# 5   2 2013-05-23 14H 30M 0S <NA>
# 6   2 2013-05-23 14H 30M 0S <NA>
# 7   2 2013-05-28 13H 10M 0S   7C
# 8   2 2013-05-28 13H 10M 0S   3B
# 9   3 2013-06-03  8H 45M 0S  MSP
# 10  3 2013-06-03  8H 45M 0S  MSP
# 11  3 2013-06-03  8H 45M 0S <NA>
``````

My Code:
I would like to take the maximum on
`Code`
(based on the user-defined rank) within each No and Date. So, the rows that share the same No and the same Date will have the same Code. Hence, I grouped it first.

I created a new variable called
`CodeAnother`
using the
`ifelse`
function.

1. The condition is such that if there is at least one "MSP" and not all elements are "MSP"

2. Yes: Take max function by excluding "MSP"

3. No: Take max function the usual way

``````library(dplyr)
dfnew<-df %>%
group_by(No,Date) %>%
mutate(IndicatorMSP=(Code=="MSP" & !is.na(Code))) %>%
mutate(CodeNo=sum(!is.na(Code))) %>%
mutate(CodeAnother=ifelse(sum(IndicatorMSP)>=1 & sum(IndicatorMSP)<CodeNo,
max(Code[!(Code=="MSP") & !is.na(Code)]),
max(Code[!is.na(Code)])))
``````

I would like to know if there is a nicer way of achieving this using nicer code.

factors are, for once, your friend:

An example where I reorder letters:

``````factoringVariable <- sample(letters)

> factoringVariable
[1] "z" "k" "p" "s" "f" "v" "j" "b" "o" "l" "u" "m" "w" "c" "n" "t" "r" "x" "a" "i" "y" "q" "h" "d" "e" "g"

> sort(factor(letters,levels = factoringVariable))
[1] z k p s f v j b o l u m w c n t r x a i y q h d e g
``````

``````factoringVariable <- c('MSP', sort(unlist(outer(1:7,LETTERS[1:3],paste0))))

> factoringVariable
[1] "MSP" "1A"  "1B"  "1C"  "2A"  "2B"  "2C"  "3A"  "3B"  "3C"  "4A"  "4B"  "4C"  "5A"  "5B"  "5C"  "6A"  "6B"  "6C"  "7A"
[21] "7B"  "7C"
``````

Now that I've set my order:

``````df\$Code <- factor(df\$Code, levels = factoringVariable)
``````

and then you can just use `top_n` function in dplyr (with -1 to get the bottom 1)

``````library(dplyr)
dfnew<-df %>%
group_by(No,Date) %>%
top_n(-1,Code)
``````

output:

``````> dfnew
Source: local data frame [5 x 4]
Groups: No, Date [4]

No       Date     Time   Code
<dbl>      <chr>    <chr> <fctr>
1     1 05/09/2013 14:15:00    MSP
2     1 20/05/2013 09:40:00     5B
3     2 28/05/2013 13:10:00     3B
4     3 03/06/2013 08:45:00    MSP
5     3 03/06/2013 08:45:00    MSP
``````

EDIT: I realize now you want to assign all the largest value, in which case, we can't use top_n

``````library(dplyr)