ROY ROY - 2 months ago 21
R Question

Ranking based on multiple conditions

I have a dataset (nm) as shown below, where first three columns shows user is using that product or not i.e. 1(yes) or 0(not). The next three columns shows what is the percentage of liking that products where A_TT corresponds to A_TT_1.1 and likewise:

nm

A_TT B_TT C_TT A_TT_1.1 B_TT_2.1 C_TT_3.1
0 0 1 0.06 0.84 0.89
1 1 0 0.92 0.83 0.94
0 1 0 0.09 0.9 0.13
1 0 1 0.87 0.05 0.9
0 0 0 0.13 0.11 0.14


I want to add two columns in dataset(nm) i.e. Rank1 & Rank2, which ranks top two percentages having following two conditions:

a) Select and rank only where value is "zero" in first three columns

b) In rank column it should return its column name.

The final output should be:

nm

A_TT B_TT C_TT A_TT_1.1 B_TT_2.1 C_TT_3.1 Rank 1 Rank 2
0 0 1 0.06 0.84 0.89 B_TT_2.1 A_TT_1.1
1 1 0 0.92 0.83 0.94 C_TT_3.1 NONE
0 1 0 0.09 0.9 0.13 C_TT_3.1 A_TT_1.1
1 0 1 0.87 0.05 0.9 B_TT_2.1 NONE
0 0 0 0.13 0.11 0.14 C_TT_3.1 A_TT_1.1


I have tried many options but nothing works. Thanks in advance for providing solutions.

Answer

We can do this with apply specifying MARGIN = 1 to loop over the rows

nm[paste0("Rank", 1:2)] <-  t(apply(nm, 1, FUN = function(x)  {
                         i1 <- !x[1:3] #logical index giving TRUE for 0
                         x1 <- x[4:6][i1] #subset elements 4:6 based on i1
                         names(x1[order(-x1)])[1:2] #order the 'x1', extract names
                          }))
nm
#  A_TT B_TT C_TT A_TT_1.1 B_TT_2.1 C_TT_3.1    Rank1    Rank2
#1    0    0    1     0.06     0.84     0.89 B_TT_2.1 A_TT_1.1
#2    1    1    0     0.92     0.83     0.94 C_TT_3.1     <NA>
#3    0    1    0     0.09     0.90     0.13 C_TT_3.1 A_TT_1.1
#4    1    0    1     0.87     0.05     0.90 B_TT_2.1     <NA>
#5    0    0    0     0.13     0.11     0.14 C_TT_3.1 A_TT_1.1

NOTE: Instead of "NONE", it is better to have NA as missing value as it is easier to remove those values with functions is.na/complete.cases/na.rm/na.omit etc.