MBB - 6 months ago 24
R Question

# Select values row-wise based on rank among dates

Let's say I have a data frame with several rows like the following:

``````df <- data.frame(a = c(NA,20,NA),
date1 = c("2016-03-01", "2016-02-01", "2016-02-01"),
b = c(50,NA, NA),
date2 = c("2016-02-01", "2016-03-01", "2016-03-01"),
c = c(10,10, 10),
date3 = c("2016-01-01","2016-01-01", "2016-01-01"))
``````

For each row, I want to get the latest value which is not a
`NA`
between
`a`
,
`b`
, and
`c`
according to the
`dates`
(so I respectively look at
`date1`
,
`date2`
, or
`date3`
and pick the most recent one).

Basically,
`date1`
gives the date corresponding to the value
`a`
,
`date2`
gives the date corresponding to the value
`b`
,
`date3`
gives the date corresponding to the value
`c`
.

If
`date1 > date2`
&
`date1 > date3`
, I will want to take the value
`a`

However, if the value
`a`
is
`NA`
(which is the case in my example), I will compare
`date2`
and
`date3`
. In my example,
`date2 > date3`
, and since value
`b`
is not
`NA`
but
`50`
, I will take
`50`
as my final result.

Now I want to do this for all the rows in my dataframe

Since I am using
`dplyr`
, I tried to use the
`case_when`
function by using the rank function (in my example, I look a the first ranked date, and then look at the linked value. If it is a NA, I look at the 2nd best ranked, etc...)

However, I can't just put, as I'd like to do, :

``````df <- df %>%
mutate(result = case_when(is.na(a) & is.na(b) & is.na(c) ~ NA_integer_,
rev(rank(date1, date2, date3))[1] == 3 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 3 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 3 & !is.na(a) ~ c,
rev(rank(date1, date2, date3))[1] == 2 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 2 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 2 & !is.na(a) ~ c,
rev(rank(date1, date2, date3))[1] == 1 & !is.na(a) ~ a,
rev(rank(date1, date2, date3))[2] == 1 & !is.na(b) ~ b,
rev(rank(date1, date2, date3))[3] == 1 & !is.na(a) ~ c))
``````

Because the
`rank`
function needs a unique vector as argument (but I can't put
`c(date1, date2, date3)`
neither because it would give me the whole order of this vector and not the rank for each row)

In my example the result I would like to have would be

``````res

a    date1         b      date2       c    date3       result
NA   2016-03-01    50     2016-02-01  10   2016-01-01  50
20   2016-02-01    NA     2016-03-01  10   2016-01-01  20
NA   2016-02-01    NA     2016-03-01  10   2016-01-01  10
``````

Does anyone have an idea or even an entirely different approach to this problem ?

I suggest converting to long-format and computing the relevant values. If you want, you can then add the results to your original data.frame. Here's how you could do that using data.table:

``````library(data.table)
setDT(df)                     # convert to data.table object
df[, row := .I]               # add a row-id
dflong <- melt(df, id = "row", measure = patterns("^date", "^(a|b|c)"),
na.rm = TRUE) # convert to long format
setorder(dflong, value1)      # reorder by date value
dflong <- unique(dflong, by = "row", fromLast = TRUE) # get the latest dates
df[dflong, result := i.value2, on = "row"]  # add result to original data

df
#    a      date1  b      date2  c      date3 row result
#1: NA 2016-03-01 50 2016-02-01 10 2016-01-01   1     50
#2: 20 2016-02-01 NA 2016-03-01 10 2016-01-01   2     20
#3: NA 2016-02-01 NA 2016-03-01 10 2016-01-01   3     10
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download