Kasia Kulma Kasia Kulma - 2 months ago 6
R Question

R: passing a value over multiple rows, until a value in a different column changes

I've got the customer activity data in the following format:

id = rep(1:2, each = 7)
week = rep(22:28, 2)
orders_num = c(NA, 1,0,0,0,3,0,1,4,0,0,1,0,0)
orders_char = c("none", "1", "none", "none", "none", "2+", "none", "1", "2+", "none", "none","1", "none", "none")
activity =c(NA, "active", "lapsed1", "lapsed2", "lapsed3", "active", "lapsed1", "active", "active", "lapsed1", "lapsed2" , "active", "lapsed1", "lapsed2")

df = data.frame(cbind(id, week, orders_num, orders_char, activity))
df
id week orders_num orders_char activity
1 22 <NA> none <NA>
1 23 1 1 active
1 24 0 none lapsed1
1 25 0 none lapsed2
1 26 0 none lapsed3
1 27 3 2+ active
1 28 0 none lapsed1
2 22 1 1 active
2 23 4 2+ active
2 24 0 none lapsed1
2 25 0 none lapsed2
2 26 1 1 active
2 27 0 none lapsed1
2 28 0 none lapsed2


I'm trying to create a column that would return a combination of
activity
and
orders_char
if
activity == "active"
and a respective
'lapsed'
category with
orders_char
corresponding to the most recent
'active'
status:

df_solution
id week orders_num orders_char activity final
1 22 <NA> none <NA> <NA>
1 23 1 1 active active_1
1 24 0 none lapsed1 lapsed1_1
1 25 0 none lapsed2 lapsed2_1
1 26 0 none lapsed3 lapsed3_1
1 27 3 2+ active active_2+
1 28 0 none lapsed1 lapsed1_2+
2 22 1 1 active active_1
2 23 4 2+ active active_2+
2 24 0 none lapsed1 lapsed1_2+
2 25 0 none lapsed2 lapsed2_2+
2 26 1 1 active active_1
2 27 0 none lapsed1 lapsed1_1
2 28 0 none lapsed2 lapsed2_1


Obviously, simply referring to the previous row is not going to work in cases where
lapsed
periods last longer:

df %>% group_by(id) %>% arrange(id, week) %>%
mutate(final = ifelse(activity == "active", paste(activity, orders_char, sep="_"),
ifelse(grepl("lapsed", activity), paste(activity, lag(orders_char), sep="_"), NA)))

id week orders_num orders_char activity final
1 22 NA none NA <NA>
1 23 1 1 active active_1
1 24 0 none lapsed1 lapsed1_1
1 25 0 none lapsed2 lapsed2_none
1 26 0 none lapsed3 lapsed3_none
1 27 3 2+ active active_2+
1 28 0 none lapsed1 lapsed1_2+
2 22 1 1 active active_1
2 23 4 2+ active active_2+
2 24 0 none lapsed1 lapsed1_2+
2 25 0 none lapsed2 lapsed2_none
2 26 1 1 active active_1
2 27 0 none lapsed1 lapsed1_1
2 28 0 none lapsed2 lapsed2_none


Is there any way I could "fix"
orders_char
value from one active period to the other?

Thanks for your suggestions!

Answer

After grouping by 'id', and arrangeing by 'id' and 'week', we get the cumulative sum of logical vector (orders_char!= 'none'), replace that with the elements of 'orders_char' without the 'none' (orders_char[orders_char != "none"]) and paste with the 'activity' column.

df %>% 
   group_by(id) %>%
   arrange(id, week)  %>%
   mutate(ind = cumsum(orders_char!="none"), 
          final = ifelse(ind==0, NA, 
                     paste(activity, orders_char[orders_char!= "none"][ind], sep="_")) ) %>% 
   select(-ind)
#      id   week orders_num orders_char activity      final
#   <fctr> <fctr>     <fctr>      <fctr>   <fctr>      <chr>
#1       1     22         NA        none       NA       <NA>
#2       1     23          1           1   active   active_1
#3       1     24          0        none  lapsed1  lapsed1_1
#4       1     25          0        none  lapsed2  lapsed2_1
#5       1     26          0        none  lapsed3 lapsed3_2+
#6       1     27          3          2+   active  active_2+
#7       1     28          0        none  lapsed1  lapsed1_1
#8       2     22          1           1   active   active_1
#9       2     23          4          2+   active  active_2+
#10      2     24          0        none  lapsed1 lapsed1_2+
#11      2     25          0        none  lapsed2 lapsed2_2+
#12      2     26          1           1   active   active_1
#13      2     27          0        none  lapsed1  lapsed1_1
#14      2     28          0        none  lapsed2  lapsed2_1