MFR MFR - 23 days ago 13
R Question

Average on concatenated string with some conditions

I want to compute averages over comma-separated (concatenated) strings, subject to a condition. This is my data:

id  path  events
1   a, b  2, 3
2   c, a  3, 4
3   b     5


For each path, I'd like to take the average of the events in the rows that do not contain that path. For instance, the rows that do not contain c are rows 1 and 3, so the average for c is (2 + 3 + 5) / 3 = 3.33,

and similarly for the other paths, so my desired output is:

path avg
a 5
b 3.5
c 3.33


Before that, I tried this on data that was not concatenated (one path per row), and it worked:

output <- sapply(as.character(unique(df$path)),
                 function(x) mean(subset(df, !path %in% x)$events))
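
For illustration, here is a minimal sketch of that unconcatenated case. The df below is a hypothetical example (its values are an assumption, not the asker's real data), with one path and one event count per row.

```r
# Hypothetical unconcatenated data: one path and one event count per row.
df <- data.frame(path   = c("a", "b", "c", "a", "b"),
                 events = c(2, 3, 3, 4, 5))

# For each distinct path, average the events of all rows with a different path.
output <- sapply(as.character(unique(df$path)),
                 function(x) mean(subset(df, !path %in% x)$events))
print(output)
```

The filter here works row by row, which is probably why it does not carry over directly: with concatenated cells, excluding a path means excluding every event of any row (id) that contains it, not only the matching entries.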


But I could not come up with an approach for the concatenated case.

This is my data:

mydata <- data.frame(id     = c(1, 2, 3),
                     path   = c("a,b", "c,a", "b"),
                     events = c("2,3", "3,4", "5"))

Answer

Here's a tidyverse approach:

library(tidyverse)

mydata %>% separate_rows(path, events, convert = TRUE) %>%    # unnest rows
    group_by(path) %>%    # set grouping
    summarise(avg = mean(.$events[!.$id %in% id]))    # summarize groups

## # A tibble: 3 × 2
##    path      avg
##   <chr>    <dbl>
## 1     a 5.000000
## 2     b 3.500000
## 3     c 3.333333

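Note that inside summarise(), .$events and .$id refer to the entire columns (the dot is the whole data frame that %>% pipes into summarise()), whereas a bare column name such as id refers only to the values in the current group. So for each path, the mean is taken over the events of every id that does not appear in that path's group.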
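
If the dot feels too implicit, the same idea can be sketched with the unnested table stored under a name first. This is only a variant of the pipeline above, not a different method; the names long and result are made up for the example, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(id     = c(1, 2, 3),
                     path   = c("a,b", "c,a", "b"),
                     events = c("2,3", "3,4", "5"))

# Unnest the comma-separated cells: one (id, path, event) per row.
# `long` is a hypothetical name chosen for this sketch.
long <- separate_rows(mydata, path, events, convert = TRUE)

result <- long %>%
  group_by(path) %>%
  # long$events / long$id: the whole columns; bare `id`: ids in this group only.
  summarise(avg = mean(long$events[!long$id %in% id]))

print(result)
```

This should give the same averages as the output shown above, and it also works in settings where the magrittr dot is not available, since nothing in it depends on the placeholder.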
