MFR - 1 year ago 52
R Question

The average of values when my factor is excluded

I wish to take average of the values when my factor is excluded. This is my data:

``````dff= data.frame(v1 =c("a,b,d","b,a,g", "f,d") , v2 = c(1,2,3) )
``````

for example for
`a`
I wish to take average of rows that does not have
`a`
which is row 3 and average is 3. for
`f`
I wish to take average of rows without
`f`
which are rrow 1 and 3 and the average of V2 is (1 + 3) / 2 =1.5 My ideal output will be

``````factor   avg

a       3
b       3
d       2
f       1.5
g       2
``````

I tried two approaches: First one:

``````dff2 <- cSplit(dff, "v1",   " , ", "long")

c <- setDT(dff2)[,  .(value = (sum(dff2\$v2) - sum(v2))/(nrow(dff2)-.N)) , by =  v1]
``````

and the second:

``````library(tidyverse)

dff %>% separate_rows(v1,  convert = TRUE) %>%
group_by(v1) %>%
summarise(avg = mean(.\$v2[!.\$v1 %in% v1]))
``````

none of them gave me the desirable outcome

Answer Source

First, we can get the collection of factors using:

``````fac <- unique(unlist(strsplit(as.character(dff\$v1),split=",")))
##[1] "a" "b" "d" "g" "f"
``````

then, we can compute what you want using:

``````avg <- sapply(fac, function(f) mean(ifelse(grepl(f,dff\$v1), NA, dff\$v2),na.rm=TRUE))
##  a   b   d   g   f
##3.0 3.0 2.0 2.0 1.5
``````

Here, we use `grepl` to determine if each `fac` is found in `dff\$v1`. If it is, then the `ifelse` returns `NA` for that row, and if it is not, returns the value of `dff\$v2` for that row. We can then compute the `mean`, ignoring the `NA`s for those rows that `fac` is found in `dff\$v1`, as you wished.

Or, if you prefer, we can put the results into a `data.frame`:

``````data.frame(factor=fac,avg=avg)
##  factor avg
##a      a 3.0
##b      b 3.0
##d      d 2.0
##g      g 2.0
##f      f 1.5
``````

In order to have exact match using `%in%`, consider the data:

``````## Note use of stringsAsFactors=FALSE to avoid having to coerce to character
## for strsplit
dff <- data.frame(v1 =c("a,b,d","b,a,g", "a.b,f,d"), v2 = c(1,2,3), stringsAsFactors=FALSE)

fac <- unique(unlist(strsplit(dff\$v1,split=",")))
##[1] "a"   "b"   "d"   "g"   "a.b" "f"
avg <- sapply(fac, function(f) mean(ifelse(sapply(strsplit(dff\$v1,split=","), function(d) f %in% d), NA, dff\$v2),na.rm=TRUE))
##  a   b   d   g a.b   f
##3.0 3.0 2.0 2.0 1.5 1.5
``````