MFR - 4 months ago 24

R Question

I wish to take average of the values when my factor is excluded. This is my data:

`dff= data.frame(v1 =c("a,b,d","b,a,g", "f,d") , v2 = c(1,2,3) )`

for example for

`a`

`a`

`f`

`f`

`factor avg`

a 3

b 3

d 2

f 1.5

g 2

I tried two approaches: First one:

`dff2 <- cSplit(dff, "v1", " , ", "long")`

c <- setDT(dff2)[, .(value = (sum(dff2$v2) - sum(v2))/(nrow(dff2)-.N)) , by = v1]

and the second:

`library(tidyverse)`

dff %>% separate_rows(v1, convert = TRUE) %>%

group_by(v1) %>%

summarise(avg = mean(.$v2[!.$v1 %in% v1]))

none of them gave me the desirable outcome

Answer

First, we can get the collection of factors using:

```
fac <- unique(unlist(strsplit(as.character(dff$v1),split=",")))
##[1] "a" "b" "d" "g" "f"
```

then, we can compute what you want using:

```
avg <- sapply(fac, function(f) mean(ifelse(grepl(f,dff$v1), NA, dff$v2),na.rm=TRUE))
## a b d g f
##3.0 3.0 2.0 2.0 1.5
```

Here, we use `grepl`

to determine if each `fac`

is found in `dff$v1`

. If it is, then the `ifelse`

returns `NA`

for that row, and if it is not, returns the value of `dff$v2`

for that row. We can then compute the `mean`

, ignoring the `NA`

s for those rows that `fac`

is found in `dff$v1`

, as you wished.

Or, if you prefer, we can put the results into a `data.frame`

:

```
data.frame(factor=fac,avg=avg)
## factor avg
##a a 3.0
##b b 3.0
##d d 2.0
##g g 2.0
##f f 1.5
```

In order to have exact match using `%in%`

, consider the data:

```
## Note use of stringsAsFactors=FALSE to avoid having to coerce to character
## for strsplit
dff <- data.frame(v1 =c("a,b,d","b,a,g", "a.b,f,d"), v2 = c(1,2,3), stringsAsFactors=FALSE)
fac <- unique(unlist(strsplit(dff$v1,split=",")))
##[1] "a" "b" "d" "g" "a.b" "f"
avg <- sapply(fac, function(f) mean(ifelse(sapply(strsplit(dff$v1,split=","), function(d) f %in% d), NA, dff$v2),na.rm=TRUE))
## a b d g a.b f
##3.0 3.0 2.0 2.0 1.5 1.5
```