MFR MFR - 22 days ago 5
R Question

Count the rows that do not contain a factor and whose value is not zero

I have the following data

     path value
1 b,b,a,c     3
2     c,b     2
3       a    10
4 b,c,a,b     0
5     e,f     0
6     a,f     1


df



df <- data.frame(path = c("b,b,a,c", "c,b", "a", "b,c,a,b", "e,f", "a,f"), value = c(3, 2, 10, 0, 0, 1))


For each factor, I want to count the rows that do not contain that factor and whose value is not zero. So my desired output is:

# desired output
path value
1: b 2
2: a 1
3: c 2
4: e 4
5: f 3


For instance, for a the result is 1: the only row that does not contain a and has a nonzero value is row 2. (I hope this is clear; please let me know if more examples are needed.)
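
The count for a can be checked by hand with a short base R sketch. It rebuilds df as defined above; the names parts, has_a, rows_without_a and count_a are illustrative only.

```r
df <- data.frame(path = c("b,b,a,c", "c,b", "a", "b,c,a,b", "e,f", "a,f"),
                 value = c(3, 2, 10, 0, 0, 1))

# Split each path into its individual factors
parts <- strsplit(as.character(df$path), ",", fixed = TRUE)

# TRUE for rows whose path contains "a"
has_a <- vapply(parts, function(p) "a" %in% p, logical(1))

# Rows that lack "a" and have a nonzero value
rows_without_a <- which(!has_a & df$value != 0)
count_a <- length(rows_without_a)

rows_without_a  # 2
count_a         # 1
```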

I tried the following code, but the output for b is wrong. Does anyone know why?

total <- sum(df$value != 0)

library(splitstackshape)

# number of rows with a nonzero value, minus the number of split rows per factor with a nonzero value

output <- cSplit(df, "path", ",", "long")[, .(value = total - sum(value != 0)), by = .(path)]

output


This code results in the following output, which is not correct for b:


path value
1: b 1
2: a 1
3: c 2
4: e 4
5: f 3

Answer

Read the factors into facs, then use grepl to flag the rows that do not contain each factor and count those whose value is not zero:

facs <- unique(scan(textConnection(as.character(df$path)), what = "", sep = ","))
data.frame(path = facs, 
           value = colSums( !sapply(facs, grepl, as.character(df$path)) & df$value != 0 ))

giving:

  path value
b    b     2
a    a     1
c    c     2
e    e     4
f    f     3
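
As for why the code in the question is off for b: splitting into long format yields one row per listed factor, so a path such as b,b,a,c contributes b twice to sum(value != 0). That gives 3 for b instead of 2, and 4 - 3 = 1. The following is a minimal base R sketch, not the only way to do it, that drops repeated factors within each row and matches factors exactly rather than as substrings (grepl would also match a factor name that is contained in a longer one). The names parts, has_fac and result are illustrative.

```r
df <- data.frame(path = c("b,b,a,c", "c,b", "a", "b,c,a,b", "e,f", "a,f"),
                 value = c(3, 2, 10, 0, 0, 1))

# Split each path and drop repeated factors within a row
parts <- lapply(strsplit(as.character(df$path), ",", fixed = TRUE), unique)

# All distinct factors, in order of first appearance
facs <- unique(unlist(parts))

# Matrix with one row per path and one column per factor:
# TRUE where the path contains the factor (exact match)
has_fac <- sapply(facs, function(f) vapply(parts, function(p) f %in% p, logical(1)))

# For each factor, count the rows that lack it and have a nonzero value
result <- data.frame(path = facs,
                     value = colSums(!has_fac & df$value != 0),
                     row.names = NULL)
result
```

On the sample data this reproduces the desired counts (b 2, a 1, c 2, e 4, f 3).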