Mehdi Farhangian - 4 months ago
R Question

count variables with a certain condition

I want to count how many times an event occurred and, when it occurred, whether it had a consequence. Let's assume this is my data:

#mydata

a b c d consequence
0 0 1 1 0
1 0 1 1 1
1 1 1 0 0
0 0 0 1 0


So, for each variable, I calculate how many times it occurred and how many times it occurred together with a consequence. An example for "a":

# nrow() counts the matching rows; length() on a data frame would return the number of columns
numberofa = nrow(subset(mydata, a == 1))
numberofaeffective = nrow(subset(mydata, a == 1 & consequence == 1))


How can I write a program to calculate these two metrics for each variable?

#expected output

variable count count-with-effect
a 2 1
b 1 0
c 3 1
d 3 1

Answer

We can do this by taking the sum of a logical vector (the data frame is called 'dts' here):

sum(dts$a==1)
#[1] 2

and

with(dts, sum(a==1 & consequence == 1))
#[1] 1
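
The two calls above assume a data frame named 'dts' that is never shown. A minimal sketch that rebuilds it from the table in the question (the column values are copied from there; the intermediate names `a_occurred`, `n_a` and `n_a_effective` are only illustrative) and repeats both counts:

```r
# Rebuild the question's data; the name `dts` follows the answer's code
dts <- data.frame(
  a = c(0, 1, 1, 0),
  b = c(0, 0, 1, 0),
  c = c(1, 1, 1, 0),
  d = c(1, 1, 0, 1),
  consequence = c(0, 1, 0, 0)
)

# A comparison gives a logical vector, and sum() counts each TRUE as 1
a_occurred <- dts$a == 1
n_a <- sum(a_occurred)

# Combine two conditions element-wise with `&` before summing
n_a_effective <- sum(a_occurred & dts$consequence == 1)
```

This works because R coerces TRUE to 1 and FALSE to 0 inside sum(), so no subsetting is needed.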

If we need it for each of the variables (i.e. 'a' to 'd')

colSums(dts[1:4] == 1)
# a b c d 
# 2 1 3 3 

and for the second count, combined with 'consequence':

colSums(dts[1:4] == 1 & (dts[5] == 1)[row(dts[1:4])])
#a b c d 
#1 0 1 1 
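
The row() indexing above repeats the 'consequence' column once per variable column. A possibly simpler sketch, assuming the same 'dts' as in the question, relies on R recycling a length-4 vector down each column of the 4 x 4 logical matrix (the name `with_effect` is only illustrative):

```r
# Same data as in the question
dts <- data.frame(
  a = c(0, 1, 1, 0),
  b = c(0, 0, 1, 0),
  c = c(1, 1, 1, 0),
  d = c(1, 1, 0, 1),
  consequence = c(0, 1, 0, 0)
)

# dts[1:4] == 1 is a logical matrix stored column by column, so a vector
# with one element per row is recycled once for every column
with_effect <- colSums(dts[1:4] == 1 & dts$consequence == 1)
```

The recycling only lines up because the vector length equals the number of rows; with any other length the explicit indexing is the safer choice.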

If we need it in a specific format, we can gather the dataset into 'long' format, then do the group by operation and summarise by summing the 'value' column

library(dplyr)
library(tidyr)
gather(dts, variable, value, -consequence) %>%
  group_by(variable) %>%
  summarise(count = sum(value), count_with_effect = sum(value & consequence))
#  variable count count_with_effect
#     <chr> <int>             <int>
#1        a     2                 1
#2        b     1                 0
#3        c     3                 1
#4        d     3                 1
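
If loading packages is not an option, the same table can probably be put together in base R from the two colSums() results. This is a sketch under the assumption that every column except 'consequence' is a 0/1 indicator; the names `vars` and `result` are only illustrative:

```r
# Same data as in the question
dts <- data.frame(
  a = c(0, 1, 1, 0),
  b = c(0, 0, 1, 0),
  c = c(1, 1, 1, 0),
  d = c(1, 1, 0, 1),
  consequence = c(0, 1, 0, 0)
)

# Every column except the outcome is treated as an event indicator
vars <- setdiff(names(dts), "consequence")
occurred <- dts[vars] == 1

# One row per variable: total occurrences and occurrences with a consequence
result <- data.frame(
  variable = vars,
  count = unname(colSums(occurred)),
  count_with_effect = unname(colSums(occurred & dts$consequence == 1)),
  stringsAsFactors = FALSE
)
print(result)
```

Using setdiff() instead of the positional 1:4 keeps the code working if more indicator columns are added later.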