Steen Harsted - 3 years ago
R Question

Group calculations using group_by and subset commands

I am a rookie Stata user trying to make the jump to R. I am working through various exercises, but I keep getting something wrong with the group_by and subset commands.

I have a simple dataset on which I want to do group-based calculations. I am trying to use the group_by command from the dplyr package to do this.

My dataset is called itchy and consists of 4 variables:

treat - levels A and B (type of treatment)

type - levels Dark and Fair (skin colour)

y - levels 0 and 1 (failure or success of treatment)

freq - numerical variable indicating how many are in this particular group

Using this code you can recreate it:

type <- c(2,2,2,2,1,1,1,1)
treat <-c(1,1,2,2,1,1,2,2)
y <- c(1,0,1,0,1,0,1,0)
freq <- c(9,17,5,20,10,15,3,20)
itchy <- cbind.data.frame(type,treat,y,freq)
itchy$type <- as.factor(type)
itchy$type <- factor(itchy$type,levels = c(1,2), labels = c("Dark", "Fair"))
itchy$treat <- as.factor(treat)
itchy$treat <- factor(itchy$treat,levels = c(1,2), labels = c("A", "B"))
itchy$y <- as.factor(y)
itchy$y <- factor(itchy$y,levels = c(0,1), labels = c("failure", "succes"))


Now I would like to calculate the odds of success for treatments A and B when applied to skin type Dark or Fair (odds = number of successes / number of failures).

I have two questions:

1) Can you help me do the odds calculations by group?

2) I have tried various combinations of group_by and subset without any luck. The code below shows some of my unsuccessful attempts. Can you tell me what I am misunderstanding about how the group_by and subset commands work?

itchy %>% group_by(treat, type) %>% summarize(ods = (subset(freq, y==1)/subset(freq, y==0)))

itchy %>% group_by(treat, type) %>% ods <- c((subset(freq, y==1)/subset(freq, y==0)))

itchy %>% group_by(treat, type) %>% itchy$ods <- (subset(freq, y==1)/subset(freq, y==0))

Answer

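If I understand you correctly, I think the following will work. I used the spread function from the tidyr package, which, like dplyr, is part of the tidyverse. It turns the two levels of y into two columns (failure and succes) holding the freq values, so the odds can be computed row by row.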


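library(dplyr)  # provides %>% and mutate()
library(tidyr)  # provides spread()
itchy %>% 
  spread(y, freq) %>%   # one column per outcome: failure, succes
  mutate(odds = succes / failure)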

  type treat failure succes      odds
1 Dark     A      15     10 0.6666667
2 Dark     B      20      3 0.1500000
3 Fair     A      17      9 0.5294118
4 Fair     B      20      5 0.2500000
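
On the second question, here is a minimal sketch of a group_by version, in case it helps to see why the attempts did not work. As far as I can tell there are two issues. First, y was converted to a factor with the labels "failure" and "succes", so y==1 and y==0 never match anything and subset returns an empty vector. Second, an assignment with <- cannot be a step in a pipe; new columns are created inside summarise() or mutate() instead. The sketch assumes dplyr 1.0.0 or later (for the .groups argument), and the name odds_by_group is just an illustrative choice.

```r
library(dplyr)

# Rebuild the question's data (same values, built in one step).
itchy <- data.frame(
  type  = factor(c(2, 2, 2, 2, 1, 1, 1, 1), levels = c(1, 2),
                 labels = c("Dark", "Fair")),
  treat = factor(c(1, 1, 2, 2, 1, 1, 2, 2), levels = c(1, 2),
                 labels = c("A", "B")),
  y     = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1),
                 labels = c("failure", "succes")),
  freq  = c(9, 17, 5, 20, 10, 15, 3, 20)
)

# y is a factor, so compare against its labels, not against 1 and 0.
# Within each type/treat group, divide the success count by the failure count.
odds_by_group <- itchy %>%
  group_by(type, treat) %>%
  summarise(
    odds = sum(freq[y == "succes"]) / sum(freq[y == "failure"]),
    .groups = "drop"
  )

odds_by_group
```

This should give the same four odds values as the spread approach, one row per type/treat combination. Note also that in tidyr 1.0.0 and later spread() is superseded by pivot_wider(), so pivot_wider(names_from = y, values_from = freq) would be the newer way to write the reshaping step.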