Steen Harsted - 3 years ago
R Question

# Group calculations using group_by and subset commands

I am a rookie Stata user trying to make the jump to R. I am working through various exercises, but I keep getting something wrong with the group_by and subset commands.

I have a simple dataset on which I want to run group-based calculations. I am trying to use the group_by function from the dplyr package to do this.

My dataset is called itchy and consists of 4 variables:

treat - levels A and B (type of treatment)

type - levels Dark and Fair (skin colour)

y - levels 0 and 1 (failure or success of treatment)

freq - numerical variable indicating how many are in this particular group

Using this code you can recreate it:

``````
type <- c(2,2,2,2,1,1,1,1)
treat <- c(1,1,2,2,1,1,2,2)
y <- c(1,0,1,0,1,0,1,0)
freq <- c(9,17,5,20,10,15,3,20)
itchy <- cbind.data.frame(type,treat,y,freq)
itchy$type <- as.factor(type)
itchy$type <- factor(itchy$type,levels = c(1,2), labels = c("Dark", "Fair"))
itchy$treat <- as.factor(treat)
itchy$treat <- factor(itchy$treat,levels = c(1,2), labels = c("A", "B"))
itchy$y <- as.factor(y)
itchy$y <- factor(itchy$y,levels = c(0,1), labels = c("failure", "succes"))
``````

Now I would like to calculate the odds of success for treatments A and B when applied to skin type Dark or Fair (odds = number of successes / number of failures).
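To make the definition concrete, here is a minimal base R sketch for a single group. It assumes the frequencies from the dataset above (treatment A on Dark skin: 10 successes, 15 failures); the variable names are illustrative only.

```r
# Frequencies for treatment A on Dark skin, read off the dataset above
successes <- 10
failures  <- 15

# odds = number of successes / number of failures
odds_A_dark <- successes / failures
print(round(odds_A_dark, 3))
```

The same ratio is what a grouped calculation should return for each treat/type combination.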

I have two questions:

1) Can you help me do the odds calculations by group?

2) I have tried various combinations of group_by and subset, without any luck. The code below shows some of my unsuccessful attempts. Can you tell whether I have a basic misunderstanding of how the group_by and subset commands work?

``````
itchy %>% group_by(treat, type) %>% summarize(ods = (subset(freq, y==1)/subset(freq, y==0)))

itchy %>% group_by(treat, type) %>% ods <- c((subset(freq, y==1)/subset(freq, y==0)))

itchy %>% group_by(treat, type) %>% itchy$ods <- (subset(freq, y==1)/subset(freq, y==0))
``````

R Thomas
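A note on why the attempts in the question do not work, offered as a hedged sketch rather than as the approach the answer itself goes on to take. After the conversion to a factor, `y` holds the labels `failure` and `succes`, so `y == 1` and `y == 0` match no rows. In addition, a pipe cannot end in an assignment such as `ods <- ...`; the result has to be assigned at the start of the pipeline. Assuming dplyr is installed, comparing against the labels inside `summarise` could look like this (the name `odds_by_group` is illustrative):

```r
library(dplyr)

# Rebuild the data from the question (labels, including "succes", kept as posted)
itchy <- data.frame(
  type  = factor(c(2, 2, 2, 2, 1, 1, 1, 1), levels = c(1, 2), labels = c("Dark", "Fair")),
  treat = factor(c(1, 1, 2, 2, 1, 1, 2, 2), levels = c(1, 2), labels = c("A", "B")),
  y     = factor(c(1, 0, 1, 0, 1, 0, 1, 0), levels = c(0, 1), labels = c("failure", "succes")),
  freq  = c(9, 17, 5, 20, 10, 15, 3, 20)
)

# y is a factor, so compare against its labels rather than against 0/1.
# Each treat/type group has exactly one success row and one failure row,
# so the ratio is a single number per group.
odds_by_group <- itchy %>%
  group_by(treat, type) %>%
  summarise(ods = freq[y == "succes"] / freq[y == "failure"]) %>%
  ungroup()

print(odds_by_group)
```

With the posted data this should give odds of roughly 0.67 (A, Dark), 0.53 (A, Fair), 0.15 (B, Dark) and 0.25 (B, Fair).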

If I understand you correctly, I think the following will work. I made use of the spread function from the tidyr package, which, like dplyr, is part of the tidyverse.

``````
library(tidyr)
itchy %>%