Irakli Irakli - 1 month ago 14
R Question

R: reshape a dataframe by creating columns for a specific element (control treatment)

Consider a data frame showing results for the control and two experimental treatments for males and females together with the sizes for each treatment:

library(tidyverse)
mydf <- data_frame( treatment = c('ctrl','low','high','ctrl','low','high'),
gender = c('male','male','male','female','female','female'),
size = c(10,20,30,10,20,30),
result = c(0.11, 0.32, 0.25, 0.15, 0.38, 0.55) )

treatment gender size results
ctrl male 10 0.11
low male 20 0.32
high male 30 0.25
ctrl female 10 0.15
low female 20 0.35
high female 30 0.55


To compare the control with experimental treatments side by side, I'd like to reshape the dataframe as follows:

treatment gender ctrl_size size ctrl_result result
low female 10 20 0.15 0.38
high female 10 30 0.15 0.55
low male 10 20 0.11 0.32
high male 10 30 0.11 0.25


My attempt below works but seems cumbersome to me as it creates auxiliary data frames before merging them into the final one:

mydf_result <- mydf %>%
select(-size) %>%
spread(treatment, result) %>%
gather(treatment, result, c(low, high) ) %>%
rename(ctrl_result = ctrl)

mydf_size <- mydf %>%
select(-result) %>%
spread(treatment, size) %>%
gather(treatment, size, c(low, high)) %>%
rename(ctrl_size = ctrl)

mydf_final <-
full_join(mydf_result, mydf_size, by = c('treatment', 'gender')) %>%
select(treatment, gender, ctrl_size, size, ctrl_result, result) %>%
arrange(gender)

# A tibble: 4 × 6
treatment gender ctrl_size size ctrl_result result
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 low female 10 20 0.15 0.38
2 high female 10 30 0.15 0.55
3 low male 10 20 0.11 0.32
4 high male 10 30 0.11 0.25


Can the above be achieved within one pipeline?

Answer

While I'm not sure the desired result is the tidiest arrangement, you can rearrange like so:

library(tidyverse)

mydf %>% gather(var, val, size, result) %>%    # gather all numbers into one column
    spread(treatment, val) %>%    # spread treatment so ctrl can be separated
    gather(treatment, ttmt, high, low) %>%    # regather high and low separately
    gather(ct_tm, val, ctrl, ttmt) %>%    # regather numbers, now with ctrl/ttmt separated
    unite(var, ct_tm, var) %>%    # join column labels
    spread(var, val)    # spread to wide

## # A tibble: 4 × 6
##   gender treatment ctrl_result ctrl_size ttmt_result ttmt_size
## *  <chr>     <chr>       <dbl>     <dbl>       <dbl>     <dbl>
## 1 female      high        0.15        10        0.55        30
## 2 female       low        0.15        10        0.38        20
## 3   male      high        0.11        10        0.25        30
## 4   male       low        0.11        10        0.32        20
Comments