Daniel Anderson Daniel Anderson - 1 month ago 21
R Question

tidyr::gather multiple columns of varying types

My question is similar to this question. I'm trying to

tidyr::gather
multiple columns. However, the solution provided in the link is less than ideal because the attributes are generally not identical across all columns and so they are dropped.

Note, I know how to do this with base R, but I'm trying to learn how to do the equivalent operation with tidyr and/or dplyr.

Below I've simulated some data (poorly, but quickly) that illustrate the situation I often find myself in (although I generally have far more columns that follow this same sort of a pattern). I've provided the base solution with
stats::reshape
so you can see my desired output.

Any help would be much appreciated.

set.seed(123)
male_g6 <- rbinom(100, 1, .5)
ell_g6 <- rbinom(100, 1, .1)
sped_g6 <- rbinom(100, 1, .15)
pullouts_g6 <- rbinom(100, 5, .1)
disability_g6 <- replicate(100,
sample(
c("asd", "cd", "ed", "hi", "id", "ohi", "ld", "none"),
1,
prob = c(rep(0.01, 6), 0.05, 0.89)
)
)
score_g6 <- rnorm(100, 200, 10)
score_g7 <- score_g6 + 5 + rnorm(100, 0, 2)
score_g8 <- score_g7 + 5 + rnorm(100, 0, 2)

d <- data.frame(
SID = 1:100,
male_g6 = male_g6,
male_g7 = male_g6,
male_g8 = male_g6,
ell_g6 = ell_g6,
ell_g7 = ell_g6,
ell_g8 = ell_g6,
sped_g6 = sped_g6,
sped_g7 = sped_g6,
sped_g8 = sped_g6,
pullouts_g6 = pullouts_g6,
pullouts_g7 = pullouts_g6,
pullouts_g8 = pullouts_g6,
disability_g6 = disability_g6,
disability_g7 = disability_g6,
disability_g8 = disability_g6,
score_g6 = score_g6,
score_g7 = score_g7,
score_g8 = score_g8
)


With base reshape



ld <- stats::reshape(d,
idvar = "SID",
varying = list(
c("male_g6", "male_g7", "male_g8"),
c("ell_g6", "ell_g7", "ell_g8"),
c("sped_g6", "sped_g7", "sped_g8"),
c("pullouts_g6", "pullouts_g7", "pullouts_g8"),
c("disability_g6", "disability_g7", "disability_g8"),
c("score_g6", "score_g7", "score_g8")
),
v.names = c("male", "ell", "sped", "pullouts", "disability", "score"),
times = 6:8,
timevar = "Grade",
direction = "long"
)
ld <- ld[order(ld$SID), ]

Answer

You'll need to gather beyond what you want to end with so you can separate the grade level from the headers, after which you can spread back to wide form:

ld2 <- d %>% gather(var, val, -SID) %>%     # gather to long form
    # separate grade from variable names
    separate(var, c('var', 'grade'), sep = '_g', convert = TRUE) %>% 
    spread(var, val, convert = TRUE)    # spread back to wide

head(ld2)

##   SID grade disability ell male pullouts    score sped
## 1   1     6         cd   0    0        1 196.2440    0
## 2   1     7         cd   0    0        1 203.2739    0
## 3   1     8         cd   0    0        1 211.1347    0
## 4   2     6       none   0    1        0 194.3812    1
## 5   2     7       none   0    1        0 195.3957    1
## 6   2     8       none   0    1        0 202.4890    1