emehex - 3 months ago
R Question

(Un)tidy a dataset with unequal sizes and duplicate variables

I have a dataset that looks like this:

df <- data.frame(
x = c(rep("A", 3), rep("B", 2)),
y = c(1, 2, 6, 8, 3)
)


I need to (un)tidy it so that it looks like this:

df_new <- data.frame(
A = c(1, 2, 6),
B = c(8, 3, NA)
)


tidyr::spread threw duplicate value errors.

Answer

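To my knowledge, tidyr won't let you do this without an ID column: spread needs the remaining columns to identify each output row uniquely, and here several y values share the same x. So we'll add a within-group row index first and then spread: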

library(dplyr)
library(tidyr)

df %>%
    group_by(x) %>%
    mutate(id = row_number()) %>%  # row index within each group of x
    spread(key = x, value = y, fill = NA)
# # A tibble: 3 x 3
#      id     A     B
# * <int> <dbl> <dbl>
# 1     1     1     8
# 2     2     2     3
# 3     3     6    NA

You can, of course, remove the id column at the end if you prefer.
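
For completeness, here is a minimal sketch of that last step, assuming the same df as in the question. The name df_wide is just illustrative, and the ungroup() call is an extra precaution I've added so that the grouping doesn't interfere with dropping the column:

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  x = c(rep("A", 3), rep("B", 2)),
  y = c(1, 2, 6, 8, 3)
)

df_wide <- df %>%
  group_by(x) %>%
  mutate(id = row_number()) %>%              # helper index within each group
  ungroup() %>%                              # drop grouping before reshaping
  spread(key = x, value = y, fill = NA) %>%  # one column per value of x
  select(-id)                                # remove the helper column
```

This should leave only the A and B columns, matching the shape of df_new. In newer tidyr versions spread is superseded, and pivot_wider(names_from = x, values_from = y) can likely be used in its place.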