RHA RHA - 14 days ago 3
R Question

R: convert long to wide format filling out missing dates

I'm reshaping data from the hour registration of my company, to fit a certain format. I've modified the input to look like this:

employee project month day hours
1 A 16-001 9 9 5
2 B 16-001 9 29 1
3 A 16-001 9 3 5
4 B 16-001 9 28 2
5 A 16-002 9 8 6
6 B 16-002 9 9 4
7 A 16-002 10 25 6
8 B 16-002 10 21 8
9 A overig 10 6 6
10 B overig 10 17 7
11 A overig 10 9 1
12 B overig 10 10 7

#reprodicle data:
df <- data.frame(employee = rep(c("A","B"),6),project=rep(c("16-001","16-002","overig"), each=4), month=rep(c(9,10),each=6),day=sample(1:30,12,replace=T), hours=sample(1:8,12,replace=T))

#Now, I need to move this to a cross table:
res <- ftable(xtabs(hours~month+employee+project+day, aggregate(hours~month+employee+project+day, data=df, FUN=sum)))

#And put this cross table in a data.frame (for export to csv)
library(reshape2)
df_res <- dcast(as.data.frame(res), as.formula(paste(paste(names(attr(res, "row.vars")), collapse="+"), "~", paste(names(attr(res, "col.vars"))))))

df_res

month employee project 3 6 8 9 10 17 21 25 28 29
1 9 A 16-001 5 0 0 5 0 0 0 0 0 0
2 9 A 16-002 0 0 6 0 0 0 0 0 0 0
3 9 A overig 0 0 0 0 0 0 0 0 0 0
4 9 B 16-001 0 0 0 0 0 0 0 0 2 1
5 9 B 16-002 0 0 0 4 0 0 0 0 0 0
6 9 B overig 0 0 0 0 0 0 0 0 0 0
7 10 A 16-001 0 0 0 0 0 0 0 0 0 0
8 10 A 16-002 0 0 0 0 0 0 0 6 0 0
9 10 A overig 0 6 0 1 0 0 0 0 0 0
10 10 B 16-001 0 0 0 0 0 0 0 0 0 0
11 10 B 16-002 0 0 0 0 0 0 8 0 0 0
12 10 B overig 0 0 0 0 7 7 0 0 0 0


I'm not sure this is the best way, but the format is good now. However, I need to have ALL de days as columns, not only the days that were in my data.frame (so 31 columns, preferably with dates that don't exist (like sep 31) with NA and the rest as 0. Any suggestions how to get that?

Answer

I think this is an acceptable solution and it will handle leap years too (for bonus points). Still taking advantage of tidyr::spread()'s nice factor fill behavior with drop = F, but now using the function lubridate::days_in_month() to spread only but so far. Here we go:

library(tidyr)
library(lubridate)
library(purrr)

df$year <- 2016 
df$num_in_month <- ymd(paste(df$year, df$month, df$day)) %>%
    days_in_month()

df %>% split(.$month) %>%
    map(~mutate(., day = factor(day, levels = 1:unique(num_in_month)))) %>%
    map(~spread(., key = day, value = hours, fill = 0, drop = F)) %>%
    bind_rows() %>%
    select(-num_in_month)

  employee project month year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1        A  16-001     9 2016 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  8  0  0 NA
2        A  16-002     9 2016 0 0 0 0 5 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 NA
3        B  16-001     9 2016 0 0 7 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  0  0  0 NA
4        B  16-002     9 2016 0 0 0 0 0 0 0 0 0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 NA
5        A  16-002    10 2016 1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6        A  overig    10 2016 0 4 0 0 0 0 0 0 0  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7        B  16-002    10 2016 0 0 0 0 0 0 0 0 0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8        B  overig    10 2016 0 0 0 0 6 0 0 0 0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Cheers