Rocky Rocky - 6 days ago 4
R Question

Reshaping data in R (wide -> Long)

I want to convert df->df2

Old sample data frame df1

df1 <- structure(list(ID = 1:2, Group = c(1L, 1L), M1a2hB = c(0.2, 0.3), M1a3hB = c(0.4, 0.6), M2a2hB = c(0.3, 0.4), M2a3hB = c(0.6, 0.6), M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L), M2r2hB = c(300L, 400L), M2r3hB = c(600L, 600L)), .Names = c("ID", "Group", "M1a2hB", "M1a3hB", "M2a2hB", "M2a3hB","M1r2hB", "M1r3hB","M2r2hB", "M2r3hB"), class = "data.frame", row.names = c(NA, -2L))


ID Group M1a2hB M1a3hB M2a2hB M2a3hB.... M1r2hB M1r3hB M2r2hB M2r3hB ...
1 1 0.2 0.4 0.3 0.6 ... 200 400 300 600 ...
2 1 0.3 0.6 0.4 0.6 ... 300 600 400 600 ...


Here, df has 100 IDs and 1100 Columns. Each outcome measure has two columns for absolute change and two for relative change. There are nearly 270 outcome measures.

M1a2hB is absolute change in first measure from time 2 to baseline and M1a3hB is absolute change for time 3 to baseline. Similarly, M1r2hB is relative change in first outcome from time 2 to baseline and M1r3hB is relative change in an outcome from time 3 to baseline.

New Df2:

ID Group time M1a M2a ... M1r M2r ...
1 1 1 0.0 0.0 ... 000 000 ...
1 1 2 0.2 0.3 ... 200 300 ...
1 1 3 0.4 0.6 ... 400 600 ...
2 1 1 0.0 0.0 ... 000 000 ...
2 1 2 0.3 0.4 ... 300 400 ...
2 1 3 0.6 0.6 ... 600 600 ...


Any tips? Feel free to ask for any clarification. Thanks! Looking forward!

p.s. I have tried to run few codes from previous posts (pls see below if interested), but they seem different because df is three dimensional data, and df2 includes additional time column

In R, plotting wide form data with ggplot2 or base plot. Is there a way to use ggplot2 without melting wide form data frame?

Reshaping repeated measures data in R wide to long

Answer

We can extract the patterns from the column names using sub, split the sequence of that vector with 'nm1', use that as measure in melt to convert from 'wide' to 'long' format.

library(data.table)
nm1 <- sub("\\d+[[:alpha:]]+$", '', names(df1)[-(1:2)])
lst <- split(seq_along(nm1)+2, nm1)
melt(setDT(df1), measure = lst, 
       value.name= names(lst), variable.name= 'time')[order(ID)]
#   ID Group time M1a M1r M2a M2r
#1:  1     1    1 0.2 200 0.3 300
#2:  1     1    2 0.4 400 0.6 600
#3:  2     1    1 0.3 300 0.4 400
#4:  2     1    2 0.6 600 0.6 600

data

df1 <- structure(list(ID = 1:2, Group = c(1L, 1L),
  M1a2hB = c(0.2, 0.3
), M1a3hB = c(0.4, 0.6), M2a2hB = c(0.3, 0.4),
 M2a3hB = c(0.6, 
0.6), M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L), 
M2r2hB = c(300L, 
400L), M2r3hB = c(600L, 600L)), .Names = c("ID", "Group", "M1a2hB", 
"M1a3hB", "M2a2hB", "M2a3hB", "M1r2hB", "M1r3hB",
"M2r2hB", "M2r3hB"
), class = "data.frame", row.names = c(NA, -2L))