Mako212 Mako212 - 3 years ago 94
R Question

R: Gather repeated columns

Sample data:

df1 <- structure(list(Name = structure(c(3L, 2L, 1L), .Label = c("Bob",
"Joe", "Mike"), class = "factor"), Location = structure(c(1L,
1L, 2L), .Label = c("CA", "WA"), class = "factor"), Title = structure(c(2L,
3L, 1L), .Label = c("CEO", "Manager", "VP"), class = "factor"),
Class = structure(c(1L, 2L, 2L), .Label = c("Class1", "Class2"
), class = "factor"), Month = c(1, 2, 3), Class.1 = structure(c(3L,
2L, 1L), .Label = c("Class1", "Class2", "Class4"), class = "factor"),
Month.1 = c(3, 3, 2), Objective = structure(1:3, .Label = c("Obj1",
"Obj2", "Obj3"), class = "factor"), Month.2 = c(2, 7, 7),
Category = c("x", "y", "z"), Objective.1 = structure(c(3L,
2L, 1L), .Label = c("Obj1", "Obj7", "Obj9"), class = "factor"),
Month.3 = c(4, 5, 5), Category2 = c("z", "r", "q")), .Names = c("Name",
"Location", "Title", "Class", "Month", "Class.1", "Month.1",
"Objective", "Month.2", "Category", "Objective.1", "Month.3",
"Category2"), class = "data.frame", row.names = c(NA, -3L))

Name Location Title Class Month Class.1 Month.1 Objective Month.2 Category Objective.1 Month.3 Category2
1 Mike CA Manager Class1 1 Class4 3 Obj1 2 x Obj9 4 z
2 Joe CA VP Class2 2 Class2 3 Obj2 7 y Obj7 5 r
3 Bob WA CEO Class2 3 Class1 2 Obj3 7 z Obj1 5 q


I want to gather into the form with one row per observation:

Name Location Title Variable(Class/Objective) Value


I've tried some of the similar examples on stack using
gather
,
spread
, etc., but I can't figure out how to keep the Class-Month and Objective-Month groups together.

In my real data set, there are 100 columns with 8 ID columns. And instead of just Class-Month, or Objective-Month pairs, the first half of the columns are in groups of four, and the second half are in groups of 8. An example of a group of four would be Class-Month-Cost-Date.

Sample output for Mike:

Name Location Title Variable Value Value.2
1 Mike CA Manager Class1 1 <NA>
2 Mike CA Manager Class4 3 <NA>
3 Mike CA Manager Obj1 2 x
4 Mike CA Manager Obj9 4 z

Answer Source

Repeated values are fine, but you'll need to specify which ones group together (in the example, "Class" and "Objective") to get the OP's output:

library(data.table)
melt(setDT(df1), 
  meas = patterns("Class|Objective", "Month", "Category")
)[order(Name)]

    Name Location   Title variable value1 value2 value3
 1:  Bob       WA     CEO        1 Class2      3      z
 2:  Bob       WA     CEO        2 Class1      2      q
 3:  Bob       WA     CEO        3   Obj3      7     NA
 4:  Bob       WA     CEO        4   Obj1      5     NA
 5:  Joe       CA      VP        1 Class2      2      y
 6:  Joe       CA      VP        2 Class2      3      r
 7:  Joe       CA      VP        3   Obj2      7     NA
 8:  Joe       CA      VP        4   Obj7      5     NA
 9: Mike       CA Manager        1 Class1      1      x
10: Mike       CA Manager        2 Class4      3      z
11: Mike       CA Manager        3   Obj1      2     NA
12: Mike       CA Manager        4   Obj9      4     NA

It won't matter if you have identically repeating column names or use check.names=TRUE to disambiguate since patterns just matches patterns in the names. See ?regex for more on how to specify patterns if needed.

Other arguments to melt (see ?melt.data.table) can be used to give customized names to the columns in the result (instead of "value1", "value2", ...).

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download