coip coip - 3 months ago 15
R Question

R reshape wide to long, using objects for the inputs

I have a list of data frames in wide format, with a factor variable in Column 1 and annualized data in Columns 2 onward. I want to graph these data. Doing so requires reshaping them into long format. Here is an example of one data frame:

# SAMPLE DATA
x <- structure(list(State = structure(1:3, .Label = c("Alabama", "Alaska", "Arizona", "Arkansas"), class = "factor"), Green.And.Blue.Score.2001 = c(0L, 40L, 65L), Green.And.Blue.Score.2002 = c(20L, 5L, 60L), Green.And.Blue.Score.2003 = c(35L, 15L, 30L)), .Names = c("State", "Green.And.Blue.Score.2001", "Green.And.Blue.Score.2002", "Green.And.Blue.Score.2003"), row.names = c(NA, 3L), class = "data.frame")

x
# State Green.And.Blue.Score.2001 Green.And.Blue.Score.2002 Green.And.Blue.Score.2003
#1 Alabama 0 20 35
#2 Alaska 40 5 15
#3 Arizona 65 60 30


I usually do this using
reshape()
. For instance, this works fine:

# RESHAPE WIDE TO LONG (MANUALLY)
y <- reshape(x,
idvar = 'State',
varying = c('Green.And.Blue.Score.2001', 'Green.And.Blue.Score.2002', 'Green.And.Blue.Score.2003'),
v.names = 'Green.And.Blue.Score.',
times = c('2001', '2002', '2003'),
direction = 'long')

y
# State time Green.And.Blue.Score.
# Alabama 2001 0
# Alaska 2001 40
# Arizona 2001 65
# Alabama 2002 20
# Alaska 2002 5
# Arizona 2002 60
# Alabama 2003 35
# Alaska 2003 15
# Arizona 2003 30


However, I didn't want to manually have to input the
idvar
,
varying
,
v.name
, and
times
variables for the dozens of data frames I have, and because some of the column names are quite long and complex, and vary considerably from data frame to data frame, simple
reshape()
commands can't automatically deparse them. My thought was to create a function to obtain those inputs from the data frame, the precursor of which is as follows:

# RESHAPE WIDE TO LONG (FUNCTIONALIZED)
id <- noquote(paste("'", names(x[1]), "'", sep = ""))
va <- noquote(paste("c('", paste(names(x)[2:length(x)], collapse = "', '"), "')", sep = ""))
vn <- noquote(paste("'", sub("(\\..*)$", ".", names(x)[2]) , "'", sep = ""))
ti <- noquote(paste("c('", paste(sub(".*(\\d{4})$", "\\1", names(x[2:length(x)])), collapse = "', '"), "')", sep = ""))


The output of each of those matches the
idvar
,
varying
,
v.name
, and
times
inputs from
#RESHAPE WIDE TO LONG (MANUALLY)
above:

id
# 'State'
va
# c(''Green.And.Blue.Score.2001', ''Green.And.Blue.Score.2002', ''Green.And.Blue.Score.2003')
vn
# ''Green.And.Blue.Score.'
ti
# c('2001', '2002', '2003')


But, when I try to use those objects in the
reshape()
function, I get an error message:

y <- reshape(x,
idvar = id,
varying = va,
v.names = vn,
times = ti,
direction = 'long')



Error in [.data.frame(data, , varying[[i]][1L]) : undefined
columns selected


I'm sure my solution to 'functionalize'
reshape()
is not ideal. What should I be doing instead?

42- 42-
Answer

The effort to put quotes around the material extracted from names is causing the error. This is a simplification of that code. Note that I removed the v.names and times since these are automagically calculated when the column names are properly separated by "."

y <- reshape(x, 
     idvar = names(x)[1],
   varying = names(x)[-1],

 direction = 'long')
 y
#-----    
           State time Score
Alabama.2001 Alabama 2001     0
Alaska.2001   Alaska 2001    40
Arizona.2001 Arizona 2001    65
Alabama.2002 Alabama 2002    20
Alaska.2002   Alaska 2002     5
Arizona.2002 Arizona 2002    60
Alabama.2003 Alabama 2003    35
Alaska.2003   Alaska 2003    15
Arizona.2003 Arizona 2003    30

If we use this on your new example we can get a "split" at the ".S" that give a reasonable result. The column name text between the first period and the "split" pattern gets moved to the column name while the leading state name and year get appeanded together as a rowname:

 y <- reshape(x, 
      idvar = names(x)[1],
    varying = names(x)[-1],
    split = list(regexp = "\\.S", include = TRUE),
  direction = 'long')
  y

                 State       time Green.And.Blue.
Alabama.Score.2001 Alabama Score.2001               0
Alaska.Score.2001   Alaska Score.2001              40
Arizona.Score.2001 Arizona Score.2001              65
Alabama.Score.2002 Alabama Score.2002              20
Alaska.Score.2002   Alaska Score.2002               5
Arizona.Score.2002 Arizona Score.2002              60
Alabama.Score.2003 Alabama Score.2003              35
Alaska.Score.2003   Alaska Score.2003              15
Arizona.Score.2003 Arizona Score.2003              30