slhck - 2 years ago 77
R Question

# Extract one column as rows with R, preserving other columns

What I have:

I have a data frame that looks like this:

``````sequence foo model output real
1   3     a     12   12
1   3     b     29   12
1   3     c     10   12
1   3     d     38   12
1   3     e     10   12
2   3     a     38   15
2   3     b     10   15
2   3     c     29   15
2   3     d     56   15
2   3     e     10   15
``````

Created by:

``````d.test = data.frame(
sequence = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
foo = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
model = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e"),
output = c(12, 29, 10, 38, 10, 38, 10, 29, 56, 10),
real = c(12, 12, 12, 12, 12, 15, 15, 15, 15, 15)
)
``````

The model predicts an
`output`
for every given
`sequence`
, but the
`real`
output is also recorded along every sequence.

What I need:

I would like to transform the data such that
`real`
becomes a "model" itself, that is:

``````sequence foo model output
1   3     a     12
1   3     b     29
1   3     c     10
1   3     d     38
1   3     e     10
1   3  real     12
2   3     a     38
2   3     b     10
2   3     c     29
2   3     d     56
2   3     e     10
2   3  real     15
``````

How can I achieve that using
`dplyr`
,
`tidyr`
and their cousins?

Note that for a “nice” solution, one should not have to:

• Manually enter column indices

• Manually specify all the columns like
`foo`
which are not of interest

What I've tried:

I tried the following, but it feels clumsy:

``````unique(
melt(d.test,
id.vars = c("sequence", "foo"),
measure.vars = c("real"),
variable.name = "model",
value.name = "output"
)
)
``````

Now I have to remove the
`real`
column from the original data frame and append the rows of what I just did. It's not a nice solution because apart from the
`foo`
column there may be many more columns that I'd like to preserve, and then I'd have to specify them as
`id.vars`
.

I'd use data.table:

``````library(data.table)
setDT(d.test)

d.test[,
rbind(.SD, .SD[1L][, `:=`(model = "real", output = real[1L])])
, by=sequence][, real := NULL][]
``````

If I had to use the 'verse:

``````d.real = d.test %>% distinct(sequence) %>%
mutate(model = "real", output = real) %>% select(-real)

d = d.test %>% select(-real)
``````

And then stack them:

``````bind_rows(d, d.real)
``````

If the ordering is important, add `%>% arrange(sequence)`.

Comment. The problem in the OP originates with untidy data. Reading Hadley's paper on the subject would probably be helpful if you don't know what I mean.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download