Leeren Leeren - 1 month ago 15
R Question

Using the reshape function in R with multiple matching rows

So I have the following data frame

df
:

X Y Z ID value
1 0 20 135 a 20
2 0 20 135 a 30
3 0 20 135 b 40
4 20 104 20 c 10
5 20 104 20 b 15
...


I want a final output of

X Y Z a b c
1 0 20 135 20 40 null
2 0 20 135 30 null null
3 20 104 20 null 15 10
...


I'd like to turn all the different IDs to individual columns holding the values from the original data frame. What I'm currently doing is
reshape(df, idvar=c("X", "Y", "Z"), timevar="ID", direction="wide")
. However, this results in warnings of having multiple rows match for each of
a
b
and
c
, and thus only the first are taken. I want a row for each unique
X
,
Y
,
Z
combination but since there are many repeated measurements I want multiple rows for each combination as well, but the number of repeated measurements differs for each
a
,
b
, and
c
. Is there any way of doing this effectively?

Answer

You already identify the answer in your question: You need to add a secondary "id" variable. This is easily done with getanID from my "splitstackshape" package. Here's what it does:

library(splitstackshape)
getanID(mydf, c("X", "Y", "Z", "ID"))
#     X   Y   Z ID value .id
# 1:  0  20 135  a    20   1
# 2:  0  20 135  a    30   2
# 3:  0  20 135  b    40   1
# 4: 20 104  20  c    10   1
# 5: 20 104  20  b    15   1

With that in mind, here's how you would change your current reshape approach:

reshape(getanID(mydf, c("X", "Y", "Z", "ID")), 
        direction = "wide", idvar = c("X", "Y", "Z", ".id"), timevar = "ID")
#     X   Y   Z .id value.a value.b value.c
# 1:  0  20 135   1      20      40      NA
# 2:  0  20 135   2      30      NA      NA
# 3: 20 104  20   1      NA      15      10

Or, since "data.table" is also loaded with "splitstackshape", you can use dcast.data.table, like this:

dcast.data.table(getanID(mydf, c("X", "Y", "Z", "ID")),
                 X + Y + Z + .id ~ ID, value.var = "value")
#     X   Y   Z .id  a  b  c
# 1:  0  20 135   1 20 40 NA
# 2:  0  20 135   2 30 NA NA
# 3: 20 104  20   1 NA 15 10
Comments