SHW SHW - 1 month ago 13
R Question

Reshape dataframe from long to wide using merge function grouped by two factors

I want to reshape a long dataframe to a wide dataframe by two grouping variables (resp & company) and three numerical respons variables (quality, amount, sense). I tried to do it with the dcast function, but it doesn't allow me to group by two variables. Can anyone help me out?

#Current long dataframe: two grouping variables (resp & company), three numerical respons variables (Quality, Amount, Sense)
resp <- c(1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109,1325851107,1325851108,1325851109)
company <- c("Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Dark.nl","Manual.nl","Manual.nl","Manual.nl","Dark.nl","Dark.nl","Dark.nl")
question <- c("Quality","Quality","Quality","Amount","Amount","Amount","Quality","Quality","Quality","Sense","Sense","Sense")
score <- c(4,1,2,6,8,10,5,5,7,4,6,7)
current <- data.frame(resp,company,question,score,answer); current

#Desired wide dataframe
resp2 <- c(1325851107,1325851107,1325851108,1325851108,1325851109,1325851109)
company2 <- c("Dark.nl","Manual.nl","Dark.nl","Manual.nl","Dark.nl","Manual.nl")
Quality <- c(4,5,1,5,2,7)
Amount <- c(6,NA,8,NA,10,NA)
Sense <- c(4,NA,6,NA,7,NA)
desired <- data.frame(resp2,company2,Quality,Amount,Sense); desired

#Using dcast function to reshape
library("reshape2")
dcast(current, resp + company ~ question, value.var="score")


The merge function that Parfait provided works. I provide here the script that did the trick (thank you Parfait ;)).

cols2keep <- c("resp", "company", "score")
df <- merge(current[current$question=='Quality', cols2keep], #merge two dataframes
current[current$question=='Amount', cols2keep],
by=c("resp", "company"), all=TRUE)

df <- merge(df,current[current$question=='Sense', c("resp","company","score")], #merge third respons variable into new dataframe
by=c("resp", "company"), all=TRUE)
colnames(df) <- c("resp","company","quality","amount","sense")


This solution works, but my real dataset exists of 53 respons variables. This method therefore is quite time-consuming. I tried the iterative approach of Parfait, but I get the following error.

dfList <- lapply(unique(current$question), function(i){
temp <- setNames(current[current$question==i, c("resp", "company", "score")],
c("resp", "company", paste0(i)))
})

finaldf <- Reduce(function(...) merge(..., y=c("resp", "company"), all=T), dfList)
Error in fix.by(by.x, x) :
'by' must specify one or more columns as numbers, names or logical


I am relatively new to coding in R and can't grasp what I have written wrong. I am happy with the solution I have now, but if there are more efficient solutions I am open to it.

Answer

Consider a merge on filtered subsets:

cols2keep <- c("resp", "company", "score", "answer")

df <- merge(current[current$question=='Quality', cols2keep],
            current[current$question=='Amount', cols2keep],
            by=c("resp", "company"), all=TRUE)

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a")    
df

#         resp   company quality     quality_a amount amount_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>

For multiple groups such as Sense, continue to merge with a filtered subset:

df <- merge(df, 
            current[current$question=='Sense',c("resp", "company", "score", "answer")],
            by=c("resp", "company"), all=TRUE)

colnames(df) <- c("resp", "company", "quality", "quality_a", "amount", "amount_a", 
                  "sense", "sense_a")
df
#         resp   company quality     quality_a amount amount_a sense sense_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe     4    Nice
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>    NA    <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine     6      Ok
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>    NA    <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad     7     Yuk
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>    NA    <NA>

Furthermore, for an iterative merge across all levels of question, consider the following:

dfList <- lapply(unique(current$question), function(i){
  temp <- setNames(current[current$question==i, c("resp", "company", "score", "answer")],
                 c("resp", "company", paste0(i), paste0(i, "_a")))
})

finaldf <- Reduce(function(...) merge(..., y=c("resp", "company"), all=T), dfList)
finaldf
#         resp   company Quality     Quality_a Amount Amount_a Sense Sense_a
# 1 1325851107   Dark.nl       4   Didn't like      6    Maybe     4    Nice
# 2 1325851107 Manual.nl       5          Fine     NA     <NA>    NA    <NA>
# 3 1325851108   Dark.nl       1        Was ok      8     Fine     6      Ok
# 4 1325851108 Manual.nl       5 No, thank you     NA     <NA>    NA    <NA>
# 5 1325851109   Dark.nl       2          Sure     10  Not bad     7     Yuk
# 6 1325851109 Manual.nl       7       Why not     NA     <NA>    NA    <NA>
Comments