chowching chowching - 4 months ago 19
R Question

Transmuting row entries to columns in R

I'm new here. I have just started learning R.

I have this question:

Suppose I have a dataframe:

name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave")
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow")
value = c( 1,2,1,3,5,5,3,2,4,6,7,8)
df = data.frame(name, color, value)
#View(df)
df
# name color value
# 1 John red 1
# 2 John blue 2
# 3 John green 1
# 4 John yellow 3
# 5 Mark red 5
# 6 Mark blue 5
# 7 Mark green 3
# 8 Mark yellow 2
# 9 Dave red 4
# 10 Dave blue 6
# 11 Dave green 7
# 12 Dave yellow 8


and I want it to look like this:

# names red blue green yellow
#1 John 1 2 1 3
#2 Mark 5 5 3 2
#3 Dave 4 6 7 8


That is, the entries in the first column (name) will become unique and the levels in the second column (color) will be new columns and the entries that will be in these new columns will come from the corresponding rows in the third column (value) in the original data frame.

I can accomplish this using the following:

library(dplyr)
df = df %>%
group_by(name) %>%
mutate(red = ifelse(color == "red", value, 0.0),
blue = ifelse(color == "blue", value, 0.0),
green = ifelse(color == "green", value, 0.0),
yellow = ifelse(color == "yellow", value, 0.0)) %>%
group_by(name) %>%
summarise_each(funs(sum), red, blue, green, yellow)
df
name red blue green yellow
1 Dave 4 6 7 8
2 John 1 2 1 3
3 Mark 5 5 3 2


But this would not be ideal if there are lots of levels in the color column. How would I go on doing that?

Thank you!

Answer

As the OP is using dplyr family of packages, a nice option is with tidyr

library(tidyr)
spread(df, color, value)
#    name blue green red yellow
#1 Dave    6     7   4      8
#2 John    2     1   1      3
#3 Mark    5     3   5      2

If we need to use %>%

library(dplyr)
df %>% 
    spread(color, value)

To keep the order, we can convert the 'color' to factor class with levels specified as the unique values of 'color' and then do the spread

df %>%
   mutate(color = factor(color, levels = unique(color))) %>%
   spread(color, value)
#  name red blue green yellow
#1 Dave   4    6     7      8
#2 John   1    2     1      3
#3 Mark   5    5     3      2

Or we can use data.table with a faster dcast. Converting to data.table and using the dcast from data.table have advantages. It is much much faster than the dcast from reshape2.

library(data.table)
dcast(setDT(df), name~color, value.var="value")
#   name blue green red yellow
#1: Dave    6     7   4      8
#2: John    2     1   1      3
#3: Mark    5     3   5      2

NOTE: In both the solutions, we get the column names as in the expected output and do not have any uglier suffix or prefix attached to it (which BTW can be changed, but it is another line of code)


If we need a base R, one option is tapply

with(df, tapply(value, list(name, color), FUN = I))
#     blue green red yellow
#Dave    6     7   4      8
#John    2     1   1      3
#Mark    5     3   5      2
Comments