Pierre Laurent - 11 days ago
R Question

One Hot Encoding using Dummies in R

I need help with something I find hard to explain.
The following code

library(dummies)
columna <- c(1,2,3)
columnb <- c("AR","AT","AF")
columnc <- c("word1", "word2", "word3")
alldata <- data.frame(columna,columnb,columnc)
alldata <- dummy.data.frame(alldata, names=c("columnc"), sep="_")
alldata


is giving me

  columna columnb columnc_word1 columnc_word2 columnc_word3
1       1      AR             1             0             0
2       2      AT             0             1             0
3       3      AF             0             0             1


Now imagine that some entries contain several words:

columnc <- c("word1", "word2 word3", "word3 word1")


Can someone please explain how to obtain the following?

  columna columnb columnc_word1 columnc_word2 columnc_word3
1       1      AR             1             0             0
2       2      AT             0             1             1
3       3      AF             1             0             1


Regards,

Answer

Here is a tidyverse way:

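library(tidyverse)

# alldata is the original data frame, built with the multi-word columnc
alldata %>%
  separate_rows(columnc) %>%                   # one row per word
  mutate(count = 1) %>%                        # indicator value for each (row, word) pair
  spread(columnc, count, fill = 0, sep = "_")  # one column per word, 0 where absent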

#  columna columnb columnc_word1 columnc_word2 columnc_word3
#1       1      AR             1             0             0
#2       2      AT             0             1             1
#3       3      AF             1             0             1
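
If you would rather not load the tidyverse, the same idea (split each cell into words, then build one indicator column per distinct word) can be sketched in base R. This is only a minimal sketch: it assumes the words are separated by whitespace, and the names words_per_row, vocab, indicators and encoded are made up for illustration.

```r
# Rebuild the example data with the multi-word column from the question.
alldata <- data.frame(
  columna = c(1, 2, 3),
  columnb = c("AR", "AT", "AF"),
  columnc = c("word1", "word2 word3", "word3 word1"),
  stringsAsFactors = FALSE
)

# Split each cell on whitespace: one character vector of words per row.
# (Assumption: words are separated by spaces or tabs only.)
words_per_row <- strsplit(alldata$columnc, "\\s+")

# Every distinct word, sorted, becomes one indicator column.
vocab <- sort(unique(unlist(words_per_row)))

# One row per observation, one column per word; 1 if the row contains the word.
indicators <- do.call(
  rbind,
  lapply(words_per_row, function(w) as.integer(vocab %in% w))
)
colnames(indicators) <- paste("columnc", vocab, sep = "_")

# Drop the original text column and attach the indicator columns.
encoded <- cbind(alldata[, c("columna", "columnb")], indicators)
encoded
```

On the example data, encoded should have the same columns and values as the table asked for in the question. A word that appears twice in the same cell still gives 1, since %in% only tests membership.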