dreww2 dreww2 - 2 months ago 14
R Question

Converting column with pipe delimited data into dummy variables

I'm interested in taking a column of a data.frame where the values in the column are pipe delimited and creating dummy variables from the pipe-delimited values.

For example:

Let's say we start with

df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim|", "Jim|Steve|Ben"))

> df
a
1 Ben|Chris|Jim
2 Ben|Greg|Jim
3 Jim|Steve|Ben


I'm interested in ending up with:

df2 = data.frame(Ben = c(1, 1, 1), Chris = c(1, 0, 0), Jim = c(1, 1, 1), Greg = c(0, 1, 0),
Steve = c(0, 0, 1))
> df2
Ben Chris Jim Greg Steve
1 1 1 1 0 0
2 1 0 1 1 0
3 1 0 1 0 1


I don't know in advance how many potential values there are within the field. In the example above, the variable "a" can include 1 value or 10 values. Assume it is a reasonable number (i.e., < 100 possible values).

Any good ways to do this?

Answer

Another way is using cSplit_e from splitstackshape package.

splitting the dataframe by column a and fill it by 0 and drop the original column.

library(splitstackshape)
cSplit_e(df, "a", "|", type = "character", fill = 0, drop = T)

#   a_Ben a_Chris a_Greg a_Jim a_Steve
#1     1       1      0     1       0
#2     1       0      1     1       0
#3     1       0      0     1       1