dreww2 - 5 months ago 28

R Question

I want to take a column of a data.frame whose values are pipe-delimited and create dummy variables from those values.

For example:

Let's say we start with

`df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))`

    > df
                  a
    1 Ben|Chris|Jim
    2  Ben|Greg|Jim
    3 Jim|Steve|Ben

I'm interested in ending up with:

`df2 = data.frame(Ben = c(1, 1, 1), Chris = c(1, 0, 0), Jim = c(1, 1, 1), Greg = c(0, 1, 0), Steve = c(0, 0, 1))`

    > df2
      Ben Chris Jim Greg Steve
    1   1     1   1    0     0
    2   1     0   1    1     0
    3   1     0   1    0     1

I don't know in advance how many distinct values the field can contain. In the example above, an entry of "a" might hold 1 value or 10 values. Assume the total is a reasonable number (i.e., < 100 possible values).

Any good ways to do this?

Answer

Another way is to use `cSplit_e` from the `splitstackshape` package: split the data frame on column `a`, `fill` the missing entries with 0, and `drop` the original column.

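```
library(splitstackshape)

# Split column "a" on "|" into one indicator column per distinct value,
# fill absent values with 0, and drop the original column.
cSplit_e(df, "a", sep = "|", type = "character", fill = 0, drop = TRUE)
#   a_Ben a_Chris a_Greg a_Jim a_Steve
# 1     1       1      0     1       0
# 2     1       0      1     1       0
# 3     1       0      0     1       1
```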
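
If you would rather not add a package dependency, the same idea can be sketched in base R. This is only a minimal illustration, not what `cSplit_e` does internally; the helper names `parts`, `values` and `dummies` are made up for the example, and it assumes the column is character rather than factor. The columns come out in order of first appearance and without the `a_` prefix.

```r
# Example data from the question, read as character rather than factor
df <- data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"),
                 stringsAsFactors = FALSE)

# Split each entry on a literal "|"; fixed = TRUE stops "|" from being
# treated as regex alternation
parts <- strsplit(df$a, "|", fixed = TRUE)

# Distinct values across all rows, in order of first appearance
values <- unique(unlist(parts))

# One row per input row, one 0/1 column per distinct value
dummies <- do.call(rbind, lapply(parts, function(p) as.integer(values %in% p)))
colnames(dummies) <- values

df2 <- as.data.frame(dummies)
df2
#   Ben Chris Jim Greg Steve
# 1   1     1   1    0     0
# 2   1     0   1    1     0
# 3   1     0   1    0     1
```

Because `values` is computed from the data, the number of dummy columns does not need to be known in advance.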