Starbucks - 2 months ago 8

R Question

I have two variables with multiple levels; V1 has 400 levels and V2 has ≈ 250 levels. How can I transform V2's factors into several different variables and use variable V1 as the unique identifier?

`V1 V2`

Garza, Mike a

Garza, Mike b

Smith, James a

Smith, James f

Smith, James z

Moore, Jen b

Klein, April f

The data frame should look like the example below. Note that each new variable can hold different factors; there is not one variable per factor. Mike has two factors associated with him, so a and b go into V2 and V3, whereas Jen's single factor b goes into V2, not V3.

`V1 V2 V3 V4 V5`

Garza, Mike a b

Smith, James a f z

Moore, Jen b

Klein, April f

Any help would be greatly appreciated!

Thank you.

Answer

You can do the first part with `dcast` from the `reshape2` package, and then shift the values into your desired layout with `apply`.

```
library(reshape2)  # provides dcast()

dat <- data.frame(V1 = factor(c("Garza", "Garza",
                                "Smith", "Smith", "Smith",
                                "Moore", "Klein")),
                  V2 = c("a", "b", "a", "f", "z", "b", "f"))

# recast your data: one row per V1, one column per distinct value of V2
dd <- dcast(dat, V1 ~ V2, value.var = "V2")

# make a function to use with apply: move the non-NA values to the left
# and pad the remaining cells with empty strings
shift_values <- function(x) {
  notna <- which(!is.na(x[-1]))
  val <- x[notna + 1]
  x[-1] <- c(as.character(val), rep("", length(x) - 1 - length(val)))
  return(x)
}

# use it in an apply loop, transpose the data, and turn it into a data.frame
result <- data.frame(t(apply(dd, 1, shift_values)))

# change the column names
colnames(result)[-1] <- paste0("V", 2:ncol(result))
```

The data then looks like this:

```
     V1 V2 V3 V4 V5
1 Garza  a  b
2 Klein  f
3 Moore  b
4 Smith  a  f  z
```
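
If you would rather not depend on an extra package, the same reshaping can be sketched in base R. This is only an illustration on the example data from this answer, not a tested solution for the full 400 × 250 case; the names `groups`, `width`, `padded` and `wide` are made up for the sketch. It splits `V2` by `V1`, pads each group with empty strings, and binds the rows together.

```r
# Same example data as above (kept as plain character columns here).
dat <- data.frame(V1 = c("Garza", "Garza", "Smith", "Smith", "Smith",
                         "Moore", "Klein"),
                  V2 = c("a", "b", "a", "f", "z", "b", "f"),
                  stringsAsFactors = FALSE)

# One list element per person, holding that person's V2 values.
groups <- split(dat$V2, dat$V1)

# The largest group decides how many value columns are needed.
width <- max(lengths(groups))

# Pad every group with "" up to that width and stack the rows.
padded <- do.call(rbind, lapply(groups, function(v) {
  c(v, rep("", width - length(v)))
}))

# Assemble the wide data frame and name the value columns V2, V3, ...
wide <- data.frame(V1 = names(groups), padded, stringsAsFactors = FALSE)
rownames(wide) <- NULL
colnames(wide)[-1] <- paste0("V", seq_len(width) + 1)
```

One difference from the `dcast` route: this produces only as many value columns as the largest group has values (three here), rather than one column per distinct level of `V2`, which may be preferable when `V2` has around 250 levels.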