Starbucks - 23 days ago
R Question

R Converting Factors into New Variables

I have two factor variables: V1 has 400 levels and V2 has ≈ 250 levels. How can I spread V2's levels across several new variables, using V1 as the unique identifier?

V1            V2
Garza, Mike   a
Garza, Mike   b
Smith, James  a
Smith, James  f
Smith, James  z
Moore, Jen    b
Klein, April  f


The data frame should look like the example below. Note that each new variable can hold several different levels; there is not one variable per level. Mike has two levels associated with him, so a and b go into V2 and V3. Jen has only b, so it also goes into V2, not V3.

V1            V2  V3  V4  V5
Garza, Mike   a   b
Smith, James  a   f   z
Moore, Jen    b
Klein, April  f


Any help would be greatly appreciated!

Thank you.

Answer

You can do the first part with dcast from the reshape2 package, then use apply to shift the values to the left so they match your desired output.

library(reshape2)  # provides dcast

dat <- data.frame(V1 = factor(c("Garza", "Garza",
                          "Smith", "Smith", "Smith",
                          "Moore", "Klein")),
                  V2 = c("a","b","a","f","z","b","f"))

# recast your data: one row per V1, one column per level of V2
dd <- dcast(dat, V1 ~ V2, value.var = "V2")

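# helper for apply: move the non-NA values of a row to the left
shift_values <- function(x) {
  notna <- which(!is.na(x[-1]))  # positions of filled cells, ignoring V1
  val <- x[notna + 1]            # the filled values, in column order
  # filled values first, then pad the remaining cells with ""
  x[-1] <- c(as.character(val), rep("", length(x) - 1 - length(val)))
  return(x)
}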

# use it in an apply loop, transpose the data, and turn it into a data.frame
result <- data.frame(t(apply(dd, 1, shift_values)))

# change the column names
colnames(result)[-1] <- paste0("V", 2:(ncol(result)))

The data then looks like this:

     V1 V2 V3 V4 V5
1 Garza  a  b      
2 Klein  f         
3 Moore  b         
4 Smith  a  f  z
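
If you would rather not depend on reshape2, the same layout can probably be built in base R. The following is only a sketch under the same sample data as above; the names groups, width, padded and wide are made up for illustration and are not part of the original answer. It splits V2 by V1, pads each group to the length of the largest group, and binds the groups into rows.

```r
# Same sample data as in the answer (surnames only)
dat <- data.frame(
  V1 = c("Garza", "Garza", "Smith", "Smith", "Smith", "Moore", "Klein"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = FALSE
)

# Collect the V2 values that belong to each V1 identifier
groups <- split(dat$V2, dat$V1)

# The largest group decides how many value columns are needed
width <- max(lengths(groups))

# Pad every group with "" up to that width and stack the groups as rows
padded <- do.call(rbind, lapply(groups, function(v) {
  c(v, rep("", width - length(v)))
}))

# Turn the matrix into a data frame with V1 as the identifier column
wide <- data.frame(V1 = rownames(padded), padded, stringsAsFactors = FALSE)
rownames(wide) <- NULL
colnames(wide)[-1] <- paste0("V", seq_len(width) + 1)

print(wide)
```

One difference from the dcast approach: this sketch creates only as many value columns as the largest group needs (three here, V2 to V4), whereas dcast creates one column per distinct level of V2, which would be around 250 columns for the data described in the question.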