homer3018 homer3018 - 28 days ago 8
R Question

Split a single column into multiple columns based on a set of values

I have a dataset in R that is comprised of a single column containing variables that I ideally would like in multiple columns. The structure is as follow :

A1
100
200
txt
A2
300
400
txt txt
txt
txt txt txt
A3
150
250
A2
.
.
.


Ideally this is the result I'm chasing :

A1 | 100 | 200 | txt
A2 | 300 | 400 | txt txt | txt | txt txt
A3 | 150 | 250
A2 | . | . | .


The set {A1;A2;A3} is known. The main difficulty I'm hitting right now is that the number of columns is unknown.

I've started by transpose my data, and was thinking doing a loop on the single row, and each time I see one of the value in my set {A1;A2;A3} I start a new row with this value in column 1 so that column 1 only contains {A1;A2;A3} values.

I'm convinced that there is a cleaner way of doing such task.

Thanks ahead of time for your assistance with this!

Answer

Create a grouping variable g and with it use tapply to convert the data from long form to a list, v. Finally, convert each component of v to a "ts" object and cbind the "ts" objects together (since "ts" objects can be bound together and automatically padded with NAs) transposing the result. No packaegs are used.

g <- cumsum(DF[[1]] %in% c("A1", "A2", "A3"))
v <- tapply(DF[[1]], g, c, simplify = FALSE)
m <- t(do.call(cbind, lapply(v, ts)))

giving this matrix:

> m
  [,1] [,2]  [,3]  [,4]      [,5]  [,6]         
1 "A1" "100" "200" "txt "    NA    NA           
2 "A2" "300" "400" "txt txt" "txt" "txt txt txt"
3 "A3" "150" "250" NA        NA    NA           
4 "A2" NA    NA    NA        NA    NA    

or if a data frame is preferred:

DFout<- as.data.frame(m, stringsAsFactors = FALSE)
DFout[] <- lapply(DFout, type.convert, as.is = TRUE)

giving:

> DFout
  V1  V2  V3      V4   V5          V6
1 A1 100 200    txt  <NA>        <NA>
2 A2 300 400 txt txt  txt txt txt txt
3 A3 150 250    <NA> <NA>        <NA>
4 A2  NA  NA    <NA> <NA>        <NA>

Note: The input in reproducible form is:

DF <- structure(list(V1 = c("A1", "100", "200", "txt ", "A2", "300", 
"400", "txt txt", "txt", "txt txt txt", "A3", "150", "250", "A2"
)), .Names = "V1", row.names = c(NA, -14L), class = "data.frame")