homer3018 -4 years ago 157
R Question

# Split a single column into multiple columns based on a set of values

I have a dataset in R that is comprised of a single column containing variables that I ideally would like in multiple columns. The structure is as follow :

``````A1
100
200
txt
A2
300
400
txt txt
txt
txt txt txt
A3
150
250
A2
.
.
.
``````

Ideally this is the result I'm chasing :

``````A1 | 100 | 200 | txt
A2 | 300 | 400 | txt txt | txt | txt txt
A3 | 150 | 250
A2 |  .  |  .  |  .
``````

The set {A1;A2;A3} is known. The main difficulty I'm hitting right now is that the number of columns is unknown.

I've started by transpose my data, and was thinking doing a loop on the single row, and each time I see one of the value in my set {A1;A2;A3} I start a new row with this value in column 1 so that column 1 only contains {A1;A2;A3} values.

I'm convinced that there is a cleaner way of doing such task.

Create a grouping variable `g` and with it use `tapply` to convert the data from long form to a list, `v`. Finally, convert each component of `v` to a `"ts"` object and `cbind` the `"ts"` objects together (since `"ts"` objects can be bound together and automatically padded with NAs) transposing the result. No packaegs are used.

``````g <- cumsum(DF[[1]] %in% c("A1", "A2", "A3"))
v <- tapply(DF[[1]], g, c, simplify = FALSE)
m <- t(do.call(cbind, lapply(v, ts)))
``````

giving this matrix:

``````> m
[,1] [,2]  [,3]  [,4]      [,5]  [,6]
1 "A1" "100" "200" "txt "    NA    NA
2 "A2" "300" "400" "txt txt" "txt" "txt txt txt"
3 "A3" "150" "250" NA        NA    NA
4 "A2" NA    NA    NA        NA    NA
``````

or if a data frame is preferred:

``````DFout<- as.data.frame(m, stringsAsFactors = FALSE)
DFout[] <- lapply(DFout, type.convert, as.is = TRUE)
``````

giving:

``````> DFout
V1  V2  V3      V4   V5          V6
1 A1 100 200    txt  <NA>        <NA>
2 A2 300 400 txt txt  txt txt txt txt
3 A3 150 250    <NA> <NA>        <NA>
4 A2  NA  NA    <NA> <NA>        <NA>
``````

Note: The input in reproducible form is:

``````DF <- structure(list(V1 = c("A1", "100", "200", "txt ", "A2", "300",
"400", "txt txt", "txt", "txt txt txt", "A3", "150", "250", "A2"
)), .Names = "V1", row.names = c(NA, -14L), class = "data.frame")
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download