homer3018 - 28 days ago 8

R Question

I have a dataset in R that is comprised of a single column containing variables that I ideally would like in multiple columns. The structure is as follow :

`A1`

100

200

txt

A2

300

400

txt txt

txt

txt txt txt

A3

150

250

A2

.

.

.

Ideally this is the result I'm chasing :

`A1 | 100 | 200 | txt`

A2 | 300 | 400 | txt txt | txt | txt txt

A3 | 150 | 250

A2 | . | . | .

The set {A1;A2;A3} is known. The main difficulty I'm hitting right now is that the number of columns is unknown.

I've started by transpose my data, and was thinking doing a loop on the single row, and each time I see one of the value in my set {A1;A2;A3} I start a new row with this value in column 1 so that column 1 only contains {A1;A2;A3} values.

I'm convinced that there is a cleaner way of doing such task.

Thanks ahead of time for your assistance with this!

Answer

Create a grouping variable `g`

and with it use `tapply`

to convert the data from long form to a list, `v`

. Finally, convert each component of `v`

to a `"ts"`

object and `cbind`

the `"ts"`

objects together (since `"ts"`

objects can be bound together and automatically padded with NAs) transposing the result. No packaegs are used.

```
g <- cumsum(DF[[1]] %in% c("A1", "A2", "A3"))
v <- tapply(DF[[1]], g, c, simplify = FALSE)
m <- t(do.call(cbind, lapply(v, ts)))
```

giving this matrix:

```
> m
[,1] [,2] [,3] [,4] [,5] [,6]
1 "A1" "100" "200" "txt " NA NA
2 "A2" "300" "400" "txt txt" "txt" "txt txt txt"
3 "A3" "150" "250" NA NA NA
4 "A2" NA NA NA NA NA
```

or if a data frame is preferred:

```
DFout<- as.data.frame(m, stringsAsFactors = FALSE)
DFout[] <- lapply(DFout, type.convert, as.is = TRUE)
```

giving:

```
> DFout
V1 V2 V3 V4 V5 V6
1 A1 100 200 txt <NA> <NA>
2 A2 300 400 txt txt txt txt txt txt
3 A3 150 250 <NA> <NA> <NA>
4 A2 NA NA <NA> <NA> <NA>
```

**Note:** The input in reproducible form is:

```
DF <- structure(list(V1 = c("A1", "100", "200", "txt ", "A2", "300",
"400", "txt txt", "txt", "txt txt txt", "A3", "150", "250", "A2"
)), .Names = "V1", row.names = c(NA, -14L), class = "data.frame")
```