Ferroao - 1 year ago 39

R Question

I would like to accomplish a task (sorry for the edit) that is beyond my knowledge of R. I have a big data frame, and I want matching substrings to be aligned in columns. The source data frame looks like this (`notst` stands for other substrings that should be ignored):

    #     col1     col2     col3
    # notst-s1 notst-s2 notst-x3
    # notst-s1 notst-x3 notst-a5
    # notst-s2 notst-a5
    # notst-x3 notst-a5
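To make the example reproducible, the input could be built roughly as follows. This is only a sketch: the name `df1` is taken from the answer further down, and representing the blank cells as empty strings (rather than `NA`) is an assumption.

```r
# Assumed reconstruction of the example input; "" marks a blank cell
df1 <- data.frame(
  col1 = c("notst-s1", "notst-s1", "notst-s2", "notst-x3"),
  col2 = c("notst-s2", "notst-x3", "notst-a5", "notst-a5"),
  col3 = c("notst-x3", "notst-a5", "", ""),
  stringsAsFactors = FALSE
)
df1
```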

The result should be:

    #     col1     col2     col3     col4
    # notst-s1 notst-s2 notst-x3
    # notst-s1          notst-x3 notst-a5
    #          notst-s2          notst-a5
    #                   notst-x3 notst-a5

Edit:

The answer of akun works for my minimal example (above), but I now have to refine the requirement: there is exactly one suffix ("spst") for which the whole string ("xxxx-spst") should be used for the alignment (*).

For this input:

    #     col1     col2     col3
    #   st1-ab stb-spst sta-spst
    # stc-spst sta-spst   st4-ab
    # stb-spst   st7-ab
    #   st9-ba stb-spst

A possible result could be:

    #     col1     col2     col3     col4
    #   st1-ab stb-spst sta-spst
    #   st4-ab stc-spst sta-spst
    #   st7-ab stb-spst
    #          stb-spst            st9-ba

(*) Note that in row 2, col2, "stc-spst" seems misplaced, but that is not a problem for me, because the value "stb-spst" does not exist in that row. For that particular string, only the suffix ("spst") matters.

Answer Source

We can do this by first `melt`ing the dataset, extracting the numeric index from the elements, creating a row/column index based on that, and assigning the elements to a `matrix` created based on the max value of the index.

```
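library(reshape2)
# Melt to long format (Var1 = row, Var2 = column, value = cell) and strip the
# leading non-digits to get the numeric index; cells without one become NA and are dropped
d1 <- na.omit(transform(melt(as.matrix(df1)), v1 = as.numeric(sub("\\D+", "", value))))
# Empty character matrix with one column per possible index value
m1 <- matrix("", nrow = max(d1$Var1), ncol = max(d1$v1))
# Place each value at (original row, numeric index)
m1[as.matrix(d1[c("Var1", "v1")])] <- as.character(d1$value)
# Drop the columns that stayed empty and rename the rest
d2 <- as.data.frame(m1[,!!colSums(m1!="")])
colnames(d2) <- paste0("col", seq_along(d2))
d2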
#      col1     col2     col3     col4
#1 notst-s1 notst-s2 notst-x3
#2 notst-s1          notst-x3 notst-a5
#3          notst-s2          notst-a5
#4                   notst-x3 notst-a5
```
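The code above keys on the digits in each element, so it does not cover the "spst" case from the question's edit. One possible way to handle it, sketched here in base R under stated assumptions, is to align on a key instead: the text after the last "-", or the whole string when that suffix is "spst". The function name `align_by_key`, its `whole_suffix` argument, and the data frame `df2` are hypothetical names introduced for illustration. Columns are ordered by first appearance of each key (reading row by row), which is also an assumption.

```r
# Sketch only: align cells into columns by a key derived from each string
align_by_key <- function(df, whole_suffix = "spst") {
  m <- as.matrix(df)
  vals <- as.vector(t(m))                        # cell values in row-major order
  rows <- rep(seq_len(nrow(m)), each = ncol(m))  # original row of each value
  keep <- !is.na(vals) & vals != ""              # ignore blank cells
  vals <- vals[keep]
  rows <- rows[keep]
  # Key: text after the last "-", except for the special suffix,
  # where the whole string is the key
  suffix <- sub(".*-", "", vals)
  key <- ifelse(suffix == whole_suffix, vals, suffix)
  cols <- match(key, unique(key))                # one column per distinct key
  out <- matrix("", nrow = nrow(m), ncol = max(cols))
  out[cbind(rows, cols)] <- vals                 # place value at (row, key column)
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- paste0("col", seq_along(out))
  out
}

# Assumed reconstruction of the second example; "" marks a blank cell
df2 <- data.frame(
  col1 = c("st1-ab", "stc-spst", "stb-spst", "st9-ba"),
  col2 = c("stb-spst", "sta-spst", "st7-ab", "stb-spst"),
  col3 = c("sta-spst", "st4-ab", "", ""),
  stringsAsFactors = FALSE
)
res <- align_by_key(df2)
res
```

Note that this gives every distinct key its own column, so for the second example "stc-spst" ends up in a column of its own (five columns in total) rather than sharing one with "stb-spst" as in the question's possible result. If two cells in the same row share a key, the later one overwrites the earlier one.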