Ferroao - 4 months ago
R Question

Align strings of a data frame in columns in R

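I would like to accomplish a task (sorry for the edit), but it is beyond my knowledge of R. I have a big data frame, and I want its strings to be aligned in columns according to a substring (the part after the hyphen), so that strings sharing that substring end up in the same column. The source data frame looks like this: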

Here, notst stands for other substrings that should be ignored.

# col1     col2     col3
# notst-s1 notst-s2 notst-x3
# notst-s1 notst-x3 notst-a5
# notst-s2 notst-a5
# notst-x3 notst-a5


The result should be:

# col1     col2     col3     col4
# notst-s1 notst-s2 notst-x3
# notst-s1          notst-x3 notst-a5
#          notst-s2          notst-a5
#                   notst-x3 notst-a5


Edit:
The answer by akun works for my minimal example (above), but now I have to reconsider: there is (only) one particular suffix string ("spst") for which the whole string ("xxxx-spst") should be used for the alignment (*).

For:

# col1     col2     col3
# st1-ab   stb-spst sta-spst
# stc-spst sta-spst st4-ab
# stb-spst st7-ab
# st9-ba   stb-spst


a possible result could be:

# col1     col2     col3     col4
# st1-ab   stb-spst sta-spst
# st4-ab   stc-spst sta-spst
# st7-ab   stb-spst
#          stb-spst          st9-ba


(*) Note that in row 2, col2, "stc-spst" seems misplaced, but this is not a problem for me: the value "stb-spst" does not occur in that row, so for that particular string only the suffix ("spst") matters.
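For the edited data, one way to express the rule is to give every cell an alignment key: the text after the last hyphen, except when that text is "spst", in which case the whole string is the key. The base-R sketch below assumes that empty cells are NA, that each key appears at most once per row, and that columns should follow the order in which keys first appear when reading the rows left to right; align_key, align_by_key and df2 are hypothetical names. Because every distinct "...-spst" string gets its own column, it returns five columns rather than the four of the possible result above: "stc-spst" is kept in a column of its own instead of filling the free "stb-spst" slot in row 2.

```r
# Hypothetical key rule from the edit: the suffix after the last "-",
# except for the "spst" suffix, where the whole string is the key.
align_key <- function(x) {
  suffix <- sub(".*-", "", x)
  ifelse(suffix == "spst", x, suffix)
}

# Hypothetical helper: put every non-empty cell in the column of its key.
# Columns follow the order in which keys first appear (rows read left to
# right). Assumes a key occurs at most once per row; otherwise a later cell
# overwrites an earlier one.
align_by_key <- function(df, key_fun) {
  m    <- as.matrix(df)
  vals <- as.vector(t(m))                        # cells in row-major order
  rows <- rep(seq_len(nrow(m)), each = ncol(m))  # row of each cell
  keep <- !is.na(vals) & vals != ""              # skip empty cells
  vals <- vals[keep]
  rows <- rows[keep]
  keys <- key_fun(vals)
  cols <- match(keys, unique(keys))              # one column per distinct key
  out  <- matrix("", nrow = nrow(m), ncol = length(unique(keys)))
  out[cbind(rows, cols)] <- vals                 # place by (row, key column)
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  colnames(out) <- paste0("col", seq_along(out))
  out
}

# The edited example from the question (empty cells as NA)
df2 <- data.frame(col1 = c("st1-ab", "stc-spst", "stb-spst", "st9-ba"),
                  col2 = c("stb-spst", "sta-spst", "st7-ab", "stb-spst"),
                  col3 = c("sta-spst", "st4-ab", NA, NA),
                  stringsAsFactors = FALSE)
align_by_key(df2, align_key)
```

Passing key_fun = function(x) sub(".*-", "", x) instead aligns the first example by its full suffixes (s1, s2, x3, a5), which gives the four columns of the expected result there. Moving "stc-spst" into the empty "stb-spst" slot, as the possible result does, would need an extra rule for filling free slots that this sketch does not attempt.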

Answer

We can do this by first melting the dataset, extracting the numeric index from each element, using the original row and that index as the row/column position, and assigning the elements to a character matrix sized by the maximum row number and the maximum index. Columns that remain empty are then dropped.

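library(reshape2)
# example data from the question; empty cells are NA
df1 <- data.frame(col1 = c("notst-s1", "notst-s1", "notst-s2", "notst-x3"),
                  col2 = c("notst-s2", "notst-x3", "notst-a5", "notst-a5"),
                  col3 = c("notst-x3", "notst-a5", NA, NA),
                  stringsAsFactors = FALSE)
# long format (Var1 = row, Var2 = column, value = string); v1 = numeric index
# left after removing the first run of non-digits; na.omit drops empty cells
d1 <- na.omit(transform(melt(as.matrix(df1)), v1 = as.numeric(sub("\\D+", "", value))))
# empty result matrix: one row per input row, one column per possible index
m1 <- matrix("", nrow = max(d1$Var1), ncol = max(d1$v1))
# place each string at (its original row, its index)
m1[as.matrix(d1[c("Var1", "v1")])] <- as.character(d1$value)
# drop columns that received no string; drop = FALSE keeps a matrix for 1-row input
d2 <- as.data.frame(m1[, colSums(m1 != "") > 0, drop = FALSE])
colnames(d2) <- paste0("col", seq_along(d2))
d2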
#     col1     col2     col3     col4
#1 notst-s1 notst-s2 notst-x3         
#2 notst-s1          notst-x3 notst-a5
#3          notst-s2          notst-a5
#4                   notst-x3 notst-a5
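The column index in this answer comes from sub("\\D+", "", value), which deletes only the first run of non-digit characters. The base-R check below (x, idx_text, idx and m are illustrative names) shows the resulting index for values from both examples, plus the two-column matrix indexing used to place each string. For the strings in the edit the digit sits in the prefix (or is missing), so what remains is not a number, as.numeric() returns NA, and na.omit() would drop those cells; this is why the approach fits the minimal example but not the edited one.

```r
# Values from the first example and from the edit
x <- c("notst-s1", "notst-x3", "notst-a5", "st1-ab", "stb-spst")

# The answer's index: remove the first run of non-digits only
idx_text <- sub("\\D+", "", x)
idx_text   # → "1" "3" "5" "1-ab" ""

# Non-numeric leftovers become NA (as.numeric may warn, so silence it here)
idx <- suppressWarnings(as.numeric(idx_text))
idx        # → 1 3 5 NA NA

# Matrix indexing with a two-column (row, column) matrix, as in m1[...] <- ...
m <- matrix("", nrow = 2, ncol = 3)
m[cbind(c(1, 2), c(3, 1))] <- c("a", "b")  # puts "a" at [1, 3] and "b" at [2, 1]
m
```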