nik nik - 1 month ago 7
R Question

split strings and rearrange a data frame

I have a data like this one

df <- structure(list(A = structure(c(2L, 3L, 6L, 7L, 5L, 4L, 1L, 1L
), .Label = c("", "NZT1", "O749", "P42I;QJ0;AIH2", "P609;QT7",
"Q835", "Q854"), class = "factor"), B = structure(c(8L, 6L, 5L,
7L, 4L, 3L, 2L, 1L), .Label = c("", "P079;P0C7;P0C8", "P641;Q614",
"Q013", "Q554", "Q749", "Q955", "Q9U0"), class = "factor"), C = structure(c(7L,
8L, 6L, 5L, 3L, 4L, 1L, 2L), .Label = c("P641;QS14", "P679;P0C7;P048",
"Q168", "Q413", "Q550", "Q6N9", "Q980", "Q997"), class = "factor")), .Names = c("A",
"B", "C"), class = "data.frame", row.names = c(NA, -8L))

# A B C
#1 NZT1 Q9U0 Q980
#2 O749 Q749 Q997
#3 Q835 Q554 Q6N9
#4 Q854 Q955 Q550
#5 P609;QT7 Q013 Q168
#6 P42I;QJ0;AIH2 P641;Q614 Q413
#7 P079;P0C7;P0C8 P641;QS14
#8 P679;P0C7;P048


I am trying to split them based on ";", and then put them under the other string , the expected output I seek is like this

# A B C
#1 NZT1 Q9U0 Q980
#2 O749 Q749 Q997
#3 Q835 Q554 Q6N9
#4 Q854 Q955 Q550
#5 P609 Q013 Q168
#6 QT7 P641 Q413
#7 P42I Q614 P641
#8 QJ0 P079 QS14
#9 AIH2 P0C7 P679
#10 P0C8 P0C7
#11 P048


I tried to use
strsplit()
but I did not get that far

This is what I tried

myNewdf <- strsplit(as.character(unlist(df)), ";")

Answer

I think you can try this:

x <- lapply(df, function (x) unlist(strsplit(as.character(x), ";")))

This gives you a list. If you want a data frame, you need some further work to pad empty string "":

m <- max(lengths(x))
y <- as.data.frame(lapply(x, function (vec) c(vec, character(m - length(vec)))))

#       A    B    C
# 1  NZT1 Q9U0 Q980
# 2  O749 Q749 Q997
# 3  Q835 Q554 Q6N9
# 4  Q854 Q955 Q550
# 5  P609 Q013 Q168
# 6   QT7 P641 Q413
# 7  P42I Q614 P641
# 8   QJ0 P079 QS14
# 9  AIH2 P0C7 P679
# 10      P0C8 P0C7
# 11           P048
Comments