user_012314112 - 19 days ago 7
R Question

Convert string dataset to a matrix

I have a tab-separated dataset, and I would like to convert the following strings into a matrix with one character per cell:

CATGGGGAAAACTGA
CCTCTCGATCACCGA
CCTATAGATCACCGA
CCGATTGATCACCGA
CCTTGTGCAGACCGA


Until now I have used:

rbind(strsplit("CATGGGGAAAACTGA","")[[1]],
strsplit("CCTCTCGATCACCGA","")[[1]],
strsplit("CCTCTCGATCACCGA","")[[1]],
strsplit("CCTATAGATCACCGA","")[[1]],
strsplit("CCGATTGATCACCGA","")[[1]],
strsplit("CCTTGTGCAGACCGA","")[[1]])


And this produces:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] "C"  "A"  "T"  "G"  "G"  "G"  "G"  "A"  "A"  "A"   "A"   "C"   "T"   "G"   "A"
[2,] "C"  "C"  "T"  "C"  "T"  "C"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"
[3,] "C"  "C"  "T"  "C"  "T"  "C"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"
[4,] "C"  "C"  "T"  "A"  "T"  "A"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"
[5,] "C"  "C"  "G"  "A"  "T"  "T"  "G"  "A"  "T"  "C"   "A"   "C"   "C"   "G"   "A"
[6,] "C"  "C"  "T"  "T"  "G"  "T"  "G"  "C"  "A"  "G"   "A"   "C"   "C"   "G"   "A"


But when the dataset is very large, typing out one strsplit call per row is not practical. How could I do it automatically?

Answer

You could use read.fwf, with a field width of 1 for every column, to split each line into single characters:

read.fwf(textConnection("CATGGGGAAAACTGA
CCTCTCGATCACCGA
CCTATAGATCACCGA
CCGATTGATCACCGA
CCTTGTGCAGACCGA"), rep(1, nchar("CATGGGGAAAACTGA")))
#  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
#1  C  A  T  G  G  G  G  A  A   A   A   C   T   G   A
#2  C  C  T  C  T  C  G  A  T   C   A   C   C   G   A
#3  C  C  T  A  T  A  G  A  T   C   A   C   C   G   A
#4  C  C  G  A  T  T  G  A  T   C   A   C   C   G   A
#5  C  C  T  T  G  T  G  C  A   G   A   C   C   G   A
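
The call above returns a data frame, while the question asks for a matrix. A minimal sketch of the conversion follows, assuming every string has the same length. The variable names (seqs, widths, con, df, m) are made up for illustration. The colClasses argument is an addition that is not part of the original call: without it, read.table's type conversion may read a column consisting only of "T" (or only of "F") as logical.

```r
# The five example strings from the question.
seqs <- c("CATGGGGAAAACTGA", "CCTCTCGATCACCGA", "CCTATAGATCACCGA",
          "CCGATTGATCACCGA", "CCTTGTGCAGACCGA")

# One field of width 1 per character; assumes all strings are equally long.
widths <- rep(1, nchar(seqs[1]))

# Each element of seqs is served as one line by the text connection.
con <- textConnection(seqs)

# colClasses = "character" keeps every column as text, so a column made up
# only of "T" is not turned into logical TRUE.
df <- read.fwf(con, widths = widths, colClasses = "character")
close(con)

# Convert to a character matrix and drop the V1..V15 column names.
m <- unname(as.matrix(df))
dim(m)
```
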

You might want to pass a file name instead of a text connection.
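
If the strings live in a file, one per line, another option is to generalize the rbind(strsplit(...)) call from the question rather than use read.fwf. The sketch below is that alternative technique, not the read.fwf approach. The temporary file and the names path, lines and mat are hypothetical and only make the example self-contained; it also assumes every line has the same number of characters.

```r
# Hypothetical input file: write the example strings to a temporary file
# so that the sketch can run on its own.
path <- tempfile(fileext = ".txt")
writeLines(c("CATGGGGAAAACTGA", "CCTCTCGATCACCGA", "CCTATAGATCACCGA",
             "CCGATTGATCACCGA", "CCTTGTGCAGACCGA"), path)

# Read the file back, one string per line.
lines <- readLines(path)

# strsplit() returns a list with one character vector per line;
# do.call(rbind, ...) stacks those vectors as the rows of a matrix.
mat <- do.call(rbind, strsplit(lines, ""))

dim(mat)
```

Because strsplit is vectorized over its input, the same two lines work for any number of rows. With lines of unequal length, rbind would recycle the shorter vectors and warn, so it may be worth checking nchar(lines) first.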