user3651829 - 6 months ago 23

R Question

I have a matrix of 107 DNA sequences (columns), each 10 bases long (rows). I also have a vector of population frequencies (the number of sequences in each population) called nSamples, and a vector of names for these populations called dnapops. I would like to automatically create a nested list that contains the first 27 sequences as dna1, the next 27 as dna2, the next 17 as dna3, and so on, until all 107 sequences are in their respective population in the list.

This needs to be dynamic, as the number of populations and DNA sequences changes from application to application. The desired output looks like this:

$dna1

$dna1$'1'

[1] "g" "t" "g" "a" "t" "t" "c" "c" "g" "g"

$dna1$'2'

[1] "g" "t" "g" "a" "t" "t" "c" "c" "g" "g"

and so on until

$dna1$'27'

[1] "g" "t" "g" "a" "t" "t" "c" "c" "g" "g"

It then moves on to dna2 and lists its 27 sequences, then dna3 and its 17 sequences, and so on.

`dna <- matrix(data=sample(c("a","g","c","t"),1070,replace=T),nrow=10,ncol=107)`

`nSamples <- c(27,27,17,12,1,10,3,1,6,3)`

`dnapops <- c("dna1","dna2","dna3","dna4","dna5","dna6","dna7","dna8","dna9","dna10")`

Answer

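We can build a grouping index by replicating the sequence along 'nSamples' by 'nSamples' itself (`rep(seq_along(nSamples), nSamples)`) and use it to `split` the sequence of column indices of 'dna'. For each group, we then extract the corresponding columns (with `drop=FALSE`, so that a single-column group stays a matrix) and `split` them by `col`, which gives one vector per sequence.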
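To see what each intermediate step produces, here is a minimal sketch on a toy matrix. The names `toy`, `sizes`, `grp`, `idx` and `seqs` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data: 5 sequences (columns) of 3 bases (rows),
# belonging to groups of size 2, 2 and 1
toy <- matrix(c("a","c","g",  "t","t","a",  "g","g","c",
                "c","a","t",  "a","a","a"), nrow = 3)
sizes <- c(2, 2, 1)

# One group label per column: 1 1 2 2 3
grp <- rep(seq_along(sizes), sizes)

# Column indices belonging to each group: 1:2, 3:4 and 5
idx <- split(seq_len(ncol(toy)), grp)

# col() labels every cell with its column number, so split() on it
# returns one character vector per column of the sub-matrix
x1 <- toy[, idx[[1]], drop = FALSE]
seqs <- split(x1, col(x1))
seqs
# $`1`
# [1] "a" "c" "g"
# $`2`
# [1] "t" "t" "a"
```

Without `drop=FALSE`, the third group (a single column) would be reduced to a plain vector and `col()` would fail on it.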

```
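# Split the column indices of 'dna' into one group per population, then
# split each group's sub-matrix into one vector per column (sequence)
lst <- lapply(split(seq_len(ncol(dna)), rep(seq_along(nSamples), nSamples)),
              function(i) {
                x1 <- dna[, i, drop=FALSE]  # stays a matrix even for one column
                split(x1, col(x1))
              })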
lengths(lst)
# 1 2 3 4 5 6 7 8 9 10
#27 27 17 12 1 10 3 1 6 3
lst[[1]][1:5]
#$`1`
#[1] "g" "a" "c" "c" "c" "t" "g" "t" "t" "g"
#$`2`
#[1] "c" "g" "c" "c" "g" "t" "a" "a" "c" "a"
#$`3`
#[1] "a" "c" "c" "a" "a" "c" "a" "c" "c" "a"
#$`4`
#[1] "g" "a" "g" "a" "t" "a" "c" "c" "c" "t"
#$`5`
#[1] "g" "g" "g" "a" "a" "a" "g" "g" "g" "g"
```

```
set.seed(24)
dna <- matrix(data=sample(c("a","g","c","t"),1070,replace=T),nrow=10,ncol=107)
```
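The list above is named `1` to `10`. If the outer names should be the population names from the question (`dna1`, `dna2`, ...), one possible variation is to group by a factor built from 'dnapops'. This is a sketch rather than part of the original answer: the `factor()` grouping and the `stopifnot()` sanity check are additions, and the explicit `levels` are there because a plain character grouping would sort the names alphabetically (`dna10` before `dna2`).

```r
set.seed(24)
dna <- matrix(sample(c("a","g","c","t"), 1070, replace=TRUE), nrow=10, ncol=107)
nSamples <- c(27,27,17,12,1,10,3,1,6,3)
dnapops <- paste0("dna", 1:10)

# Sanity check (an added assumption): sizes must cover every column,
# with exactly one name per population
stopifnot(sum(nSamples) == ncol(dna), length(nSamples) == length(dnapops))

# One population label per column; explicit levels keep the original order
grp <- factor(rep(dnapops, nSamples), levels = dnapops)

lst <- lapply(split(seq_len(ncol(dna)), grp), function(i) {
  x1 <- dna[, i, drop=FALSE]
  split(x1, col(x1))
})

names(lst)        # population names, in the order given by dnapops
lst$dna1[["1"]]   # first sequence of the first population
```

The inner names restart at `1` within each population, which matches the `$dna1$'1'` layout asked for in the question.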