Kevin Moreau - 1 year ago
R Question

How to determine the average distance between occurrences of a known motif in a list of DNA sequences

Here is my problem: I want the average distance between occurrences of a known motif within a sequence, and then to extend this to a list of sequences. The first part is done; the second part (extending it to a list of sequences) is where I am stuck. Here is how I do the first part:

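library(Biostrings)  # provides readDNAStringSet()
source("motifOccurrence.R")  # helper script; coordMotif() and computeDistance() presumably come from here
df <- readDNAStringSet("X.fasta")
df2 <- df[[1]]  # first sequence only
motif <- c("T", "C", "C", "A")
coord <- coordMotif(df2, motif)
motidist <- computeDistance(coord)
motidist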

[1] 152

It appears that the first sequence in my FASTA file has an average distance of 152 nucleotides between two TCCA motifs. However, I don't know how to automate this for every sequence in df.

Thanks in advance for the help.


Answer Source

This is untested, but should work. sapply applies the function to each element of df in turn (we could also use lapply here).

sapply(df, FUN = function(x, motif) {
  computeDistance(coordMotif(x, motif))
}, motif = motif)

The result will be a vector. If you would like to keep it a list, use sapply(..., simplify = FALSE). Simplification is not done with lapply. Consider either behavior as a convenience. :)
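
To make the sapply/lapply difference concrete, here is a minimal base R sketch that runs without Biostrings or motifOccurrence.R. The functions coord_motif() and mean_gap() are hypothetical stand-ins written for this illustration, not the asker's coordMotif() and computeDistance(), and the three toy sequences are made up. It assumes "distance" means the gap between the start positions of consecutive matches.

```r
# Hypothetical stand-in: start positions of each non-overlapping match of
# `motif` (a character vector of single bases) in the string `s`.
coord_motif <- function(s, motif) {
  hits <- gregexpr(paste(motif, collapse = ""), s, fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

# Hypothetical stand-in: mean gap between consecutive start positions,
# or NA when there are fewer than two matches.
mean_gap <- function(pos) {
  if (length(pos) < 2) NA_real_ else mean(diff(pos))
}

# Toy data (made up for illustration)
seqs <- list(
  seq1 = "TCCAAATCCAAAAATCCA",  # matches start at 1, 7 and 15
  seq2 = "GGTCCAGGGGGGTCCA",    # matches start at 3 and 13
  seq3 = "ACGT"                 # no match
)
motif <- c("T", "C", "C", "A")

per_sequence <- function(s, motif) mean_gap(coord_motif(s, motif))

as_vector <- sapply(seqs, per_sequence, motif = motif)  # simplified to a named numeric vector
as_list   <- lapply(seqs, per_sequence, motif = motif)  # stays a named list

print(as_vector)
```

Returning NA for sequences with fewer than two matches keeps every result at length one, which is what lets sapply simplify to a plain vector instead of falling back to a list.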