Imlerith - 3 months ago
R Question

extract "N" sized sequences from an array in R

Suppose I have the following array:

a <- sample(letters,100,replace=TRUE)


Treating those letters as an ordered sequence, I want to extract every run of 'n' consecutive elements from that array. For example:

For n=2 I would do:

paste0(a[1:99],"->",a[2:100])


For n=3 I would do:

paste0(a[1:98],"->",a[2:99],"->",a[3:100])


You get the point. Now, my goal is to create a function that takes n as input and gives me back the corresponding set of sequences of that length from array a.


I was able to do it using loops, but I was hoping for a high-performance one-liner.

I am a bit new to R so I'm not aware of all existing functions.

Answer

You can use embed. For embed(a, 3), this gives a matrix with columns

  • a[3:100]
  • a[2:99]
  • a[1:98]

in that order.
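To see the column order on something small enough to read, here is a minimal sketch using a four-letter vector. The names x and m are made up for illustration and are not part of the question.

```r
# Small illustrative input (hypothetical, stands in for the 100-letter array)
x <- c("a", "b", "c", "d")

# embed() is in the stats package, which is attached by default
m <- embed(x, 3)

# Each row is one window of 3 consecutive elements, newest element first:
# row 1 holds x[3], x[2], x[1]; row 2 holds x[4], x[3], x[2]
print(m)
```

So the first column is x[3:4], the second x[2:3], and the third x[1:2], which is why the columns need reversing to read left to right.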

To reverse the column order use matrix syntax m[rows, cols]:

res = embed(a, 3)[, 3:1]

If you want arrows printed between the columns, then

do.call(paste, c(split(res, col(res)), sep = " -> "))

is one way. This is probably better than apply(res, 1, something), performance-wise, since this is vectorized while apply would loop over rows.
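Putting the two steps together gives the function the question asks for. This is only a sketch: the name n_sequences and the sep argument are my own choices, and drop = FALSE is added so that the result stays a matrix when n is 1 or when only one window exists.

```r
# Hypothetical helper wrapping the embed() + paste() approach
n_sequences <- function(x, n, sep = "->") {
  stopifnot(n >= 1, n <= length(x))
  # One row per window; reverse the columns so each row reads left to right
  m <- embed(x, n)[, n:1, drop = FALSE]
  # Split the matrix into a list of columns and paste them element-wise
  do.call(paste, c(split(m, col(m)), sep = sep))
}

set.seed(1)
a <- sample(letters, 100, replace = TRUE)

# Should agree with the hand-written versions from the question
identical(n_sequences(a, 2), paste0(a[1:99], "->", a[2:100]))
identical(n_sequences(a, 3), paste0(a[1:98], "->", a[2:99], "->", a[3:100]))
```

Pass sep = " -> " if you want spaces around the arrows.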


I was hoping to say something useful about how to find obscure functions in R, but came up mostly blank on how embed might be found. Maybe...

  1. Go to any HTML help page
  2. Click the "Index" hyperlink at the bottom
  3. Read every single page

?