Imlerith - 1 year ago 87
R Question

# extract "N" sized sequences from an array in R

Suppose I have the following array:

```r
a <- sample(letters, 100, replace = TRUE)
```

Suppose those letters form an ordered sequence. I want to extract all possible sequences of `n` consecutive elements from that array. For example:

For `n=2` I would do: `paste0(a[1:99],"->",a[2:100])`

For `n=3` I would do: `paste0(a[1:98],"->",a[2:99],"->",a[3:100])`

You get the point. Now, my goal is to create a function that would take `n` as input and would give me back the corresponding set of sequences of the given length from array `a`.

I was able to do it using loops and all that, but I was hoping for a high-performance one-liner.

I am a bit new to R so I'm not aware of all existing functions.

You can use `embed`. For `embed(a, 3)`, this gives a matrix with columns

• `a[3:100]`
• `a[2:99]`
• `a[1:98]`

in that order.
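A small sketch of that column order on a toy vector may make it easier to see (the vector `x` here is made up for illustration and is not part of the question):

```r
# Toy input, chosen only so the windows are easy to read
x <- c("a", "b", "c", "d", "e")

# Each row of m is one window of length 3, newest element first
m <- embed(x, 3)

m[, 1]  # the same elements as x[3:5]
m[, 2]  # the same elements as x[2:4]
m[, 3]  # the same elements as x[1:3]
```

Reading a row left to right therefore walks backwards through the original vector, which is why the columns need to be flipped.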

To reverse the column order use matrix syntax `m[rows, cols]`:

```r
res = embed(a, 3)[, 3:1]
```

If you want arrows printed between the columns, then

```r
do.call(paste, c(split(res, col(res)), sep = " -> "))
```

is one way. This is probably better than `apply(res, 1, something)`, performance-wise, since this is vectorized while `apply` would loop over rows.
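Since the question asks for a function of `n`, the `embed` and `paste` steps can be wrapped together. The following is only a sketch: the name `ngrams`, its arguments, and the default separator `"->"` (taken from the question's `paste0` calls) are my own assumptions, not an existing R function.

```r
# Hypothetical helper: returns every run of n consecutive elements of x,
# pasted together with sep.
ngrams <- function(x, n, sep = "->") {
  stopifnot(n >= 1, n <= length(x))
  # embed() puts the newest element in column 1, so reverse the columns.
  # drop = FALSE keeps a matrix even when n == 1 or there is a single window.
  m <- embed(x, n)[, n:1, drop = FALSE]
  # Split the matrix into its columns and paste them element-wise.
  do.call(paste, c(split(m, col(m)), sep = sep))
}

set.seed(1)  # only so the example is reproducible
a <- sample(letters, 100, replace = TRUE)

head(ngrams(a, 2))
head(ngrams(a, 3))
```

For `n = 3` this should give the same vector as the `paste0(a[1:98],"->",a[2:99],"->",a[3:100])` call in the question, with `length(a) - n + 1` elements in general.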

I was hoping to say something useful about how to find obscure functions in R, but came up mostly blank on how `embed` might be found. Maybe...

1. Go to any HTML help page
2. Click the "Index" hyperlink at the bottom