CoveredInChocolate

R Question

I don't know the proper technical terms for this kind of operation, so it has been difficult to search for existing solutions. I thought I would try to post my own question and hopefully someone can help me out (or point me in the right direction).

I have a vector of character strings and I want to paste consecutive elements together in overlapping groups of two and three. To illustrate, here is a simplified version:

The vector I have:

`"a" "b" "c" "d" "e" "f"`

I want to run through the vector and concatenate every run of two and of three consecutive elements, separated by a space. This is the end result I want:

`"a b" "b c" "c d" "d e" "e f"`

and

`"a b c" "b c d" "c d e" "d e f"`

I solved this the simplest and dirtiest way possible by using for-loops, but it takes a long time to run and I am convinced it can be done more efficiently.

Here is my quick-and-dirty hack:

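```
t1 <- c("a", "b", "c", "d", "e", "f")

# Pairs: element i pasted with element i + 1.
# seq_len(n - 1) gives 1, ..., n - 1; 1:length(t1)-1 would parse as (1:length(t1)) - 1.
t2 <- rep("", length(t1) - 1)
for (i in seq_len(length(t1) - 1)) {
  t2[i] <- paste(t1[i], t1[i + 1])
}

# Triples: elements i, i + 1 and i + 2.
t3 <- rep("", length(t1) - 2)
for (i in seq_len(length(t1) - 2)) {
  t3[i] <- paste(t1[i], t1[i + 1], t1[i + 2])
}
```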

I looked into sapply, tapply, etc., but I can't figure out how to refer to "the following element" of the vector inside them.
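As a pointer for the `sapply` route: the function can reach "the following element" if it iterates over positions rather than values, because it then receives an index `i` and can read `t1[i + 1]`. This is a minimal sketch using `vapply()`, the type-checked variant of `sapply()`. It still calls `paste()` once per element, so it is unlikely to be much faster than the loop on millions of rows.

```r
t1 <- c("a", "b", "c", "d", "e", "f")

# Iterate over positions, not values, so t1[i + 1] is the next element
t2 <- vapply(seq_len(length(t1) - 1),
             function(i) paste(t1[i], t1[i + 1]),
             character(1))
t3 <- vapply(seq_len(length(t1) - 2),
             function(i) paste(t1[i], t1[i + 1], t1[i + 2]),
             character(1))

print(t2)
print(t3)
```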

Any help will be rewarded with my eternal gratitude!

Timestamps recorded while running the suggestions on input data with about 3 million rows:

| Step | Timestamp |
|---|---|
| START | 2016-11-20 19:24:50 CET |
| For-loop | 2016-11-20 19:28:26 CET |
| rollapply | 2016-11-20 19:38:55 CET |
| apply(matrix) | 2016-11-20 19:42:15 CET |
| paste t1[-length...] | 2016-11-20 19:42:37 CET |
| grep | 2016-11-20 19:44:30 CET |
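Assuming each timestamp was printed right after its step finished and the steps ran back to back, the gap between consecutive timestamps approximates each step's run time. A small sketch of that arithmetic; the timestamps are copied from the run above, and they are parsed in UTC only because all of them share one zone, so the differences are unaffected.

```r
steps  <- c("START", "For-loop", "rollapply", "apply(matrix)",
            "paste t1[-length...]", "grep")
stamps <- c("2016-11-20 19:24:50", "2016-11-20 19:28:26",
            "2016-11-20 19:38:55", "2016-11-20 19:42:15",
            "2016-11-20 19:42:37", "2016-11-20 19:44:30")

# Parse in one fixed zone; only the differences matter here
times <- as.POSIXct(stamps, tz = "UTC")

# Each timestamp minus the previous one (same drop-first/drop-last trick as the answer)
secs <- as.numeric(difftime(times[-1], times[-length(times)], units = "secs"))
names(secs) <- steps[-1]
print(secs)
```

Under that assumption, the vectorized `paste` took about 22 seconds, compared with about 3.5 minutes for the for loop.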

Answer

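For groups of two, drop the last element from one copy of the vector and the first element from the other, then paste the two copies element-wise: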

```
paste(t1[-length(t1)], t1[-1])
#[1] "a b" "b c" "c d" "d e" "e f"
```
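The same drop-and-paste idea extends to any group size in base R: for a group size `n`, take `n` copies of the vector, each starting one element later and trimmed to the same length, and paste them element-wise. `paste_runs` is a hypothetical helper name for this sketch, not an existing function; it needs no packages and produces no `NA` padding.

```r
# Hypothetical helper: paste every run of `n` consecutive elements of `x`.
# Copy k starts at element k and all copies have the same length m.
paste_runs <- function(x, n, sep = " ") {
  m <- length(x) - n + 1  # number of complete runs
  if (n < 1 || m < 1) return(character(0))
  parts <- lapply(seq_len(n), function(k) x[k:(k + m - 1)])
  do.call(paste, c(parts, sep = sep))
}

t1 <- c("a", "b", "c", "d", "e", "f")
pairs   <- paste_runs(t1, 2)
triples <- paste_runs(t1, 3)
print(pairs)
print(triples)
```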

For larger groups, one option is `shift` from `data.table`:

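```
library(data.table)
# shift(type = "lead") pads the end with NA, so the last two pastes contain "NA"
v1 <- do.call(paste, shift(t1, 0:2, type="lead"))
# Drop the padded results; this would also drop real elements containing "NA",
# in which case head(v1, -2) removes the padded tail by position instead
grep("NA", v1, invert=TRUE, value=TRUE)
#[1] "a b c" "b c d" "c d e" "d e f"
```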

Or

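```
n <- length(t1)
n1 <- 3  # group size
# Recycling t1 down the columns of an (n + 1)-row matrix shifts each column by one
# element, so the first n - n1 + 1 rows hold the groups (R warns about the recycling)
apply(matrix(t1, ncol=n1, nrow = n+1)[seq(n-(n1-1)),], 1, paste, collapse=' ')
#[1] "a b c" "b c d" "c d e" "d e f"
```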
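Another base-R option, not part of the original answer, is `stats::embed()`, which builds the matrix of sliding windows directly without the recycling warning. It puts the most recent element in the first column, so this sketch reverses the columns before pasting, and it pastes whole columns with `do.call()` instead of one row at a time with `apply()`. It has not been timed against the figures in the question.

```r
t1 <- c("a", "b", "c", "d", "e", "f")
n1 <- 3  # group size

# embed() returns one row per window with the newest element first,
# so reverse the column order to get the original reading order
m <- embed(t1, n1)[, n1:1, drop = FALSE]

# Split the matrix into its columns and paste them element-wise (vectorized)
res <- do.call(paste, split(m, col(m)))
print(res)
```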