CoveredInChocolate - 19 days ago
R Question

Concatenating groups of vector character elements

I don't know the proper technical terms for this kind of operation, so it has been difficult to search for existing solutions. I thought I would try to post my own question and hopefully someone can help me out (or point me in the right direction).

I have a character vector and I want to collect its elements in overlapping groups of two and three. To illustrate, here is a simplified version:

The vector I have:


"a"
"b"
"c"
"d"
"e"
"f"


I want to run through the vector and concatenate groups of two and three elements. This is the end result I want:


"a b"
"b c"
"c d"
"d e"
"e f"


And


"a b c"
"b c d"
"c d e"
"d e f"


I solved this the simplest and dirtiest way possible by using for-loops, but it takes a long time to run and I am convinced it can be done more efficiently.

Here is my ghetto-hack:

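t1 <- c("a", "b", "c", "d", "e", "f")

# Pairs. seq_len() avoids the precedence trap: 1:length(t1)-1 is (1:length(t1)) - 1
t2 <- rep("", length(t1) - 1)
for (i in seq_len(length(t1) - 1)) {
  t2[i] <- paste(t1[i], t1[i + 1])
}

# Triples
t3 <- rep("", length(t1) - 2)
for (i in seq_len(length(t1) - 2)) {
  t3[i] <- paste(t1[i], t1[i + 1], t1[i + 2])
}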


I was looking into sapply and tapply etc. but I can't seem to figure out how to use "the following element" in the vector.

Any help will be rewarded with my eternal gratitude!

-------------- Edit --------------

Run times of the suggestions using input data with ~ 3 million rows:


START: [1] "2016-11-20 19:24:50 CET"

For-loop: [1] "2016-11-20 19:28:26 CET"

rollapply: [1] "2016-11-20 19:38:55 CET"

apply(matrix): [1] "2016-11-20 19:42:15 CET"

paste t1[-length...]: [1] "2016-11-20 19:42:37 CET"

grep: [1] "2016-11-20 19:44:30 CET"
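
Timestamps like these give each step's duration only by subtraction. As a rough sketch, the same comparison can be made directly with system.time(). The input below is an assumed stand-in (100,000 random letters, far smaller than the ~3 million rows mentioned above), and only the loop and the vectorised paste are compared:

```r
# Assumed test input: 100,000 random letters, not the original data
set.seed(1)
x <- sample(letters, 1e5, replace = TRUE)

# Loop version: fill a preallocated vector one pair at a time
loop_time <- system.time({
  loop_out <- character(length(x) - 1)
  for (i in seq_len(length(x) - 1)) {
    loop_out[i] <- paste(x[i], x[i + 1])
  }
})

# Vectorised version: one call to paste() on two shifted copies
vec_time <- system.time(
  vec_out <- paste(x[-length(x)], x[-1])
)

identical(loop_out, vec_out)   # both approaches should agree
loop_time["elapsed"]           # seconds spent in the loop
vec_time["elapsed"]            # seconds spent in the vectorised call
```

The absolute numbers depend on the machine and the input, so they are only meaningful relative to each other.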

Answer

For groups of two, we can do this with

paste(t1[-length(t1)], t1[-1])
#[1] "a b" "b c" "c d" "d e" "e f"
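
The same idea extends to wider groups in base R: paste together k copies of the vector, each starting one element later. A minimal sketch, where paste_windows is a hypothetical helper name and not a function from any package:

```r
# Hypothetical helper: paste every run of k consecutive elements of x
paste_windows <- function(x, k, sep = " ") {
  n <- length(x)
  if (k < 1 || k > n) return(character(0))
  # One shifted copy of x per position in the window:
  # x[1:(n-k+1)], x[2:(n-k+2)], ..., x[k:n]
  shifted <- lapply(seq_len(k), function(j) x[j:(n - k + j)])
  do.call(paste, c(shifted, sep = sep))
}

t1 <- c("a", "b", "c", "d", "e", "f")
paste_windows(t1, 2)
#[1] "a b" "b c" "c d" "d e" "e f"
paste_windows(t1, 3)
#[1] "a b c" "b c d" "c d e" "d e f"
```

Because every shifted copy is trimmed to the same length up front, no NA padding is created and nothing needs to be filtered afterwards.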

and for higher numbers, one option is shift from data.table

library(data.table)
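# shift() returns t1 led by 0, 1 and 2 positions, padded with NA at the end
v1 <- do.call(paste, shift(t1, 0:2, type="lead"))
# keep only the complete windows (those without NA padding)
grep("NA", v1, invert=TRUE, value=TRUE)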
#[1] "a b c" "b c d" "c d e" "d e f"
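
One caveat: the grep step recognises padded windows by the text "NA", so it would also drop a genuine window whose elements contain those letters. Since the padded windows are always the last k - 1 results, they can be trimmed by position instead. A sketch, in which the lapply line is only an assumed base-R stand-in for shift(t1, 0:2, type="lead") so that the example runs without data.table:

```r
t1 <- c("a", "b", "NA", "d", "e", "f")   # note the literal string "NA"
k  <- 3

# Base-R stand-in for the lead shift: drop the first s elements, pad the end with NA
leads <- lapply(0:(k - 1), function(s) c(t1[seq_along(t1) > s], rep(NA, s)))
v1 <- do.call(paste, leads)

# Filtering by pattern loses the genuine windows that contain "NA"
by_pattern <- grep("NA", v1, invert = TRUE, value = TRUE)
by_pattern
#[1] "d e f"

# Trimming by position keeps them
by_position <- head(v1, length(t1) - (k - 1))
by_position
#[1] "a b NA" "b NA d" "NA d e" "d e f"
```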

Or

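n <- length(t1)
n1 <- 3
# An (n+1)-row matrix recycles t1 so each column starts one element later
# (R warns about the data length); the first n-(n1-1) rows are the full windows
apply(matrix(t1, ncol=n1, nrow = n+1)[seq(n-(n1-1)),], 1, paste, collapse=' ')
#[1] "a b c" "b c d" "c d e" "d e f"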
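
The matrix approach works because t1 is recycled into a matrix whose size is not a multiple of its length, which makes R emit a warning. A possible variant, offered as a sketch rather than a drop-in replacement, builds the window positions explicitly with outer() and so avoids the recycling:

```r
t1 <- c("a", "b", "c", "d", "e", "f")
n  <- length(t1)
n1 <- 3

# idx[i, j] = i + j - 1, so row i holds the positions of the i-th window
idx <- outer(seq_len(n - n1 + 1), seq_len(n1) - 1, "+")

# Look the positions up in t1 and restore the window-per-row layout
windows <- matrix(t1[as.vector(idx)], ncol = n1)

res <- apply(windows, 1, paste, collapse = " ")
res
#[1] "a b c" "b c d" "c d e" "d e f"
```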