agenis - 10 months ago 41

R Question

I have a vector of numbers. For instance like this, with only a few unique values:

`set.seed(2)`

a = rpois(1000, 0.3)

head(a, 20)

#### [1] 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 1 2 0 0 0

Now what I need is to find for each number,

`all.diff = function(num) NROW(unique(num))==NROW(num)`

Then I came up with a

`for`

`ConsecutiveDifferent = function(vector) {`

output = numeric(NROW(vector)-2)

for (i in 2:(NROW(vector)-1) ) {

trio <- c(vector[i-1], vector[i], vector[i+1])

if ( all.diff(trio) ) output[i]<-1

}

return(output)

}

res = ConsecutiveDifferent(a)

head(res, 20)

#### [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0

It does the job but since my vector has a length of several hundred millions, I was wondering if there was a better way to do this than a loop.

Thanks,

Thanks for getting me so many solutions! I coulnd't decide whose answer has to be accepted so I did a microbenckmark (length=50000) and the prize goes to Franck.. Also thanks for the extensive answer.

Answer

**rle.** This is very particular to the case of trios:

```
w = with(rle(a), cumsum(lengths)[
lengths == 1L & c(NA, values[-length(values)]) != c(values[-1], NA)
])
res2 = c(NA, logical(length(a)-2), NA)
res2[w] = TRUE
identical(res, res2) # TRUE
```

**combn.** I might do

```
a_shift = list(c(NA, a[-length(a)]), a, c(a[-1], NA))
n_distinct = rowSums(combn(a_shift, 2, FUN = function(x) x[[1]] != x[[2]]))
res = n_distinct == length(a_shift)
```

To examine whether it worked...

```
head(cbind.data.frame(a, res), 20)
a res
1 0 NA
2 0 FALSE
3 0 FALSE
4 0 FALSE
5 1 FALSE
6 1 FALSE
7 0 FALSE
8 1 FALSE
9 0 FALSE
10 0 FALSE
11 0 FALSE
12 0 FALSE
13 1 FALSE
14 0 FALSE
15 0 FALSE
16 1 TRUE
17 2 TRUE
18 0 FALSE
19 0 FALSE
20 0 FALSE
```

This can be extended to looking further ahead and behind by extending `a_shift`

, which can be easily done with the `shift`

function from data.table:

```
library(data.table)
n_back = 1
n_fwd = 1
a_shift = setDT(list(a))[, c(
shift(V1, n_back:0, type="lag"),
list(shift(V1, n_fwd, type="lead"))
)]
a_shift[, r := .I]
resDT = melt(a_shift, id = "r")[, .(res =
if (any(is.na(value))) NA else uniqueN(value) == n_fwd + n_back + 1L
), by=r][, a := a]
identical(res, resDT$res) # TRUE
```

... which may look arcane, but that's more to do with my coding style than the package.

Source (Stackoverflow)