Cath - 5 months ago 43

R Question

This may be obvious, but I can't seem to see it:

I have a vector like this:

`vec<-c("i: 1","n: alpha","a: term1","a: term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4","a: term5","a: term6")`

and I need to get this:

`out<-c("i: 1","n: alpha","a: term1;term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4;term5;term6")`

That is, for each unique `i:`, the `a:` entries should be merged into a single element, with the terms separated by `;`.

I tried with `diff` and `rle`:

```
out<-vec
a<-which(grepl("^a: ",vec))
diffa<-diff(a)
diffa1<-which(diffa==1)
rle_a<-rle(diffa)$lengths[rle(diffa)$values==1]
indwh<-1
for(ind in 1:length(rle_a)){
  allindwh<-indwh:(indwh+rle_a[ind]-1)
  out[a[c(diffa1[allindwh],diffa1[allindwh[length(allindwh)]]+1)]]<-paste(out[a[diffa1[allindwh[1]]]],paste(gsub("a: ","",out[a[c(diffa1[allindwh[-1]],diffa1[allindwh[length(allindwh)]]+1)]]),collapse=";"),sep=";")
  indwh<-indwh+rle_a[ind]
}
out<-unique(out)
```

This gives me what I want, but I would really appreciate any hints on how to simplify it.

Answer

Here's an easier approach with `tapply`:

```
# index of 'a's
idx <- grepl("^a", vec)
# find groups
grp <- c(0, cumsum(diff(idx) < 0))
# apply function to vector based on groups
unlist(tapply(vec, grp, FUN = function(x)
  c(x[1:2], paste("a:", paste(sub("^a:\\s*", "", x[-(1:2)]), collapse = ";")))),
  use.names = FALSE)
# [1] "i: 1" "n: alpha" "a: term1;term2"
# [4] "i: 2" "n: beta" "a: term3"
# [7] "i: 3" "n: gamma" "a: term4;term5;term6"
```
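
To see why the grouping step works, it can help to look at the intermediate values. The sketch below reuses `vec` from the question and prints `grp`. It then shows one possible variant that starts a new group at each `i:` line and picks out the `a:` lines by pattern instead of by position. The helper name `collapse_a` is hypothetical, and the variant assumes that every record begins with an `i:` line and has at least one `a:` line.

```r
vec <- c("i: 1", "n: alpha", "a: term1", "a: term2",
         "i: 2", "n: beta", "a: term3",
         "i: 3", "n: gamma", "a: term4", "a: term5", "a: term6")

# TRUE where a line is an 'a:' entry
idx <- grepl("^a", vec)

# diff(idx) is -1 exactly where a run of 'a:' lines ends and the next
# record begins, so the cumulative sum bumps the group id at each new record
grp <- c(0, cumsum(diff(idx) < 0))
grp
# [1] 0 0 0 0 1 1 1 2 2 2 2 2

# Variant (assumption: every record starts with an "i:" line)
grp2 <- cumsum(grepl("^i:", vec))

collapse_a <- function(x) {  # hypothetical helper name
  is_a <- grepl("^a:", x)
  terms <- sub("^a:\\s*", "", x[is_a])
  # keep the non-'a:' lines, then append one merged 'a:' line
  c(x[!is_a], paste("a:", paste(terms, collapse = ";")))
}

res <- unlist(tapply(vec, grp2, collapse_a), use.names = FALSE)
res
```

The pattern-based variant does not rely on each record having exactly two lines before its `a:` entries, which `x[1:2]` does. A record with no `a:` line at all would still get an empty `"a: "` element, so that case would need separate handling.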