Cath Cath - 1 month ago 15
R Question

fuse some information in a vector

This may be obvious, but I can't seem to see it:

I have a vector like this :

vec<-c("i: 1","n: alpha","a: term1","a: term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4","a: term5","a: term6")


and I need to get this :

out<-c("i: 1","n: alpha","a: term1;term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4;term5;term6")


That is, for each unique "i:" entry, fuse the "a:" entries when there is more than one.

I tried with diff and rle, but the resulting code (see below) is too long and I think I'm needlessly complicating the problem...

my code :

out<-vec
a<-which(grepl("^a: ",vec))
diffa<-diff(a)
diffa1<-which(diffa==1)
rle_a<-rle(diffa)$lengths[rle(diffa)$values==1]
indwh<-1
for(ind in 1:length(rle_a)){
allindwh<-indwh:(indwh+rle_a[ind]-1)
out[a[c(diffa1[allindwh],diffa1[allindwh[length(allindwh)]]+1)]]<-paste(out[a[diffa1[allindwh[1]]]],paste(gsub("a: ","",out[a[c(diffa1[allindwh[-1]],diffa1[allindwh[length(allindwh)]]+1)]]),collapse=";"),sep=";")
indwh<-indwh+rle_a[ind]
}
out<-unique(out)


So I get what I want but I would really appreciate any hint to simplify it.

Answer

Here's an easier approach with tapply:

# index of 'a's
idx <- grepl("^a", vec)
# find groups
grp <- c(0, cumsum(diff(idx) < 0))
# apply function to vector based on groups
unlist(tapply(vec, grp, FUN = function(x) 
        c(x[1:2], paste("a:", paste(sub("^a:\\s*", "", x[-(1:2)]), collapse = ";")))),
       use.names = FALSE)

# [1] "i: 1"                 "n: alpha"             "a: term1;term2"      
# [4] "i: 2"                 "n: beta"              "a: term3"            
# [7] "i: 3"                 "n: gamma"             "a: term4;term5;term6"
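
To make the grouping step less opaque, here is a small sketch that prints the intermediate grp vector and then shows one possible variant that groups on the "i:" lines instead. The helper name fuse_record and the vector grp2 are made up for illustration, and the variant assumes that every record starts with an "i:" line and that its "a:" lines come last; it is not part of the answer above, just one way it could be written.

```r
vec <- c("i: 1", "n: alpha", "a: term1", "a: term2",
         "i: 2", "n: beta", "a: term3",
         "i: 3", "n: gamma", "a: term4", "a: term5", "a: term6")

# TRUE for every "a:" line
idx <- grepl("^a", vec)
# diff(idx) is negative exactly where an "a:" line is followed by a
# non-"a:" line, i.e. where the next record starts; cumsum() turns
# those breaks into a running group id
grp <- c(0, cumsum(diff(idx) < 0))
grp
# [1] 0 0 0 0 1 1 1 2 2 2 2 2

# Variant (assumption: every record starts with an "i:" line)
grp2 <- cumsum(grepl("^i:", vec))

# Hypothetical helper: keep the non-"a:" lines, collapse the "a:" lines
fuse_record <- function(x) {
  is_a <- grepl("^a:", x)
  if (!any(is_a)) return(x)  # nothing to fuse in this record
  terms <- sub("^a:\\s*", "", x[is_a])
  c(x[!is_a], paste0("a: ", paste(terms, collapse = ";")))
}

res <- unlist(lapply(split(vec, grp2), fuse_record), use.names = FALSE)
res
```

Unlike x[1:2] in the tapply version, this variant does not rely on each record having exactly two header lines before its "a:" entries, at the cost of a few more lines.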