Terry - 2 months ago
R Question

Assign id by cluster in R

I have a vector like this

var1=c("A","A","B"," "," ","C","A","","A")


How can I create a vector of ids that groups adjacent non-blank elements together, with 0 for the blanks? Like this:

id1=c(1,1,1,0,0,2,2,0,3)


So I want to assign an id to each cluster. Is there a way to do that in R?

Answer

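We can apply cumsum to the diff of var1 != "" to number the clusters: the diff equals 1 exactly where an empty string is followed by a non-empty one, that is, where a new cluster starts. The running count also covers the empty-string positions, so we then replace those with 0: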

replace(cumsum(c(TRUE, diff(var1 != "") == 1)), var1 == "", 0)

gives:

# [1] 1 1 1 0 0 2 2 0 3

for:

var1=c("A","A","B","","","C","A","","A")
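To see why this works, the one-liner can be unpacked into its intermediate steps. This is only an illustrative sketch; the names nonempty, starts and ids are made up here and are not part of the answer.

```r
var1 <- c("A", "A", "B", "", "", "C", "A", "", "A")

# TRUE where the element is non-empty
nonempty <- var1 != ""

# diff() coerces the logical vector to 0/1, so a value of 1 marks an
# empty -> non-empty transition, i.e. the start of a new cluster.
# The leading TRUE makes the first element start cluster 1.
starts <- c(TRUE, diff(nonempty) == 1)

# Running count of cluster starts, then zero out the empty positions
ids <- replace(cumsum(starts), !nonempty, 0)
ids
# [1] 1 1 1 0 0 2 2 0 3
```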

The one-liner above assumes var1 does not start with an empty string. To generalize to that case, we can use the condition var1[1] != "" on the first element as the initial value:

replace(cumsum(c(var1[1] != "", diff(var1 != "") == 1)), var1 == "", 0)

gives:

# [1] 0 1 1 1 0 0 2 2 0 3

for:

var1=c("", "A","A","B","","","C","A","","A")
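Note that the vector in the question uses " " (a single space) for some of the blanks, which var1 == "" would not match. Below is a minimal sketch of a wrapper that trims whitespace first, assuming whitespace-only strings should count as empty and that the input has no NA values. cluster_ids is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: ids for runs of adjacent non-empty strings, 0 for blanks.
# Assumes whitespace-only strings count as empty and x contains no NA.
cluster_ids <- function(x) {
  if (length(x) == 0) return(numeric(0))
  nonempty <- trimws(x) != ""
  # The first element starts a cluster only if it is non-empty
  starts <- c(nonempty[1], diff(nonempty) == 1)
  replace(cumsum(starts), !nonempty, 0)
}

# The original vector from the question, with " " and "" both treated as blank
cluster_ids(c("A", "A", "B", " ", " ", "C", "A", "", "A"))
# [1] 1 1 1 0 0 2 2 0 3
```

Wrapping the expression in a function keeps the leading-blank handling and the whitespace trimming in one place, so the same call works for both example vectors.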