user3354212 - 1 year ago
R Question

Find the locations of breaks (crossovers) in a data frame in R

I have a dataframe:

df = read.table(text="ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header=T, stringsAsFactors=F)

I would like to find the locations of crossovers or breaks (the junctions between different letters) in each column. For example, for column C1 the first junction lies between row 3 and row 4: rows 1 to 3 are all A, and row 4 is H. So the location of this crossover is 3. The expected result is a list with one entry per column, from C1 to C6:

C1: 3 5 9 10 13
C2: 2
C3: 1 3 13
C4: 6 7
C5: (none)
C6: 5 14

Thanks for any help.

Answer Source

We can loop over the 'C' columns with lapply and compare adjacent elements to find the indices where they differ:

lapply(df[-(1:2)], function(x) which(x[-1] != x[-length(x)]))
#$C1
#[1]  3  5  9 10 13

#$C2
#[1] 2

#$C3
#[1]  1  3 13

#$C4
#[1] 6 7

#$C5
#integer(0)

#$C6
#[1]  5 14
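To see why the shifted comparison works, here is a small sketch that applies the same idea to column C1 alone. The values are typed out by hand from the question's data, and the names nxt, prev, changed and breaks_c1 are made up for illustration.

```r
# Column C1 from the question's data frame, typed out so the sketch is self-contained
c1 <- c("A", "A", "A", "H", "H", "A", "A", "A", "A", "B", "A", "A", "A", "B", "B")

nxt  <- c1[-1]           # rows 2..15 (first element dropped)
prev <- c1[-length(c1)]  # rows 1..14 (last element dropped)

# TRUE at position i means row i and row i + 1 hold different letters
changed <- nxt != prev

# which() turns the TRUE positions into row indices: the last row before each change
breaks_c1 <- which(changed)
print(breaks_c1)
# [1]  3  5  9 10 13
```

Because both shifted vectors have length 14, position i always pairs row i with row i + 1, which is why the reported index is the row just before the letter changes.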

Alternatively, we can apply the run-length encoding function rle, extract the run lengths, take their cumulative sum, and drop the last element (which is always the total number of rows rather than a break):

lapply(df[-(1:2)], function(x) head(cumsum(rle(x)$lengths),-1))
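
The rle approach can be unpacked step by step on one column, and the two approaches can be checked against each other. The sketch below rebuilds the question's data frame so that it runs on its own. The names runs, by_diff, by_rle, agree and by_location are made up for illustration, and the final step assumes the asker ultimately wants values from the location column rather than row numbers (they coincide in this data).

```r
df <- read.table(text = "ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header = TRUE, stringsAsFactors = FALSE)

# Step by step on C1: run lengths, then the row where each run ends
runs <- rle(df$C1)
runs$lengths          # 3 2 4 1 3 2
cumsum(runs$lengths)  # 3 5 9 10 13 15 (the final 15 is the end of the data, not a break)

# Both approaches from the answer, applied to every 'C' column
by_diff <- lapply(df[-(1:2)], function(x) which(x[-1] != x[-length(x)]))
by_rle  <- lapply(df[-(1:2)], function(x) head(cumsum(rle(x)$lengths), -1))

# Check that the two approaches give the same breaks for this data
agree <- isTRUE(all.equal(by_diff, by_rle))
print(agree)

# Assumption: translate row indices into values of the 'location' column
by_location <- lapply(by_diff, function(i) df$location[i])
```

Indexing df$location with the break positions matters only when location is not simply the row number, for example when it holds genomic coordinates.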