user3354212 - 3 months ago
R Question

Find the locations of breaks (crossovers) in a data frame in R

I have a dataframe:

df = read.table(text="ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header=T, stringsAsFactors=F)


I would like to find the locations of crossovers, or breaks: the junctions between different letters within each column. For example, in column C1 the first junction is between row 3 and row 4: rows 1 to 3 are all A, and row 4 is H, so the location of this crossover is 3. The expected result is a list with one element per column, from C1 to C6:

$C1
3 5 9 10 13
$C2
2
$C3
1 3 13
$C4
6 7
$C5

$C6
5 14


Thanks for any help.

Answer

We can loop over the 'C' columns with lapply and compare each element with the one after it; which() then returns the indices at which adjacent elements differ.

lapply(df[-(1:2)], function(x) which(x[-1] != x[-length(x)]))
#$C1
#[1]  3  5  9 10 13

#$C2
#[1] 2

#$C3
#[1]  1  3 13

#$C4
#[1] 6 7

#$C5
#integer(0)

#$C6
#[1]  5 14
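
To see why the comparison works, here is a minimal sketch that unpacks the one-liner for a single column. The vector c1 is column C1 from the question, typed out by hand so the snippet runs on its own; the names left, right and differs are only illustrative.

```r
# Column C1 from the question, typed out so the snippet is self-contained
c1 <- c("A", "A", "A", "H", "H", "A", "A", "A", "A", "B", "A", "A", "A", "B", "B")

left  <- c1[-length(c1)]   # elements 1 to 14 (drop the last)
right <- c1[-1]            # elements 2 to 15 (drop the first)

# TRUE at position i when row i differs from row i + 1
differs <- left != right

# Row indices just before each change of letter
breaks <- which(differs)
breaks
# [1]  3  5  9 10 13
```

Each index is the last row of a run, which matches the convention in the question (the A to H change between rows 3 and 4 is reported as 3). In the example data the location column equals the row number; if it did not, something like df$location[breaks] would presumably be the way to map row indices back to locations.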

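Or we can apply the run-length encoding function, rle, extract the run lengths, take their cumulative sum (which gives the last row of each run), and drop the final element, because that is just the end of the column rather than a break.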

lapply(df[-(1:2)], function(x) head(cumsum(rle(x)$lengths),-1))
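
The intermediate steps of the rle approach can be sketched on one column. The vector c3 below is column C3 from the question, rebuilt by hand; the names runs and ends are only illustrative.

```r
# Column C3 from the question: H, A, A, then ten H, then B, B
c3 <- c("H", "A", "A", rep("H", 10), "B", "B")

runs <- rle(c3)
runs$values    # "H" "A" "H" "B"
runs$lengths   # 1 2 10 2

# Cumulative run lengths give the last row of each run
ends <- cumsum(runs$lengths)   # 1 3 13 15

# The final value is just the end of the vector, so drop it
breaks <- head(ends, -1)
breaks
# [1]  1  3 13

# Both approaches agree on this column
stopifnot(identical(breaks, which(c3[-1] != c3[-length(c3)])))
```

For a column with no change at all, such as C5, rle returns a single run, so head(..., -1) leaves integer(0), the same result as the which() version.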