user3354212 - 4 months ago

R Question

I have a dataframe:

```
df <- read.table(text = "ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header = TRUE, stringsAsFactors = FALSE)
```

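I would like to find the locations of crossovers or breaks (the junctions between different letters) in each column. For example, in column `C1` the letter changes from `A` to `H` between locations 3 and 4, so 3 is reported as a break. The expected result for columns `C1` to `C6` is: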

```
$C1
3 5 9 10 13

$C2
2

$C3
1 3 13

$C4
6 7

$C5

$C6
5 14
```

Thanks for your help.

Answer

We can loop over the 'C' columns with `lapply` and compare adjacent elements to find the indices where the letter changes:

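```
# Drop the ID and location columns, then compare each element with the next;
# which() returns every row i where x[i] != x[i + 1]
lapply(df[-(1:2)], function(x) which(x[-1] != x[-length(x)]))
#$C1
#[1] 3 5 9 10 13
#$C2
#[1] 2
#$C3
#[1] 1 3 13
#$C4
#[1] 6 7
#$C5
#integer(0)
#$C6
#[1] 5 14
```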
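To make the comparison concrete, here is the same step run on column `C1` alone (the vector is typed in by hand from the data above). `x[-1]` drops the first element and `x[-length(x)]` drops the last, so comparing the two lines up each row with the next one. As an optional extra that goes beyond the original question, the same index can pull out the letters on either side of each break; the names `nxt`, `prev` and `transitions` are just illustrative.

```r
# Column C1 from the example data, typed in by hand
x <- c("A", "A", "A", "H", "H", "A", "A", "A", "A", "B", "A", "A", "A", "B", "B")

nxt  <- x[-1]           # rows 2..15
prev <- x[-length(x)]   # rows 1..14
changed <- nxt != prev  # TRUE where row i and row i + 1 differ

breaks <- which(changed)
print(breaks)  # 3 5 9 10 13

# Optional: the letters on either side of each break
transitions <- paste(prev[breaks], nxt[breaks], sep = "->")
print(transitions)  # "A->H" "H->A" "A->B" "B->A" "A->B"
```

Because `location` runs from 1 to 15 in this data, the row index and the location coincide; if they did not, `df$location[breaks]` would give the location values instead.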

Or we can apply the run-length encoding function, `rle`, extract the `lengths`, take their cumulative sum and remove the last element:

```
# rle() needs an atomic vector, hence stringsAsFactors = FALSE above
lapply(df[-(1:2)], function(x) head(cumsum(rle(x)$lengths), -1))
```
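For the `rle` route, it helps to look at the intermediate values for `C1` (again typed in by hand). Each run of identical letters ends at the cumulative sum of the run lengths; the last run always ends at the final row, which is not a break, so `head(..., -1)` drops it. A column with a single run, like `C5`, gives `integer(0)`, matching the first approach. The pairing of `values` into transition labels is an illustrative addition, not part of the original answer.

```r
# Column C1 from the example data, typed in by hand
x <- c("A", "A", "A", "H", "H", "A", "A", "A", "A", "B", "A", "A", "A", "B", "B")

r <- rle(x)        # runs of identical consecutive letters
print(r$values)    # "A" "H" "A" "B" "A" "B"
print(r$lengths)   # 3 2 4 1 3 2

ends   <- cumsum(r$lengths)  # row where each run ends: 3 5 9 10 13 15
breaks <- head(ends, -1)     # the final run ends at the last row, not at a break
print(breaks)                # 3 5 9 10 13

# Consecutive run values give the letter pair at each break
print(paste(head(r$values, -1), tail(r$values, -1), sep = "->"))

# A column with a single run (like C5) has no breaks
none <- head(cumsum(rle(rep("A", 15))$lengths), -1)
print(none)  # integer(0)
```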