user3570187 - 1 year ago 64
R Question

# Concatenating rows and dropping the successive repetitions or repeating elements

I have a dataframe as follows and I would like to concatenate the rows in the sequence (drop them if there is successive repetition) based on ticket number and identify how they are handed across people.

``````ticket <- c("1", "1", "1", "2", "2", "2", "2")
name <- c("Olg", "Jan", "Jan", "Olg", "Jan", "Jan", "Olg")
df <- data.frame(ticket, name)
``````

I want to create a column called `sequence` that gives the path for each ticket with successive repetitions suppressed, as shown (Olg-Jan-Jan becomes Olg-Jan, and Olg-Jan-Jan-Olg becomes Olg-Jan-Olg). Any suggestions? Thanks!

``````seq <- c("Olg-Jan", "Olg-Jan", "Olg-Jan", "Olg-Jan-Olg", "Olg-Jan-Olg", "Olg-Jan-Olg", "Olg-Jan-Olg")
``````

Answer Source

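`name` is a factor here (in R versions before 4.0, `data.frame()` converts strings to factors by default; in R 4.0 and later it stays character, so convert it with `factor()` first). We use the underlying numeric factor codes to detect consecutive duplicates and remove them: a difference of zero between neighbouring codes means the name repeats. We use `dplyr` so that we can easily group by `ticket` and chain functions together with the pipe operator (`%>%`).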

``````library(dplyr)

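df %>%
  group_by(ticket) %>%
  # factor() makes this work whether `name` is a factor or a character vector;
  # keep a row only when its factor code differs from the previous row's
  filter(c(1, diff(as.numeric(factor(name)))) != 0) %>%
  summarise(sequence = paste(name, collapse = "-"))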
``````
``````  ticket    sequence
1      1     Olg-Jan
2      2 Olg-Jan-Olg
``````
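To see why the `filter()` step drops only successive repeats, it may help to run the same expression by hand on a single ticket. The sketch below uses the four names of ticket 2 outside of `dplyr`; the intermediate variable names (`codes`, `keep`, `kept`) are made up for illustration and do not appear in the answer's code.

```r
# Names recorded for ticket 2, in order
name <- factor(c("Olg", "Jan", "Jan", "Olg"))

# Factor levels are sorted alphabetically, so Jan = 1 and Olg = 2
codes <- as.numeric(name)        # 2 1 1 2

# diff() is 0 exactly where a name equals the one before it;
# the leading 1 ensures the first row is always kept
keep <- c(1, diff(codes)) != 0   # TRUE TRUE FALSE TRUE

kept <- as.character(name[keep]) # "Olg" "Jan" "Olg"
paste(kept, collapse = "-")      # "Olg-Jan-Olg"
```

Only the second `Jan` is removed; the final `Olg` survives because it differs from the row immediately before it, even though `Olg` appeared earlier in the ticket.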

If you want to keep all the rows of the original data frame and just add the sequence, then you can `left_join` the output above to your original data frame:

``````df <- df %>%
  left_join(
    df %>%
      group_by(ticket) %>%
      filter(c(1, diff(as.numeric(factor(name)))) != 0) %>%
      summarise(sequence = paste(name, collapse = "-")),
    by = "ticket"  # join key made explicit
  )
``````
``````  ticket name    sequence
1      1  Olg     Olg-Jan
2      1  Jan     Olg-Jan
3      1  Jan     Olg-Jan
4      2  Olg Olg-Jan-Olg
5      2  Jan Olg-Jan-Olg
6      2  Jan Olg-Jan-Olg
7      2  Olg Olg-Jan-Olg
``````
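If `dplyr` is not available, a similar result can probably be had in base R. This is a different technique from the one above, not a restatement of it: `rle()` collapses runs of identical consecutive values, and `ave()` applies a function per ticket while keeping every row. The helper name `collapse_runs` is made up for this sketch.

```r
ticket <- c("1", "1", "1", "2", "2", "2", "2")
name <- c("Olg", "Jan", "Jan", "Olg", "Jan", "Jan", "Olg")
df <- data.frame(ticket, name, stringsAsFactors = FALSE)

# Hypothetical helper: reduce each run of repeated values to one value,
# then join the remaining values with "-"
collapse_runs <- function(x) {
  paste(rle(as.character(x))$values, collapse = "-")
}

# ave() calls collapse_runs() once per ticket and recycles the single
# result across all rows of that ticket
df$sequence <- ave(as.character(df$name), df$ticket, FUN = collapse_runs)

df$sequence
# "Olg-Jan" "Olg-Jan" "Olg-Jan" "Olg-Jan-Olg" "Olg-Jan-Olg" "Olg-Jan-Olg" "Olg-Jan-Olg"
```

Because `rle()` compares values directly, this version does not depend on `name` being a factor, and it avoids the separate join step.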