Juliana Benitez - 1 year ago 97

# Removing Adjacent Duplicates in Data Frame in R

This is my first question here at Stack Overflow.
I have a data frame in R with over a hundred columns that is supposed to contain duplicates. I can't use `unique()` because I only want to remove row-adjacent duplicates in each column.

``````L = list(c("AL", "AL", "AI", "AH", "BK", "CD", "CE", "BT", "BP",
"BD", "BI", "AL"), c("AL", "AL", "AI", "AH", "BK", "AU", "BK",
"CD", "V", "CE", "CE"), c("AL", "AL", "AI", "AH", "AU", "BK",
"BQ"))
do.call(cbind, lapply(L, `length<-`, max(lengths(L))))

# What I have:
song 1  song 2  song 3
AL      AL      AL
AL      AL      AL
AI      AI      AI
AH      AH      AH
BK      BK      AU
CD      AU      BK
CE      BK      BQ
BT      CD
BP      V
BD      CE
BI      CE
AL

# What I want:
song 1  song 2  song 3
AL      AL      AL
AI      AI      AI
AH      AH      AH
BK      BK      AU
CD      AU      BK
CE      BK      BQ
BT      CD
BP      V
BD      CE
BI
AL
``````

I've seen previous answers that seem to work just fine for a single column.

The solution was

``````df = df[with(df, c(x[-1]!= x[-nrow(df)], TRUE)),]
``````

I've seen `rle` solutions, but they don't work. Considering that the columns in my data frame have different lengths, I would like to know if there is a way to loop through all the columns.

Let's say you have a list like this:

``````songs
# $song_1
# [1] "AL" "AL" "AI" "AH" "BK" "CD" "CE" "BT" "BP" "BD" "BI" "AL"
#
# $song_2
# [1] "AL" "AL" "AI" "AH" "BK" "AU" "BK" "CD" "V"  "CE" "CE"
#
# $song_3
# [1] "AL" "AL" "AI" "AH" "AU" "BK" "BQ"
``````

Shared reproducibly with `dput`:

``````songs = structure(list(song_1 = c("AL", "AL", "AI", "AH", "BK", "CD",
"CE", "BT", "BP", "BD", "BI", "AL"), song_2 = c("AL", "AL", "AI",
"AH", "BK", "AU", "BK", "CD", "V", "CE", "CE"), song_3 = c("AL",
"AL", "AI", "AH", "AU", "BK", "BQ")), .Names = c("song_1", "song_2",
"song_3"))
``````

You can de-dupe adjacent elements in a single list item similarly to the data frame method you have in your question.

``````with(songs, song_1[c(TRUE, song_1[-1] != song_1[-length(song_1)])])
# [1] "AL" "AI" "AH" "BK" "CD" "CE" "BT" "BP" "BD" "BI" "AL"
``````
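To see how the comparison picks out the elements to keep, it can help to look at the intermediate logical vector on a small example. This is only a sketch: the vector `x` and the names `changed` and `keep` are made up for illustration and are not part of your data.

```r
# Hypothetical toy vector, only for illustration
x <- c("a", "a", "b", "b", "b", "a")

# Compare each element (from the 2nd on) with the one before it
changed <- x[-1] != x[-length(x)]
changed
# [1] FALSE  TRUE FALSE FALSE  TRUE

# The comparison is one element shorter than x, so prepend TRUE
# to always keep the first element of the vector
keep <- c(TRUE, changed)
x[keep]
# [1] "a" "b" "a"
```

Without the extra `TRUE`, the logical index would be one element too short and R would silently recycle it, which can drop elements you meant to keep.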

To do this to all items in the list, we use `lapply` with an anonymous function:

``````lapply(songs, function(s) s[c(TRUE, s[-1] != s[-length(s)])])
# $song_1
# [1] "AL" "AI" "AH" "BK" "CD" "CE" "BT" "BP" "BD" "BI" "AL"
#
# $song_2
# [1] "AL" "AI" "AH" "BK" "AU" "BK" "CD" "V"  "CE"
#
# $song_3
# [1] "AL" "AI" "AH" "AU" "BK" "BQ"
``````
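Base R's `rle()` can also be applied per list element, since its `values` component holds one entry per run of equal adjacent values. A minimal sketch, assuming each element is a plain character vector with no `NA` padding (the name `deduped_rle` is just for illustration):

```r
songs <- list(
  song_1 = c("AL", "AL", "AI", "AH", "BK", "CD", "CE", "BT", "BP", "BD", "BI", "AL"),
  song_2 = c("AL", "AL", "AI", "AH", "BK", "AU", "BK", "CD", "V", "CE", "CE"),
  song_3 = c("AL", "AL", "AI", "AH", "AU", "BK", "BQ")
)

# rle() encodes runs of equal adjacent values;
# $values keeps a single entry for each run
deduped_rle <- lapply(songs, function(s) rle(s)$values)
deduped_rle$song_3
# [1] "AL" "AI" "AH" "AU" "BK" "BQ"
```

Note that `rle()` treats every `NA` as its own run, so padded columns would likely need the `NA`s dropped first.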

You can, of course, assign the results of `lapply` to a new object or use them to overwrite the existing object.
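If you want the result back in a column layout, one option is to pad the shorter vectors with `NA` using `length<-`, the same trick as in your question. The sketch below assumes the `songs` list from above; the names `deduped`, `n_max`, `deduped_mat` and `deduped_df` are made up for illustration.

```r
songs <- list(
  song_1 = c("AL", "AL", "AI", "AH", "BK", "CD", "CE", "BT", "BP", "BD", "BI", "AL"),
  song_2 = c("AL", "AL", "AI", "AH", "BK", "AU", "BK", "CD", "V", "CE", "CE"),
  song_3 = c("AL", "AL", "AI", "AH", "AU", "BK", "BQ")
)

# Drop adjacent duplicates in every list element
deduped <- lapply(songs, function(s) s[c(TRUE, s[-1] != s[-length(s)])])

# Pad each vector with NA up to the longest length, then bind as columns
n_max <- max(lengths(deduped))
deduped_mat <- do.call(cbind, lapply(deduped, `length<-`, n_max))

# Optional: convert the character matrix to a data frame
deduped_df <- as.data.frame(deduped_mat, stringsAsFactors = FALSE)
dim(deduped_df)
# [1] 11  3
```

If you start from the `NA`-padded data frame rather than a list, you could strip the padding first with something like `lapply(df, function(col) col[!is.na(col)])`, assuming `NA` only ever appears as padding.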

Note that your data took a fair bit of work to get into R because of how you posted it. Next time, please use `dput()` or share code to create simulated data.
