DCRubyHound - 1 month ago
R Question

remove duplicate entries in cell - R

I searched high and low on here and tried the duplicated and unique functions for what I'm about to ask, but couldn't get anything to work. Let's say I have a data frame named company with a variable state. After I collapse the rows, one of the state values looks like this:

PA;PA;PA;TX;TX


How can I remove the duplicates inside the cell (and across the entire vector, for that matter) so that it looks as follows:

PA;TX


I have no problems removing dup rows, but can't seem to do it for the cells themselves.

Answer

This works for a single string:

x <- "PA;PA;PA;TX;TX"
x2 <- strsplit(x, ";")            # list of length 1 holding the split pieces
x3 <- unlist(x2)                  # flatten to a character vector
x4 <- unique(x3)                  # "PA" "TX"
x5 <- paste(x4, collapse = ";")   # "PA;TX"

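If you want to do it for the whole vector company$state, you could roll all that up into one call to sapply. The as.character() call guards against state being a factor (strsplit only accepts character input), and USE.NAMES = FALSE stops sapply from naming the result after the input strings:

company$state <- sapply(as.character(company$state), function(x) paste(unique(unlist(strsplit(x, ";"))), collapse = ";"), USE.NAMES = FALSE)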
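One thing to watch with the sapply approach: paste() turns a missing value into the literal string "NA". Below is a minimal sketch of a variant that keeps missing cells missing. The sample data and the helper name dedup_cell are made up for illustration; only company and state come from the question. It splits the whole column once (strsplit is vectorised) and uses vapply so the result is guaranteed to be a character vector.

```r
# Hypothetical example data; `company` and `state` follow the question's naming.
company <- data.frame(
  state = c("PA;PA;PA;TX;TX", "NY", "CA;NV;CA", NA),
  stringsAsFactors = FALSE
)

# Split every cell at once; strsplit() returns a list with one element per cell.
parts <- strsplit(as.character(company$state), ";", fixed = TRUE)

# Hypothetical helper: deduplicate one cell's pieces and glue them back together.
dedup_cell <- function(p) {
  if (length(p) == 1 && is.na(p)) return(NA_character_)  # keep missing values missing
  paste(unique(p), collapse = ";")
}

company$state <- vapply(parts, dedup_cell, character(1))

company$state
# "PA;TX" "NY" "CA;NV" NA
```

unique() keeps the first occurrence of each value, so the original order of the states within a cell is preserved.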