Mike - 8 months ago
R Question

Dealing with Spaces and NA's when Uniting Multiple Columns with Tidyr

So using the simple dataframe below, I want to create a new column that has all the days for each person, separated by a semi-colon.

For example, using Doug, it should look like - Monday; Wednesday; Friday

I would like to use tidyr's unite function for this, but when I do, I get something like Monday;;Wednesday;;Friday because of the NAs (which could also be blank strings). Sometimes there are semicolons at the beginning and end as well. So I'm hoping there's a way to keep using unite but clean up the result with a regular expression, so that I end up with the days of the week separated by a single semicolon, with no semicolons at the beginning or end.

I would also like to stick with tidyr, dplyr, stringr, etc.

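library(tidyr)

Monday <- c("Monday", " ", " ", "Monday", "Monday")
Tuesday <- c(" ", "Tuesday", "Tuesday", " ", "Tuesday")
Wednesday <- c(" ", "Wednesday", "Wednesday", "Wednesday", " ")
Thursday <- c(" ", " ", " ", " ", "Thursday")
Friday <- c(" ", " ", " ", " ", "Friday")

# Assumed construction: the post does not show how Days was built
Days <- data.frame(Monday, Tuesday, Wednesday, Thursday, Friday, stringsAsFactors = FALSE)

# Paste the five day columns into BestDays, keeping the originals
Days <- Days %>% unite(BestDays, Monday, Tuesday, Wednesday, Thursday, Friday, sep = "; ", remove = FALSE)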


From the source shown by getAnywhere, unite calls do.call("paste", c(data[from], list(sep = sep))) under the hood, and paste, as far as I know, has no option to omit NAs or blank strings unless you implement that yourself.
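To see why the separators pile up, here is a small base R sketch of that paste call on its own. The `days` list is made-up illustration data, not the data frame from the question:

```r
# Illustration only: `days` is invented data standing in for data[from].
days <- list(c("Monday", NA), c(" ", "Tuesday"), c("Wednesday", NA))

# Same shape of call as the one described above: paste() is applied
# column-wise with the chosen separator.
joined <- do.call("paste", c(days, list(sep = "; ")))
print(joined)
# [1] "Monday;  ; Wednesday" "NA; Tuesday; NA"
```

Blank strings are kept as they are, and NA is converted to the text "NA", so every input column contributes a separator regardless of its content.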

Nevertheless, you can use a regular expression method as follows with gsub from base R to clean up the result column:

gsub("^\\s;\\s|;\\s{2}", "", Days$BestDays)
# [1] "Monday"                            "Tuesday; Wednesday"               
# [3] "Tuesday; Wednesday"                "Monday; Wednesday"                
# [5] "Monday; Tuesday; Thursday; Friday"

This removes every match of either ^\\s;\\s or ;\\s{2}. The first alternative handles a string that starts with a blank entry: it removes that space together with the ;\\s that follows it. The second removes a semicolon followed by two whitespace characters, which is what a blank entry leaves behind (the separator plus the single-space value), whether it sits in the middle of the string or at the end.
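As far as I can tell, the gsub pattern only strips one leading blank entry, so a row whose first two columns are both blank would keep a stray "; " at the front. If that can happen in your data, a split-and-rejoin cleanup may be more forgiving. The sketch below is one way to do it in base R; `clean_days` is a hypothetical helper name, not a tidyr function, and it assumes that no real value is the literal text "NA":

```r
# Hypothetical helper: split the united string on the separator, drop
# empty pieces, and join what is left with "; ".
clean_days <- function(x, sep = ";") {
  pieces <- strsplit(x, sep, fixed = TRUE)
  vapply(pieces, function(p) {
    p <- trimws(p)                             # strip padding around each piece
    p <- p[!is.na(p) & p != "" & p != "NA"]    # drop blanks and pasted "NA" text
    paste(p, collapse = paste0(sep, " "))
  }, character(1))
}

united <- c("Monday;  ;  ;  ;  ", " ;  ; Wednesday;  ; Friday", "NA; Tuesday; NA")
cleaned <- clean_days(united)
print(cleaned)
# [1] "Monday"            "Wednesday; Friday" "Tuesday"
```

On the data frame from the question it would be applied as Days$BestDays <- clean_days(Days$BestDays), or inside a dplyr mutate call. Recent tidyr versions also document an na.rm argument for unite, which drops real NA values before pasting, but it does not treat blank strings such as " " as missing.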