screechOwl - 1 year ago
R Question

R remove special character and repeating underscores

I have a dataset that contains spaces and other punctuation characters. I'm trying to replace the spaces and special characters with "_". This creates spots with multiple "_" strung together, so I'd like to collapse these too, using the following function as described here:

removeSpace <- function(x){
  class1 <- class(x)
  x <- as.character(x)
  x <- gsub(" |&|-|/|'|(|)",'_', x) # convert special characters to _
  x <- gsub("([_])\\1+","\\1", x) # convert multiple _ to single _

  if(class1 == 'character'){
    return(x)
  }
  if(class1 == 'factor'){
    return(as.factor(x))
  }
}


The issue is that instead of replacing only the spaces and special characters with "_", it puts a "_" between every character (i.e. "test" -> "t_e_s_t").

What am I doing wrong?

Answer Source

You don't need to run two separate replacements to accomplish this. Just put a + quantifier in your match pattern.

Match: [-/&'() ]+

Replace with: _

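Also note that I used a character set instead of switching between each option with |. This is generally a better approach when matching one of multiple individual characters. It also avoids the bug in your pattern: parentheses are regex metacharacters, so the unescaped (|) at the end is a group with two empty alternatives. That matches the empty string at every position, which is why a "_" shows up between every character.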
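As a rough sketch of how the pattern above could slot into the function from the question (the restructuring and the was_factor variable are my own assumptions, not part of the original answer), something like this should work with base R's gsub:

```r
# Sketch only: the question's removeSpace() rebuilt around the single
# character-set pattern. was_factor is a hypothetical helper variable.
removeSpace <- function(x) {
  was_factor <- is.factor(x)
  x <- as.character(x)
  # One pass: each run of the listed characters becomes a single "_".
  # The leading "-" inside [...] is taken literally.
  x <- gsub("[-/&'() ]+", "_", x)
  if (was_factor) as.factor(x) else x
}

removeSpace("test")             # "test"
removeSpace("a & b - c/d (e)")  # "a_b_c_d_e_"
```

One caveat: this pattern does not touch underscores that are already in the data, so "a _ b" would still end up with three in a row. If that matters, adding _ to the set, as in [-/&'() _]+, should collapse those runs as well.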
