Phoebe - 3 months ago
R Question

Returning non-alphanumeric characters found by REGEX

I have no problem finding and returning words containing non-alphanumeric characters, but what I'd like to do is return the non-alphanumeric character that was found. For example:

a <- c("hello?", "goodbye","hi!")
grep("[^[:alnum:]]", a, value=TRUE)


[1] "hello?" "hi!"

But what I'd like to return is:

[1] "?" "!"

Any thoughts? Thanks!

EDIT: I love this...two user responses, four different ways to get it done. I've learned a lot. Thank you!


We can use gsub to remove everything that is not punctuation: the pattern [^[:punct:]]+ matches one or more non-punctuation characters, and we replace each match with the empty string (""). Elements that contained no punctuation end up as empty strings, which we then drop with either nzchar or setdiff.

setdiff(gsub("[^[:punct:]]+", "", a), "")
#[1] "?" "!"
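The nzchar variant mentioned above is not shown, so here is a minimal sketch of it next to setdiff. The extra element "what?" and the names cleaned, kept_all and kept_unique are illustrative assumptions, added only to show where the two differ: setdiff returns unique values, while nzchar keeps repeats.

```r
# Input with a repeated "?" (the "what?" element is added for illustration)
a <- c("hello?", "what?", "goodbye", "hi!")

# Strip every run of non-punctuation characters
cleaned <- gsub("[^[:punct:]]+", "", a)

# nzchar() is TRUE for non-empty strings, so this keeps every hit
kept_all <- cleaned[nzchar(cleaned)]
kept_all
#[1] "?" "?" "!"

# setdiff() drops "" and also removes duplicates
kept_unique <- setdiff(cleaned, "")
kept_unique
#[1] "?" "!"
```

Which one fits depends on whether repeated characters should be reported once or once per element.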

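Another option is str_extract from the stringr package. It returns the first match in each element and NA where nothing matches, so we drop the NA values with na.omit.

library(stringr)
as.vector(na.omit(str_extract(a, "[[:punct:]]+")))
#[1] "?" "!"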
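For completeness, a base R sketch that reuses the question's own pattern, [^[:alnum:]], with gregexpr and regmatches. This is one possible approach rather than part of the answers above, and the names matches and found are illustrative. Note that [^[:alnum:]] also matches whitespace, which [[:punct:]] does not.

```r
a <- c("hello?", "goodbye", "hi!")

# gregexpr() locates every non-alphanumeric character in each element;
# regmatches() pulls the matched text out as a list, one entry per element
matches <- regmatches(a, gregexpr("[^[:alnum:]]", a))

# Elements without a match contribute character(0), which unlist() drops
found <- unlist(matches)
found
#[1] "?" "!"
```

Because each character is matched individually, an element such as "hi!?" would yield "!" and "?" as separate results instead of the single string "!?".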