Phoebe Phoebe - 2 months ago 7
R Question

Returning non-alphanumeric characters found by REGEX

I have no problem finding and returning words containing non-alphanumeric characters, but what I'd like to do is return the non-alphanumeric character that was found. For example:

a <- c("hello?", "goodbye","hi!")
grep("[^[:alnum:]]", a, value=TRUE)


Returns:

[1] "hello?" "hi!"


But what I'd like to return is:

[1] "?" "!"


Any thoughts? Thanks!

EDIT: I love this...two user responses, four different ways to get it done. I've learned a lot. Thank you!

Answer

We can use gsub to strip everything that is not punctuation: the pattern [^[:punct:]]+ matches one or more non-punctuation characters, and each match is replaced with an empty string (""). Elements that contain no punctuation end up as "", so we drop those with either nzchar or setdiff.

setdiff(gsub("[^[:punct:]]+", "", a), "")
#[1] "?" "!"
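
The nzchar variant mentioned above is not shown, so here is a minimal sketch of it, reusing the vector a from the question. The names stripped and punct_only are only illustrative. One difference worth knowing: setdiff also removes duplicates, so if two strings both end in "?" it returns a single "?", while the nzchar version keeps one result per matching string.

```r
a <- c("hello?", "goodbye", "hi!")

# Strip every non-punctuation character; strings without punctuation become "".
stripped <- gsub("[^[:punct:]]+", "", a)

# nzchar() is TRUE for non-empty strings, so this drops the "" entries.
punct_only <- stripped[nzchar(stripped)]
punct_only
#[1] "?" "!"
```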

Or another option is str_extract from stringr

library(stringr)
as.vector(na.omit(str_extract(a, "[[:punct:]]+")))
#[1] "?" "!"
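
If you would rather stay in base R, something along these lines should also work. It is a sketch, not part of the original answer: it uses regexpr with regmatches and the question's own [^[:alnum:]] class, so it also picks up non-alphanumeric characters that are not punctuation, such as spaces. Like str_extract, it returns only the first match in each string. The name found is illustrative.

```r
a <- c("hello?", "goodbye", "hi!")

# regexpr() locates the first run of non-alphanumeric characters in each string;
# regmatches() returns only the matched text, skipping strings with no match.
found <- regmatches(a, regexpr("[^[:alnum:]]+", a))
found
#[1] "?" "!"
```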