I am trying to use the stringr library to extract emails from a big, messy file.
str_match doesn't allow perl=TRUE, and I can't figure out the escape characters to get it to work.
Can someone recommend a relatively robust regex that would work in the context below?
> library(stringr)
> c("firstname.lastname@example.org", "email@example.com", "firstname.lastname@example.org")->emails
> "^[[:alnum:]._-]+@[[:alnum:].-]+$"->regex
> str_match(emails, regex)
     [,1]
[1,] "firstname.lastname@example.org"
[2,] "email@example.com"
[3,] "firstname.lastname@example.org"
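The @ sign does not need escaping in a regex. Inside a character class "." is literal, but "-" is literal only when it comes first or last; between two characters it defines a range (".-_" would mean every character from "." through "_", which leaves out the hyphen itself), so it belongs at the end, as in [[:alnum:]._-]. If you want to add a requirement for ".com", ".co", ".edu", ".org", then you should specify how complete that list needs to be.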
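Since the goal is to pull addresses out of a messy file rather than test whole strings, here is a minimal sketch that drops the ^ and $ anchors and uses str_extract_all. The sample lines in messy are made up for illustration, and the four top-level domains are only the ones mentioned above, an assumed and incomplete list that you would adjust to your data.

```r
library(stringr)

# Hypothetical messy input; in practice this might come from readLines() on your file.
messy <- c(
  "Contact: email@example.com, or firstname.lastname@example.org.",
  "no address on this line",
  "cc <first-name@example.edu>; see http://example.com"
)

# No ^ or $ anchors, so an address can sit anywhere in a line.
# The hyphen is last in each character class, so it is a literal hyphen.
# (?:com|co|edu|org) is the assumed list of allowed top-level domains;
# \\b stops the match at the end of that domain.
pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.(?:com|co|edu|org)\\b"

# str_extract_all returns one character vector per input line; flatten them.
found <- unlist(str_extract_all(messy, pattern))
print(found)
```

Trailing punctuation, such as the full stop after the second address, is left out because the pattern must end on one of the listed domains.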
As pointed out by M42, this is not a sure-fire method. In fact, it is claimed that no sure-fire method exists; see: Using a regular expression to validate an email address