Sowmya S. Manian - 2 months ago
R Question

How to print/see predefined patterns [:alnum:], [:punct:], [:digit:], [:blank:] etc. in R Regular Expressions

Where can I see the definitions of the predefined patterns for regular expressions in R? The documentation says they are related to locales (the POSIX locale).

> [[:alpha:]]
> [:alpha:]


This does not print anything. How can I look up what the predefined patterns contain, and the operators that control how many times a pattern should match?

Any help is highly appreciated.

Answer

First we read help("regex"):

[:lower:]
Lower-case letters in the current locale.

[:upper:] is defined similarly, and [:alpha:] is just the union of the two.
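
As a small illustration of that union (a sketch, not part of the help page): the class names are only meaningful inside a bracket expression, so the pattern is written "[[:alpha:]]" and passed as a string to a function such as grepl(). The test vector x is made up for the example and uses ASCII characters only, so the result should not depend on the locale.

```r
# Made-up test input: a lower-case letter, an upper-case letter, a digit,
# an underscore and a space (all plain ASCII).
x <- c("a", "Z", "5", "_", " ")

# [:alpha:] used inside a bracket expression
is_alpha <- grepl("[[:alpha:]]", x)

# The same test written as the union of [:lower:] and [:upper:]
is_lower_or_upper <- grepl("[[:lower:][:upper:]]", x)

is_alpha
identical(is_alpha, is_lower_or_upper)
```

Typing [[:alpha:]] at the prompt on its own does nothing useful because it is regex syntax, not an R object; it only has a meaning inside a pattern string.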

Then we can check the current locale's character set:

Sys.getlocale("LC_CTYPE")
#[1] "German_Germany.1252"

l10n_info()
#$MBCS
#[1] FALSE
#
#$`UTF-8`
#[1] FALSE
#
#$`Latin-1`
#[1] TRUE
#
#$codepage
#[1] 1252

Then we can look up that character set online, e.g. the Wikipedia article on Windows-1252 (code page 1252).

Then we can try this:

gsub("[^[:alpha:]]", "", rawToChar(as.raw(1:(16^2-1))))
#[1] "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
gsub("[^[:cntrl:]]", "", rawToChar(as.raw(1:(16^2-1))))
#[1] "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"
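
The two calls above rely on a single-byte locale such as the Latin-1 one shown earlier; in a UTF-8 locale the bytes above 127 are not valid characters on their own, so the same calls may fail there. A minimal sketch of the same idea restricted to ASCII, which should behave the same in any locale, is given below. The helper name chars_in_class is hypothetical and introduced only for this illustration.

```r
# Hypothetical helper: return the ASCII characters (codes 1 to 127) that
# match a given POSIX class name such as "alpha" or "digit".
chars_in_class <- function(class) {
  # One string per byte instead of one long string
  ascii <- rawToChar(as.raw(1:127), multiple = TRUE)
  # Build the bracket expression, e.g. "[[:digit:]]"
  pattern <- paste0("[[:", class, ":]]")
  paste(ascii[grepl(pattern, ascii)], collapse = "")
}

chars_in_class("digit")
chars_in_class("alpha")
chars_in_class("punct")
```

The same call works for the other class names listed in help("regex"), for example "blank", "alnum" or "space". Results for "punct" can differ between platforms and regex engines, so it is worth checking on your own system.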