Sowmya S. Manian - 9 months ago
R Question

How to print/see predefined patterns [:alnum:], [:punct:], [:digit:], [:blank:] etc. in R Regular Expressions

Where can I see the definitions of the predefined patterns for regular expressions in R? The documentation says they depend on the locale (POSIX locale).

> [[:alpha:]]
> [:alpha:]

Typing these at the console does not print anything. How can I look up the predefined patterns, and the quantifiers that control how many times a pattern should match?

Any help is highly appreciated.


First we read help("regex"):

[:lower:]  Lower-case letters in the current locale.

The same holds for [:upper:], and [:alpha:] is just the union of the two.
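As a quick check of the "union" statement, here is a minimal sketch (my own illustration, not taken from the help page). It only uses the ASCII letters from the built-in `letters` and `LETTERS` vectors, so it should not depend on the locale; the variable names are made up for the example. Note that the class name has to sit inside a bracket expression, i.e. `[[:alpha:]]`, and the whole pattern is a string passed to a function such as `grepl()`.

```r
# ASCII letters only, so the result should be the same in any locale.
x <- c(letters, LETTERS)

# The class name goes inside a bracket expression: [[:lower:]], not [:lower:].
is_lower <- grepl("^[[:lower:]]$", x)
is_upper <- grepl("^[[:upper:]]$", x)
is_alpha <- grepl("^[[:alpha:]]$", x)

# [:alpha:] should match exactly what [:lower:] or [:upper:] matches.
all(is_alpha == (is_lower | is_upper))
#[1] TRUE
```

For non-ASCII letters the result depends on the current locale, which is what the next step looks at.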

Then we can check the current locale's character set:

Sys.getlocale("LC_CTYPE")
#[1] "German_Germany.1252"

l10n_info()
#$MBCS
#[1] FALSE
#
#$`UTF-8`
#[1] FALSE
#
#$`Latin-1`
#[1] TRUE
#
#$codepage
#[1] 1252

Then we can look up that code page (here 1252, i.e. Windows-1252) on the internet, e.g. on Wikipedia, to see which characters it contains.

Then we can try this:

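# Build one string from the bytes 1..255 and delete everything that is NOT in [:alpha:].
# This relies on a single-byte locale such as the one shown above.
gsub("[^[:alpha:]]", "", rawToChar(as.raw(1:(16^2-1))))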
#[1] "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ"
# Same idea for [:cntrl:]: only the control characters are kept.
gsub("[^[:cntrl:]]", "", rawToChar(as.raw(1:(16^2-1))))
#[1] "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"
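The same trick can be applied to every class at once. The following is a minimal sketch under a few assumptions of mine: the list of class names is typed in by hand from help("regex"), the names `ascii`, `classes` and `members` are just illustrative, and the input is restricted to the ASCII bytes 1..127 so that it should also run in a UTF-8 locale (where the bytes above 127 are not valid characters on their own).

```r
# One element per ASCII character (bytes 1..127).
ascii <- rawToChar(as.raw(1:127), multiple = TRUE)

# Class names as listed in help("regex"); typed in by hand for this sketch.
classes <- c("alnum", "alpha", "blank", "cntrl", "digit", "graph",
             "lower", "print", "punct", "space", "upper", "xdigit")

# For each class, keep the characters that match [[:class:]].
members <- lapply(classes, function(cl) {
  ascii[grepl(sprintf("[[:%s:]]", cl), ascii)]
})
names(members) <- classes

paste(members$digit, collapse = "")
#[1] "0123456789"
paste(members$xdigit, collapse = "")
#[1] "0123456789ABCDEFabcdef"
```

Printing `members` then shows the ASCII members of each class. In a single-byte locale like the one above, `1:127` can be extended to `1:255` to see the locale-specific members as well.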