sarasreddy74
R Question

How do I match this pattern in R?

I need to match only the first country name in the string below. The country names are written in all uppercase letters. I used the following pattern to get the matches, but it matches all of the countries:

'\\b[A-Z]{2,}.\\b'


For example, in the string below I only want UNITED KINGDOM:

x = "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"
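A minimal reproduction of the problem, assuming the matches were extracted with gregexpr() (the question does not say which call was used). Because [A-Z] excludes the space and the . consumes only one extra character, the pattern picks up every country and cannot keep UNITED KINGDOM together as a single match:

```r
x <- "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

# Original pattern from the question: two or more uppercase letters,
# then any single character, then a word boundary
pattern <- "\\b[A-Z]{2,}.\\b"

# gregexpr() finds every match; regmatches() returns a list with one
# character vector per input string, so take the first element
m <- regmatches(x, gregexpr(pattern, x))[[1]]
print(m)
```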

Answer

This seems to work:

regmatches(x, regexpr('\\b[A-Z ]{2,}\\b', x))
# [1] "UNITED KINGDOM"

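I just added a space to the character set, making it [A-Z ], so that the space inside UNITED KINGDOM counts as part of the match (I also dropped the trailing ., which matches any single character). Note that regexpr returns only the first match, while gregexpr returns all of them (similar to sub vs gsub, which replace the first match and every match, respectively).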
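A short sketch contrasting the two functions on the same string, using the pattern from this answer. The "<COUNTRY>" placeholder in the sub/gsub part is an arbitrary replacement string chosen only for illustration:

```r
x <- "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"
pattern <- "\\b[A-Z ]{2,}\\b"

# regexpr() records only the first match in each string
first <- regmatches(x, regexpr(pattern, x))
print(first)        # "UNITED KINGDOM"

# gregexpr() records every match; regmatches() then returns a list
all_matches <- regmatches(x, gregexpr(pattern, x))[[1]]
print(all_matches)  # UNITED KINGDOM, CANADA, AUSTRALIA, POLAND, USA

# The same first-vs-all split exists for replacement:
# sub() replaces only the first match, gsub() replaces all of them
one_replaced <- sub(pattern, "<COUNTRY>", x)
all_replaced <- gsub(pattern, "<COUNTRY>", x)
print(one_replaced)
print(all_replaced)
```

Because regmatches() with gregexpr() returns a list, [[1]] picks out the matches for the single input string; with a vector of strings you would get one list element per string.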

For more information, I recommend the official documentation at ?regexpr.