sarasreddy74 - 2 months ago
R Question

How do I match this pattern in R?

I need to match only the first country name in the string below. The country names are given in all uppercase letters. I used the following code to get the matches, but it matches all the countries.


E.g., in the string below, I just want UNITED KINGDOM:

x = "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"


This seems to work:

regmatches(x, regexpr('\\b[A-Z ]{2,}\\b', x))

I just added a space to the character class, making it [A-Z ], so that a multi-word name like UNITED KINGDOM is matched as a whole. Note that regexpr returns only the first match, while gregexpr returns all of them (analogous to sub vs. gsub).
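To make the regexpr vs. gregexpr difference concrete, here is a minimal sketch run against the string from the question. The variable names pattern, first_match and all_matches are just illustrative, and it assumes R's default regex engine, as in the one-liner above.

```r
x <- "~ London, Greater London ~ UNITED KINGDOM;~ Ottawa, Ontario ~ CANADA;~,~ AUSTRALIA;~,~ POLAND;~,~ USA"

# Two or more uppercase letters/spaces, delimited by word boundaries
pattern <- "\\b[A-Z ]{2,}\\b"

# regexpr() locates only the first match in each element of x
first_match <- regmatches(x, regexpr(pattern, x))

# gregexpr() locates every match; regmatches() then returns a list
# with one character vector per element of x, so take the first one
all_matches <- regmatches(x, gregexpr(pattern, x))[[1]]

print(first_match)  # "UNITED KINGDOM"
print(all_matches)
```

Mixed-case words such as "London" are skipped because the word boundary forces the run of capitals to end where the word ends. One caveat: the pattern picks up the first all-caps run of two or more characters, so it relies on no such run appearing before the first country name.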

For more info, I recommend the official docs at ?regexpr. Or you could try the user-written "docs" currently being put together here.