Geo - 5 months ago
Ruby Question

How can I tokenize this with a regex?

Suppose I have strings like the following:

OneTwo
ThreeFour
AnotherString
DVDPlayer
CDPlayer


I know how to tokenize the simple camel-case ones, but not "DVDPlayer" and "CDPlayer". I know I could handle those manually, but could you show me a regex that covers all the cases?

EDIT:
The expected tokens are:

OneTwo -> One Two
...
CDPlayer -> CD Player
DVDPlayer -> DVD Player

Answer

See my answer to the question ".NET - How can you split a “caps” delimited string into an array?"

The regex looks like this:

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g
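A minimal Ruby sketch of applying this regex, assuming the goal is an array of tokens per string. Ruby has no `g` regex option, so `String#scan`, which already returns every non-overlapping match, does that job. The outer capture group is dropped here because with it `scan` would return one-element arrays such as `[["One"], ["Two"]]`. The names `CAPS_TOKEN` and `tokenize` are illustrative, not from the original answer.

```ruby
# Same pattern as above, minus the outer group and the /g flag.
# Alternative 1: a run of capitals followed by end of line or by
#                a Capital+lowercase pair (e.g. "DVD" before "Pl").
# Alternative 2: an optional capital followed by lowercase letters.
CAPS_TOKEN = /[A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+/

# Hypothetical helper: return all tokens found in str.
def tokenize(str)
  str.scan(CAPS_TOKEN)
end

%w[OneTwo ThreeFour AnotherString DVDPlayer CDPlayer].each do |s|
  puts "#{s} -> #{tokenize(s).join(' ')}"
end
# Prints:
# OneTwo -> One Two
# ThreeFour -> Three Four
# AnotherString -> Another String
# DVDPlayer -> DVD Player
# CDPlayer -> CD Player
```

For "DVDPlayer", `[A-Z]+` first grabs "DVDP", the lookahead fails on "la", and backtracking settles on "DVD" because "Pl" satisfies `[A-Z][a-z]`; "Player" is then picked up by the second alternative.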

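It can be modified slightly to find camel-cased tokens inside longer text by replacing the $ with \b, so that an all-caps run such as an acronym can also end at a word boundary, not just at the end of the input: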

/([A-Z]+(?=\b|[A-Z][a-z])|[A-Z]?[a-z]+)/g
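A rough Ruby comparison of the two variants, again written for `String#scan` without the `g` flag or outer group. The sample sentence is made up for illustration. With `$`, a standalone acronym followed by a space ("HDMI") never satisfies the lookahead and is silently dropped; with `\b` it is kept.

```ruby
# End-of-line version and word-boundary version of the same pattern.
CAPS_TOKEN  = /[A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+/
CAMEL_TOKEN = /[A-Z]+(?=\b|[A-Z][a-z])|[A-Z]?[a-z]+/

# Hypothetical input containing a mid-sentence acronym.
text = "Connect the DVDPlayer to the HDMI port"

p text.scan(CAPS_TOKEN)
# => ["Connect", "the", "DVD", "Player", "to", "the", "port"]
p text.scan(CAMEL_TOKEN)
# => ["Connect", "the", "DVD", "Player", "to", "the", "HDMI", "port"]
```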