Jmini Jmini - 4 months ago 22
Java Question

RegEx to split camelCase or TitleCase (advanced)

I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression.

(?<!^)(?=[A-Z])


It works as expected:


  • value -> value

  • camelValue -> camel / Value

  • TitleValue -> Title / Value



For example with Java:



String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words is now {"lorem", "Ipsum"}


My problem is that it does not work in some cases:


  • Case 1: VALUE -> V / A / L / U / E

  • Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext
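To reproduce the behaviour described above, here is a minimal sketch that runs the original pattern over all five inputs. The class name `OriginalRegexDemo` and the helper `split` are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

public class OriginalRegexDemo {
    // The pattern from the question: split before every capital letter,
    // except when that capital is the first character of the string.
    static final String ORIGINAL = "(?<!^)(?=[A-Z])";

    // Hypothetical helper: splits s with the original pattern.
    static String[] split(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : inputs) {
            // Print each input next to its parts, joined with " / "
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

Because the lookahead only checks that the next character is a capital, every letter in a run of capitals becomes its own split point.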



To my mind, the result should be:


  • Case 1: VALUE

  • Case 2: eclipse / RCP / Ext



In other words, given n uppercase chars:


  • if the n chars are followed by lower case chars, the groups should be: (n-1 chars) / (n-th char + lower chars)

  • if the n chars are at the end, the group should be: (n chars).



Any idea on how to improve this regex?

NPE NPE
Answer

The following regex works for all of the above examples:

public class Main
{
    public static void main(String[] args)
    {
        // Prints each part of the split on its own line
        for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
            System.out.println(w);
        }
    }
}

It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but to also ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".

The first part of the regex on its own fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". This is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.
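To see what each clause contributes, here is a small sketch that applies the first clause alone and then the full regex to every example from the question. The class name `CamelCaseSplitDemo`, the constant names and the two-argument `split` helper are hypothetical, introduced only for this illustration.

```java
import java.util.Arrays;

public class CamelCaseSplitDemo {
    // First clause: split before a capital that is neither at the start
    // of the string nor preceded by another capital.
    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";
    // Second clause: split before a capital followed by a lowercase letter,
    // except at the start of the string.
    static final String SECOND_CLAUSE = "(?<!^)(?=[A-Z][a-z])";
    // The full regex from the answer is the alternation of both clauses.
    static final String FULL = FIRST_CLAUSE + "|" + SECOND_CLAUSE;

    // Hypothetical helper: splits s with the given regex.
    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String s : inputs) {
            System.out.println(s);
            System.out.println("  first clause only: " + Arrays.toString(split(s, FIRST_CLAUSE)));
            System.out.println("  full regex:        " + Arrays.toString(split(s, FULL)));
        }
    }
}
```

With the first clause alone, "eclipseRCPExt" should come out as "eclipse" and "RCPExt". Adding the second clause introduces the missing split before "Ext" and leaves "VALUE" in one piece.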