Lokesh Lokesh - 1 year ago 79
Java Question

Split string into repeated characters

I want to split the string "aaaabbbccccaaddddcfggghhhh" into "aaaa", "bbb", "cccc", "aa", "dddd", "c", "f" and so on.

I tried this:

String[] arr = "aaaabbbccccaaddddcfggghhhh".split("(.)(?!\\1)");

But this eats away one character, so with the above regular expression I get "aaa" while I want it to be "aaaa" as the first string.

How do I achieve this?

Answer

Try this:

String   str = "aaaabbbccccaaddddcfggghhhh";
String[] out = str.split("(?<=(.))(?!\\1)");

=> [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh]

Explanation: we want to split the string between groups of identical characters, so we need to find the boundary between each pair of adjacent groups. The pattern uses a positive look-behind to capture the previous character, followed by a negative look-ahead with a back-reference to check that the next character differs from it. Because the pattern consists only of two look-around assertions, it is zero-width: it matches a position rather than any characters, so the split consumes nothing. Your original pattern, by contrast, matches the last character of each group, and split discards whatever the pattern matches.
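For anyone who wants to try it, here is a minimal runnable sketch. The class name RunSplitter and both method names are made up for illustration. It also shows an alternative that is not part of the answer above: instead of splitting at boundaries, match each run directly with (.)\\1* and collect the matches. Both approaches assume the input has no line terminators, since . does not match them by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunSplitter {

    // Split at zero-width boundaries: the look-behind captures the previous
    // character, and the negative look-ahead requires that the next
    // character is not the same one.
    static List<String> splitByBoundary(String s) {
        return Arrays.asList(s.split("(?<=(.))(?!\\1)"));
    }

    // Alternative (an assumption, not from the answer): match each run
    // directly. (.) captures one character and \\1* extends the match over
    // every immediate repetition of it.
    static List<String> matchRuns(String s) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("(.)\\1*").matcher(s);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String str = "aaaabbbccccaaddddcfggghhhh";
        System.out.println(splitByBoundary(str));
        System.out.println(matchRuns(str));
    }
}
```

Both lines print [aaaa, bbb, cccc, aa, dddd, c, f, ggg, hhhh] for this input. The Matcher version may be easier to read if look-arounds are unfamiliar, at the cost of a few more lines.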

