Ruby Question

Ruby string char chunking

I have a string "wwwggfffw" and want to break it up into an array as follows:

["www", "gg", "fff", "w"]

Is there a way to do this with regex?

Answer Source

scan is a little funny: it returns either the whole match or the subgroups, depending on whether the regexp contains subgroups. We need a subgroup to ensure repetition of the same character ((.)\1*), but we'd prefer scan to return the whole match and not just the repeated letter. So we wrap the whole match in a subgroup of its own so that it is captured too; this renumbers the inner group to #2, hence ((.)\2*). scan then returns a pair of groups for each match, and we extract just the whole match (without the other subgroup) with .map(&:first).
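
A minimal sketch of the call described above, assembled from the regexp and the methods named in this answer (the variable names str and chunks are my own, chosen for illustration):

```ruby
str = "wwwggfffw"

# Because the regexp contains groups, scan returns one [group1, group2]
# pair per match; map(&:first) keeps only group #1, the whole run.
chunks = str.scan(/((.)\2*)/).map(&:first)

p chunks  # => ["www", "gg", "fff", "w"]
```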

EDIT to explain the regexp ((.)\2*) itself:

(       start group #1, consisting of
  (     start group #2, consisting of
    .   any one character
  )     and nothing else
  \2    followed by the content of the group #2
  *     repeated any number of times (including zero)
)       and nothing else.

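So in wwwggfffw, (.) captures the first w into group #2; then \2* matches any further w's that follow it. This makes group #1 capture www. scan then resumes right after that match and finds gg, fff, and the final w in the same way.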
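
To make the two groups visible, here is a small sketch that prints what scan returns before .map(&:first) is applied (the variable name pairs is illustrative):

```ruby
pairs = "wwwggfffw".scan(/((.)\2*)/)

# Each element is [group #1, group #2]: the whole run and its single character.
p pairs               # => [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]
p pairs.map(&:first)  # => ["www", "gg", "fff", "w"]
```

Note that the last run is a single w: (.) matches it and \2* matches zero further repetitions.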

