Lokesh Lokesh - 4 months ago 9
Ruby Question

Ruby split string and preserve separator

In Ruby, what's the easiest way to split a string in the following manner?


  • 'abc+def'
    should split to
    ['abc', '+', 'def']

  • 'abc*def+eee'
    should split to
    ['abc', '*', 'def', '+', 'eee']

  • 'ab/cd*de+df'
    should split to
    ['ab', '/', 'cd', '*', 'de', '+', 'df']



The idea is to split the string on these symbols:
['-', '+', '*', '/']
and also keep each symbol in the result at its original position.

Answer

Option 1

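/\b/ matches a word boundary. It is zero-width, so it does not consume any characters. Splitting on it cuts the string wherever a word character (letter, digit, or underscore) meets a non-word character, which in these examples means on both sides of each operator.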

'abc+def'.split(/\b/)
# => ["abc", "+", "def"]

'abc*def+eee'.split(/\b/)
# => ["abc", "*", "def", "+", "eee"]

'ab/cd*de+df'.split(/\b/)
# => ["ab", "/", "cd", "*", "de", "+", "df"]
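One caveat with /\b/: it only separates the operators cleanly when every operand consists of word characters and no two non-word characters sit next to each other. The inputs below are made up for illustration (they are not from the question); they show spaces and adjacent operators ending up glued into a single piece.

```ruby
# Hypothetical inputs chosen to stress /\b/.
spaced  = 'a + b'.split(/\b/) # the spaces stay attached to the operator
chained = '3*-2'.split(/\b/)  # adjacent operators come back as one piece

p spaced  # => ["a", " + ", "b"]
p chained # => ["3", "*-", "2"]
```

This is the situation the capture-group approach handles better.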

Option 2

If your string contains other non-word characters (spaces, dots, and so on) and you only want to split on -, +, *, and /, use a capture group. When the split pattern contains a capture group, String#split also includes the captured text in the result. (Thanks to @Jordan for pointing this out.) (@Cary Swoveland, sorry, I didn't see your answer when I made this edit.)

'abc+def'.split /([+*\/-])/
# => ["abc", "+", "def"]

'abc*def+eee'.split /([+*\/-])/
# => ["abc", "*", "def", "+", "eee"]

'ab/cd*de+df'.split /([+*\/-])/
# => ["ab", "/", "cd", "*", "de", "+", "df"]
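If the separators are kept in an array like the one in the question, you can build the pattern instead of escaping each symbol by hand. Regexp.union escapes regex metacharacters in string arguments, so wrapping its source in a capture group gives a pattern equivalent to the one above. A minimal sketch; the names SEPARATORS, SPLIT_ON, and tokenize are hypothetical, and tokenize's whitespace stripping is an extra assumption, not part of the question.

```ruby
# Hypothetical names; the separator list is the one from the question.
SEPARATORS = ['-', '+', '*', '/']

# Regexp.union escapes metacharacters such as + and *; the surrounding
# ( ) is a capture group, so split keeps every matched separator.
SPLIT_ON = Regexp.new("(#{Regexp.union(SEPARATORS).source})")

# Hypothetical helper: also trims spaces and drops the empty strings that
# appear between adjacent separators.
def tokenize(expr)
  expr.split(SPLIT_ON).map(&:strip).reject(&:empty?)
end

p 'ab/cd*de+df'.split(SPLIT_ON) # => ["ab", "/", "cd", "*", "de", "+", "df"]
p tokenize('a + b')             # => ["a", "+", "b"]
p tokenize('3*-2')              # => ["3", "*", "-", "2"]
```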

Option 3

Lastly, for languages whose split does not include captured groups in the result, you can use two lookarounds instead. Lookarounds are also zero-width, so they do not consume any characters.

'abc+def'.split /(?=[+*\/-])|(?<=[+*\/-])/
# => ["abc", "+", "def"]

'abc*def+eee'.split /(?=[+*\/-])|(?<=[+*\/-])/
# => ["abc", "*", "def", "+", "eee"]

'ab/cd*de+df'.split /(?=[+*\/-])|(?<=[+*\/-])/
# => ["ab", "/", "cd", "*", "de", "+", "df"]

The idea here is to split at every position that is followed by one of your separators (the lookahead) or preceded by one (the lookbehind). Here is a little visual:

ab ⍿ / ⍿ cd ⍿ * ⍿ de ⍿ + ⍿ df

Each little symbol marks a position that is either preceded or followed by a separator, so these are the places where the string gets cut.
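To see those cut points concretely, a small sketch can collect the offsets where the lookaround pattern matches and then slice the string between them; the pieces come out the same as split's result. The variable names are illustrative, and the to_enum(:scan, ...) plus Regexp.last_match idiom is just one way to gather match offsets.

```ruby
str = 'ab/cd*de+df'
# Zero-width pattern from Option 3: a position followed or preceded by a separator.
cut_here = /(?=[+*\/-])|(?<=[+*\/-])/

# scan yields once per (empty) match; Regexp.last_match gives its offset.
cuts = str.to_enum(:scan, cut_here).map { Regexp.last_match.begin(0) }
p cuts # => [2, 3, 5, 6, 8, 9]

# Slicing between consecutive cut points rebuilds the pieces split returns.
bounds = [0, *cuts, str.length]
pieces = bounds.each_cons(2).map { |from, to| str[from...to] }
p pieces # => ["ab", "/", "cd", "*", "de", "+", "df"]
```

The six offsets correspond to the six markers in the visual above.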
