Alex Alex - 2 months ago 12
Javascript Question

JS string splitting with regex

I am trying to split an expression on some specific chars. I know we can use String.split() with regex, so this was my first guess:



function expressionSplit([input]) {
    let regex = /([ (),;.]+)/g;
    let arr = input.split(regex);

    arr.forEach(item => console.log('item: ' + item));
}

expressionSplit(['let sum = 1 + 2;if(sum > 2){\tconsole.log(sum);}']);





Now this is nowhere near what I expected, so I did some more reading and found that people, unlike me, use split() with regex without problems. Puzzled, I tried this:



function expressionSplit([input]) {
    let regex = /([ (),;.]+)/g;
    let arr = input.replace(regex, '|').split('|');

    arr.forEach(item => console.log('item: ' + item));
}

expressionSplit(['let sum = 1 + 2;if(sum > 2){\tconsole.log(sum);}']);





And contrary to my expectations - it worked, mostly. Why does this happen? I expect it's some sort of JS-typical oddness, because it simply makes no sense to me. Plus, as I said, other people seem to use split() with regex without problems. Also, how can I split by '\t' (tab)? Adding '\t' to the regex seems to do nothing, and '\\t' only matches 't'. Thanks.

Answer

There's no "JS-typical oddness" going on here -- this is all documented behavior. If you want to complain about JS oddness, you're a little late to the party...that went out of style years ago as JavaScript "grew up."

From the documentation of String#split:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
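A minimal illustration of the quoted behavior, using a made-up input string (the variable names are only for this sketch). The same character class is used twice, once wrapped in capturing parentheses and once without:

```javascript
// With a capturing group, the matched separators are spliced into the result.
const withGroup = 'a, b;c'.split(/([,; ]+)/);

// Without the group, only the pieces between separators remain.
const withoutGroup = 'a, b;c'.split(/[,; ]+/);

console.log(withGroup);    // [ 'a', ', ', 'b', ';', 'c' ]
console.log(withoutGroup); // [ 'a', 'b', 'c' ]
```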

Because your regex wraps the separator in a capturing group (the parentheses), you're getting the splitting tokens in your result as well as the content being split. If you remove the capturing parentheses, it behaves as you originally expected it to:

// old:     /([ (),;.]+)/g;
let regex = /[ (),;.]+/g;
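As for the tab: inside a regex literal, \t within the character class should match a real tab character, whereas \\t there means "a backslash or the letter t", which would explain why it only matched 't'. Below is a rough sketch of the whole thing, assuming the input contains an actual tab (which is what the '\t' escape in your string literal produces). The names splitExpression and tokens are made up for this example, and the g flag is left off because split() does not need it.

```javascript
// Hypothetical helper: returns the tokens instead of logging them.
function splitExpression(input) {
    // No capturing parentheses, so the separators are dropped.
    // \t inside the character class matches a tab character.
    const separators = /[ (),;.\t]+/;

    // Drop empty strings, which appear when the input starts or ends with a separator.
    return input.split(separators).filter(token => token !== '');
}

const tokens = splitExpression('let sum = 1 + 2;if(sum > 2){\tconsole.log(sum);}');
tokens.forEach(item => console.log('item: ' + item));
```

Note that '{' and '}' are not in the character class, so they come back as tokens of their own here; add them to the class if you want them treated as separators too.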