Blaise - 1 month ago
JavaScript Question

Split a string by whitespace, keeping quoted segments, allowing escaped quotes

I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:

let keywords = 'pop rock "hard rock"';
keywords = keywords.match(/\w+|"[^"]+"/g);
console.log(keywords); // [pop, rock, "hard rock"]


However, I also want it to be possible to have quotes in keywords, like this:

keywords = 'pop rock "hard rock" "\"dream\" pop"';


This should return

[pop, rock, "hard rock", "\"dream\" pop"]


What's the easiest way to achieve this?

Answer

You can change your regex to:

keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);

Instead of [^"]+, the quoted part is now (?:\\"|[^"])+, which matches either an escaped quote (\") or any character other than a quote, so an unescaped quote still ends the segment.

One important note: inside a JavaScript string literal, \" is just a quote character. If you want the string to contain a literal backslash, the backslash itself has to be escaped:

keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
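
To see why the doubled backslashes matter, here is a small sketch comparing the two literals. The variable names single, doubled and pattern are only illustrative and are not part of the original question.

```javascript
// In a single-quoted literal, \" is just ", so no backslash ends up in the string.
const single = 'pop rock "hard rock" "\"dream\" pop"';
// Doubling the backslash keeps a real backslash in front of each inner quote.
const doubled = 'pop rock "hard rock" "\\"dream\\" pop"';

console.log(single.includes('\\'));  // false
console.log(doubled.includes('\\')); // true

// Same pattern as in the answer above.
const pattern = /\w+|"(?:\\"|[^"])+"/g;

// doubled yields four matches: pop | rock | "hard rock" | "\"dream\" pop"
console.log(doubled.match(pattern));

// single yields five matches: pop | rock | "hard rock" | "dream" | pop
console.log(single.match(pattern));
```

With the single-backslash literal the regex never sees an escape at all, which is why the last keyword falls apart.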

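Also, there's a slight inconsistency between \w+ and [^"]+: the pattern matches the quoted keyword "ab*d" as a whole, but splits the unquoted ab*d into ab and d, because \w does not match *. Consider using [^"\s]+ instead of \w+; it matches any run of characters that are neither whitespace nor quotes.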
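
Putting both suggestions together could look roughly like this. It is a minimal sketch: the function name splitKeywords is hypothetical, and the fallback to an empty array is an added assumption, since match returns null when nothing matches.

```javascript
// Hypothetical helper; the name splitKeywords is not from the original post.
function splitKeywords(input) {
  // [^"\s]+          a run of characters that are neither quotes nor whitespace
  // "(?:\\"|[^"])+"  a quoted segment that may contain \" escapes
  const matches = input.match(/[^"\s]+|"(?:\\"|[^"])+"/g);
  // match() returns null when there are no matches; normalise that to [].
  return matches || [];
}

// Unquoted and quoted forms are now treated consistently: ab*d | "ab*d"
console.log(splitKeywords('ab*d "ab*d"'));

// For comparison, \w+ alone splits the unquoted word: ab | d
console.log('ab*d'.match(/\w+/g));

// The example from the question: pop | rock | "hard rock" | "\"dream\" pop"
console.log(splitKeywords('pop rock "hard rock" "\\"dream\\" pop"'));
```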
