Narek Atayan - 2 months ago
C++ Question

Split string with specific constraint on delimiter

Suppose we have a string:

"((0.2,0), (1.5,0)) A1 ABC p"

I want to split it into logical units like this:

((0.2,0), (1.5,0))
A1
ABC
p


I.e., split the string on whitespace, with the requirement that the preceding character isn't a comma. Is it possible to use a regex as the solution?

Update: I've tried it this way:

#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string s = "((0.2,0), (1.5,0)) A1 ABC p";
    std::regex re("[^, ]*\\(, *[^, ]*\\)*"); // as suggested in the updated answers
    // -1 selects the text between matches, i.e. split mode
    std::sregex_token_iterator
        p(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator end;
    while (p != end)
        std::cout << *p++ << std::endl;
}


The result was:
((0.2,0), (1.5,0)) A1 ABC p


Solution:

#include <iostream>
#include <string>
#include <regex>

int main() {

    std::string s = "((0.2,0), (1.5,0)) A1 ABC p";

    std::regex re("[^, ]*(, *[^, ]*)*");
    // Default submatch 0: iterate over the matches themselves.
    // The pattern can also match the empty string, so empty tokens are printed too.
    std::regex_token_iterator<std::string::iterator> p(s.begin(), s.end(), re);
    std::regex_token_iterator<std::string::iterator> end;
    while (p != end)
        std::cout << *p++ << std::endl;
}


Output:

((0.2,0), (1.5,0))

A1

ABC

p

Rob
Answer

You can do it like this:

 [^, ]*(, *[^, ]*)*

What does this do?

First, let's go over the basics of regular expressions:

The [] defines a set of characters that you want to match. For example, [ab] will match an 'a' or a 'b'.

The [^] syntax describes the characters you do NOT want to match, so [^ab] will match any character that is NOT an 'a' or a 'b'.

The * symbol tells the regular expression that the previous element can appear zero or more times, so a* will match the empty string '' or 'a' or 'aaa' or 'aaaaaaaaaaaaa'.

Putting () around part of an expression creates a group that you can then do interesting things with. In our case, we use it to mark a part of the pattern as optional and repeatable: putting * after the group lets it appear zero or more times.
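
To try these basics out in C++, here is a minimal sketch using std::regex_match, which succeeds only if the whole string matches the pattern. The helper name full_match is made up for this illustration, and the default std::regex grammar (ECMAScript) is assumed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`
// (default ECMAScript grammar).
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

With this helper, full_match("a", "[ab]") is true and full_match("c", "[ab]") is false, while full_match("c", "[^ab]") is true. Both full_match("", "a*") and full_match("aaa", "a*") are true, and full_match("abab", "(ab)*") is true because the group repeats twice.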

OK, putting it all together:

The first part, [^, ]*, says: match zero or more characters that are NOT ' ' or ','. This will match strings like 'A1' or '((0.2'.

The second part, in ()*, is used to continue matching strings that have ',' and spaces in them but that you do not want to split. This part is optional, so the pattern still correctly matches 'A1' or 'ABC' or 'p'.

So (, *[^, ]*)* will match zero or more strings that start with ',' and any number of ' ', followed by a string that does not have ',' or ' ' in it. In your example it would match ",0)", which is the continuation of "((0.2", then ", (1.5", and then ",0))", which all get added together to make "((0.2,0), (1.5,0))".
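
Because every part of the pattern is optional, it can also match the empty string, for example at each space between two units. The following is a minimal C++ sketch of one way to collect only the non-empty matches with std::sregex_iterator. The function name split_units is hypothetical, and the default ECMAScript grammar of std::regex is assumed.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-empty match of the pattern in `s`.
std::vector<std::string> split_units(const std::string& s) {
    static const std::regex re("[^, ]*(, *[^, ]*)*");
    std::vector<std::string> units;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        // The pattern also matches the empty string (e.g. at each
        // separating space), so those matches are skipped here.
        if (it->length() > 0)
            units.push_back(it->str());
    }
    return units;
}
```

Calling split_units("((0.2,0), (1.5,0)) A1 ABC p") should give the four units "((0.2,0), (1.5,0))", "A1", "ABC" and "p". Filtering after the fact keeps the pattern itself unchanged; tightening the pattern so that it cannot match an empty string would be another option.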

NOTE: You may need to escape some characters in your expression, depending on the regular expression library you are using. The solution works as written in this online tester: http://www.regexpal.com/

Some libraries and tools, however, need you to escape things like the ( before they treat it as the start of a group.

In that case the expression would look like:

 [^, ]*\(, *[^, ]*\)*
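
For C++ specifically, whether to escape depends on the grammar the std::regex is built with. In the default ECMAScript grammar, \( means a literal parenthesis, so the escaped form looks for the text "(," and finds nothing in the example string. In the POSIX basic grammar (std::regex::basic), \( and \) are the grouping operators. Here is a rough sketch to compare the two; the helper name first_match is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: first match of `pattern` in `s` under the given
// grammar flag, or "" if nothing matches (an empty match also yields "").
std::string first_match(const std::string& s, const std::string& pattern,
                        std::regex::flag_type grammar) {
    std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str() : std::string();
}
```

With s = "((0.2,0), (1.5,0)) A1 ABC p", the unescaped pattern with std::regex::ECMAScript and the escaped pattern with std::regex::basic should both find "((0.2,0), (1.5,0))" first, while the escaped pattern with std::regex::ECMAScript finds no match at all.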

Also, I removed the ( |$) part, as it is only required if you want the trailing space to be part of the match.