SPlatten - 4 months ago
C++ Question

C++ regular expression to split string into an array

I am trying to write a handler that extracts the parameters from a function call. The parameters sit between ( and ) and are delimited by a comma ','. A parameter may also be an array, which is itself comma-delimited and wrapped in [].

Examples of what I'm trying to decode:

testA(aaaa, [bbbb,cccc,dddd], eeee)


or

testB([aaaa,bbbb,cccc], dddd, [eeee,ffff])


Any combination and any number of parameters should be accepted. What I want from these is a list containing:

for testA:

0 : aaaa
1 : [bbbb,cccc,dddd]
2 : eeee


for testB:

0 : [aaaa,bbbb,cccc]
1 : dddd
2 : [eeee,ffff]


I'm writing a parser that produces this, but I would prefer a regular expression that does the same.

This is my working hand-coded solution, written in C++ for Qt 5.6:

int intOpSB, intPStart;
//Analyse and count the parameters
intOpSB = intPStart = 0;
for( int p=0; p<strParameters.length(); p++ ) {
    const QChar qc = strParameters.at(p);

    if ( qc == clsXMLnode::mcucOpenSquareBracket ) {
        intOpSB++;
        continue;
    } else if ( qc == clsXMLnode::mcucCloseSquareBracket ) {
        intOpSB--;
        continue;
    }
    if ( (intOpSB == 0 && qc == clsXMLnode::mcucArrayDelimiter)
      || p == strParameters.length() - 1 ) {
        if ( strParameters.at(intPStart) == clsXMLnode::mcucArrayDelimiter ) {
            //Skip over a leading array delimiter
            intPStart++;
        }
        if ( intPStart > p ) {
            continue;
        }
        int intEnd = p;
        while( true ) {
            if ( intEnd > 0 && (strParameters.at(intEnd) == clsXMLnode::mcucArrayDelimiter) ) {
                //We don't want a trailing delimiter in the parameter
                intEnd--;
            } else {
                break;
            }
        }
        if ( intEnd > intPStart ) {
            QString strParameter = strParameters.mid(intPStart, intEnd - intPStart + 1);
            //Update remaining parameters, skipping the parameter and any delimiter
            strParameters = strParameters.mid(strParameter.length() + 1);
            //Remove any quotes
            strParameter = strParameter.replace("\"", "");
            strParameter = strParameter.replace("\'", "");
            //Add the parameter
            mslstParameters.append(strParameter);
            //Reset parameter start and rescan the shortened string from the beginning
            intPStart = 0;
            p = -1;
        }
    }
}


References:

mcucOpenSquareBracket is a constant defined as '['
mcucCloseSquareBracket is a constant defined as ']'
mcucArrayDelimiter is a constant defined as ','
mslstParameters is a member defined as QStringList
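
For comparison, the same bracket-depth idea can be sketched in standard C++ without Qt. This is only an illustration under assumptions: split_top_level and trim are hypothetical names that do not appear in the code above, the input is assumed to be the text between the parentheses, and quotes are left untouched.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: split on commas that are not inside [...].
std::vector<std::string> split_top_level(const std::string& params) {
    std::vector<std::string> out;
    std::string current;
    int depth = 0;  // how many '[' are currently open
    for (char c : params) {
        if (c == '[') {
            ++depth;
        } else if (c == ']' && depth > 0) {
            --depth;
        }
        if (c == ',' && depth == 0) {
            // A top-level comma closes the current parameter.
            out.push_back(trim(current));
            current.clear();
        } else {
            current += c;  // brackets and nested commas stay in the parameter
        }
    }
    // Emit the last parameter unless the whole input was empty.
    if (!out.empty() || !trim(current).empty()) {
        out.push_back(trim(current));
    }
    return out;
}
```

Calling split_top_level("aaaa, [bbbb,cccc,dddd], eeee") should give the three entries listed for testA.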

Answer
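#include <string>
using namespace std::string_literals; // enables the "..."s suffix

// A term: any run of characters containing neither ',' nor '<'.
auto term = "(?:[^,<]*)"s;
// A chain: one or more comma-separated terms.
auto chain = "(?:(?:"+term+",)*"+term+")"s;
// A clause: either a lone term or a chain wrapped in <>.
auto clause = "(?:(?:"+term+")|(?:<" + chain + ">))"s;
// Group 1: leading term. Group 2: contents of a leading chain. Group 3: tail after the comma.
auto re_str = "^(?:("+term+")|(?:<("+chain+")>))" "(?:|,((?:"+clause+",)*"+clause+"))";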

re_str matches your string and splits the first term or chain off from the tail.

It yields up to three sub-matches. The first is a lone term. The second is a comma-delimited chain of terms, without the surrounding <>. The third is the rest of the string after the comma that follows the first term or chain.

The tail is either empty or another string that can be parsed with the same regular expression.

Chains of terms can be parsed by the same regular expression.
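
The peel-and-repeat use described above could look roughly like the sketch below. It assumes the pattern is applied with std::regex_match (so the whole string must match) and that the input contains no whitespace; split_clauses is a hypothetical helper name, not something from the answer.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: repeatedly peel the first clause off the input
// with re_str and continue with the captured tail.
std::vector<std::string> split_clauses(std::string rest) {
    using namespace std::string_literals;
    // Same building blocks as in the answer.
    const auto term   = "(?:[^,<]*)"s;
    const auto chain  = "(?:(?:" + term + ",)*" + term + ")";
    const auto clause = "(?:(?:" + term + ")|(?:<" + chain + ">))";
    const auto re_str = "^(?:(" + term + ")|(?:<(" + chain + ")>))"
                        "(?:|,((?:" + clause + ",)*" + clause + "))";
    const std::regex re(re_str);

    std::vector<std::string> out;
    std::smatch m;
    // regex_match needs the whole string to match, so malformed input
    // simply ends the loop early.
    while (std::regex_match(rest, m, re)) {
        if (m[2].matched) {
            out.push_back("<" + m[2].str() + ">");  // chain: restore the delimiters
        } else {
            out.push_back(m[1].str());              // lone term
        }
        if (!m[3].matched) break;                   // no comma, so no tail
        std::string tail = m[3].str();              // copy before overwriting rest
        rest = std::move(tail);
    }
    return out;
}
```

Passing the contents of a chain, for example split_clauses("bbbb,cccc,dddd"), splits it into its individual terms in the same way.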

live example.

I matched chains of terms delimited by <> rather than [], because [ and ] each need a \\ escape in a C++ string literal and I got bored of typing them.
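
If the [] delimiters from the question are wanted, raw string literals are one way to avoid the doubled backslashes. The following is a sketch under that assumption; peel_bracketed is a hypothetical helper, and whitespace is still not handled.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: split off the first clause of a []-delimited input.
// Returns {first clause, tail}; both are empty if the input does not match.
std::pair<std::string, std::string> peel_bracketed(const std::string& input) {
    // In a raw string literal, R"(\[)" is the two-character regex \[ as written.
    const std::string open   = R"(\[)";
    const std::string close  = R"(\])";
    const std::string term   = R"((?:[^,\[]*))";
    const std::string chain  = "(?:(?:" + term + ",)*" + term + ")";
    const std::string clause = "(?:(?:" + term + ")|(?:" + open + chain + close + "))";
    const std::string re_str =
        "^(?:(" + term + ")|(?:" + open + "(" + chain + ")" + close + "))"
        "(?:|,((?:" + clause + ",)*" + clause + "))";

    const std::regex re(re_str);
    std::smatch m;
    if (!std::regex_match(input, m, re)) return {};
    // Group 2 holds the chain contents, so put the brackets back.
    std::string head = m[2].matched ? "[" + m[2].str() + "]" : m[1].str();
    return {head, m[3].str()};
}
```

The tail in the second element can be fed back into peel_bracketed until it comes back empty.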

You will also want to discard whitespace around clauses. I omitted that, but it should be easy to stitch in.
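
One possible way to stitch the whitespace handling in is sketched below. It is an assumption rather than the answer's own pattern: \s* is allowed around every term and chain, and a term is redefined so that it can neither start nor end with whitespace. peel_trimmed is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: like the pattern above, but tolerant of whitespace.
// Returns {first clause without surrounding whitespace, tail}; both are
// empty if the input does not match.
std::pair<std::string, std::string> peel_trimmed(const std::string& input) {
    const std::string ws   = R"(\s*)";
    // A possibly empty term that neither starts nor ends with whitespace.
    const std::string term = R"((?:[^,<\s](?:[^,<]*[^,<\s])?)?)";
    const std::string item   = ws + term + ws;
    const std::string chain  = "(?:(?:" + item + ",)*" + item + ")";
    const std::string clause = "(?:" + item + "|" + ws + "<" + chain + ">" + ws + ")";
    const std::string re_str =
        "^" + ws + "(?:(" + term + ")|(?:<(" + chain + ")>))" + ws +
        "(?:|,((?:" + clause + ",)*" + clause + "))";

    const std::regex re(re_str);
    std::smatch m;
    if (!std::regex_match(input, m, re)) return {};
    // Group 2 holds the chain contents, so put the delimiters back.
    std::string head = m[2].matched ? "<" + m[2].str() + ">" : m[1].str();
    return {head, m[3].str()};
}
```

The tail keeps its own leading whitespace, which the next call discards in the same way.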
