PKua - 26 days ago
C++ Question

Assigning to reference parameter invalidates object

I've run into a confusing issue. I have a function that takes a string by reference, splits it using a regex, saves one part back to the given string, and returns the other. However, assigning one match to the reference parameter seems to invalidate the match object (it looks like some kind of misused move assignment), while assigning it to a local variable works fine. Here's the code:

    std::string ParseVar(std::string & szExp)
    {
        static std::regex oRegex("^([[:alpha:]][[:alnum:]_]*)\\s*<-\\s*(.+)$");
        std::smatch oMatch;

        if (std::regex_match(szExp, oMatch, oRegex))
        {
            if (oMatch.size() != 3)
                throw std::runtime_error("internal error");

            std::cout << "Match before: " << oMatch.str(2) << std::endl;
            szExp = oMatch.str(2);
            std::cout << "Match after: " << oMatch.str(2) << std::endl;
            return oMatch.str(1);
        }

        return "";
    }


This prints (for szExp = "foo <- 5+5+5+5"):

Match before: 5+5+5+5
Match after: +5+5+5


The returned value also seems to be broken; szExp, however, contains the proper string.

Changing it to:

    std::string ParseVar(std::string & szExp)
    {
        static std::regex oRegex("^([[:alpha:]][[:alnum:]_]*)\\s*<-\\s*(.+)$");
        std::smatch oMatch;
        std::string save1, save2;

        if (std::regex_match(szExp, oMatch, oRegex))
        {
            if (oMatch.size() != 3)
                throw std::runtime_error("internal error");

            save1 = oMatch.str(1);
            save2 = oMatch.str(2);

            std::cout << "Match before: " << oMatch.str(2) << std::endl;
            szExp = save2;
            std::cout << "Match after: " << oMatch.str(2) << std::endl;
            return save1;
        }

        return "";
    }


This prints the same thing, but at least both the returned value and szExp are correct.

What's going on here?

Answer

The std::smatch object is an instantiation of the std::match_results template. Its entry on cppreference contains the following passage:

This is a specialized allocator-aware container. It can only be default created, obtained from std::regex_iterator, or modified by std::regex_search or std::regex_match. Because std::match_results holds std::sub_matches, each of which is a pair of iterators into the original character sequence that was matched, it's undefined behavior to examine std::match_results if the original character sequence was destroyed or iterators to it were invalidated for other reasons.
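To make the "pair of iterators" point concrete, here is a minimal sketch. The helper name CaptureOffset is hypothetical and not part of the standard library; it only relies on std::sub_match exposing its begin iterator as the public member first, which points into the searched string rather than into a private copy.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (illustration only): returns the offset inside `text`
// at which capture group `group` begins, or std::string::npos when the regex
// does not match or the group did not participate in the match.
std::size_t CaptureOffset(const std::string& text, const std::regex& re, std::size_t group)
{
    std::smatch m;
    if (!std::regex_search(text, m, re) || group >= m.size() || !m[group].matched)
        return std::string::npos;

    // m[group].first is a std::string::const_iterator into `text` itself,
    // so subtracting text.begin() yields a position within the original string.
    return static_cast<std::size_t>(m[group].first - text.begin());
}
```

Because the match only stores positions in the original string, anything that reallocates or rewrites that string leaves those positions dangling.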

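Modifying the original string (which is what you're doing by assigning to szExp) invalidates iterators into its character sequence, so examining oMatch afterwards falls foul of the above and causes undefined behaviour. Your second version returns correct values because save1 and save2 are independent copies made before the assignment; its "Match after" line still reads oMatch after szExp has changed, which is why it prints the same garbage.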
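One possible way to restructure the function, sketched under the assumption that the debug output is no longer needed: copy both captures into owning strings first, then assign to szExp, and never touch oMatch again after that point. The local names szName and szValue are introduced here for illustration and are not from the question.

```cpp
#include <regex>
#include <string>
#include <utility>

// Sketch of a rewritten ParseVar: both captures are copied out of the match
// before szExp is modified, so no sub_match iterator is used after it dangles.
std::string ParseVar(std::string& szExp)
{
    static const std::regex oRegex("^([[:alpha:]][[:alnum:]_]*)\\s*<-\\s*(.+)$");
    std::smatch oMatch;

    if (!std::regex_match(szExp, oMatch, oRegex))
        return "";

    // str(n) returns a new std::string, so these copies do not depend on szExp.
    std::string szName = oMatch.str(1);
    std::string szValue = oMatch.str(2);

    // The assignment below invalidates the iterators held by oMatch;
    // oMatch must not be examined from here on.
    szExp = std::move(szValue);
    return szName;
}
```

With szExp = "foo <- 5+5+5+5", this returns "foo" and leaves szExp holding "5+5+5+5". When the input does not match, it returns an empty string and leaves szExp unchanged.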