Ugi Ugi - 2 months ago 26
C++ Question

Different results when using g++ and Visual Studio 14 2015 compilers with regex in C++

I was experimenting with std::regex in C++ when I noticed a discrepancy between g++ (MinGW) and the Visual Studio 14 2015 compiler (both on Windows). Here is the code I tried:

#include <iostream>
#include <vector>
#include <string>
#include <regex>

static const std::string data = "\n a = 10\n b = 20\n";

int main(int argc, char* argv[])
{
    auto strIt = data.begin();

    while (strIt != data.end())
    {
        std::regex e("^[ \t\n\r]");
        std::smatch m;
        std::string s(strIt, data.end());

        if (std::regex_search(s, m, e))
        {
            strIt += m[0].str().size();
        }
        else
        {
            std::cout << "s = \"" << s << "\"" << '\n';
            break;
        }
    }
}


When compiling with g++ I get the expected output of

s = "a = 10\n b = 20\n"


but when using the visual studio compiler, it spits out

s = "b = 20\n"


ignoring the whole "a = 10" part. After stepping through the code in the Visual Studio debugger, I saw that the m variable was holding the space that comes after the "a = 10" part.

Do you know why it behaves like that? Am I making a big mistake somewhere while not noticing it? Please help.

Answer Source

First, a simplified example:

#include <iostream>
#include <string>
#include <regex>

using namespace std;

int main() {
    const string data = "abc\nXabc";
    regex re("^X");
    smatch match;
    if (regex_search(data, match, re))
        cout << "match: " << match.str() << endl;
    else
        cout << "no match" << endl;
    return 0;
}

Visual Studio 2015 outputs:

match: X

MinGW 7.1.0 outputs:

no match


So the difference comes down to whether ^ in the regular expression matches at the start of every line or only at the beginning of the string. In C++17 this is determined by the regex::flag_type argument passed to the regex constructor.

31.5.1 Bitmask type syntax_option_type:

The type syntax_option_type is an implementation-defined bitmask type. Setting its elements has the effects listed in Table 130. A valid value of type syntax_option_type shall have at most one of the grammar elements ECMAScript, basic, extended, awk, grep, egrep, set. If no grammar element is set, the default grammar is ECMAScript.


Table 130: syntax_option_type effects

...

multiline — Specifies that ^ shall match the beginning of a line and $ shall match the end of a line, if the ECMAScript engine is selected.

In order for ^ to match at the start of each line, the regex object needs to be initialized like this:

regex re("^X", regex_constants::multiline);

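In conclusion, MinGW's behavior is correct under the C++17 Standard: without the multiline flag, ^ matches only at the beginning of the string, so "no match" is the conforming result.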
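
If the goal in the question is just to skip leading whitespace, one way to avoid depending on how ^ is treated is to drop the anchor and pass std::regex_constants::match_continuous, which only allows a match that begins at the first iterator. The following is a minimal sketch under that assumption, not the original poster's code; the helper name skip_leading_whitespace is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns `text` with its
// leading whitespace removed, one character per regex match, mirroring the
// loop in the question.
std::string skip_leading_whitespace(const std::string& text)
{
    // No '^' here: anchoring is done by match_continuous instead, so the
    // result does not depend on how the library treats '^' after a newline.
    static const std::regex whitespace("[ \t\n\r]");

    auto it = text.begin();
    std::smatch m;
    while (it != text.end() &&
           std::regex_search(it, text.end(), m, whitespace,
                             std::regex_constants::match_continuous))
    {
        it += m.length(0);  // advance past the matched character
    }
    return std::string(it, text.end());
}
```

Called on the data string from the question, this should return the text starting at "a = 10" with both standard libraries, since the match can only begin at the current position. Searching the iterator range directly also avoids building a new std::string on every iteration.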