bademeister - 23 days ago
C++ Question

Regex backreference not working

I want to match this html-like pattern:

<12>Some content with \n in it<12>


It is important that only complete items are matched: the numbers MUST match, so when a closing tag is missing, that content should not be matched. For example, the 12 item here is incomplete:
<12>Some content with \n in it<13>test<13>


This is what I've got so far:

(<\s*[0-9]+\>)(.*?[^<]*?)(<\s*[0-9]+\>)


This is what I expected to work, but it does not:

(<\s*[0-9]+\>)(.*?[^<]*?)(<\s*[0-9]+\>)\1


I tried it in the editor linked below, but the backreference does not behave as I expect. Why does the backreference to the first capture group not work? The solution should work in C++.

http://regexr.com/3ek1a

Answer

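In your pattern the first capture group contains the whole opening tag, and \1 comes after the third group. The pattern therefore demands an opening tag, the content, any numeric tag, and then a repeat of the opening tag. Capture only the number and put the backreference inside the closing tag instead: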

<\s*(\d+)\s*>(.*?)<\s*\1\s*>

Explanation

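< matches the character < literally

\s* matches zero or more whitespace characters (greedy)

1st capturing group (\d+)
\d+ matches one or more digits (\d is equivalent to [0-9])
+ Quantifier: matches between one and unlimited times (greedy)

\s* matches zero or more whitespace characters (greedy)

> matches the character > literally

2nd capturing group (.*?)
. matches any character (except for line terminators)
*? Quantifier: matches between zero and unlimited times, as few times as possible (lazy)

< matches the character < literally

\s* matches zero or more whitespace characters (greedy)

\1 matches the same text as most recently matched by the 1st capturing group

\s* matches zero or more whitespace characters (greedy)

> matches the character > literally

Because . does not match line terminators, content that contains real newline characters needs [\s\S]*? in place of .*? in the second group.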

C++14 Code Sample:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Compile the pattern once; \1 must repeat the number captured by (\d+).
    const std::regex pattern(R"(<\s*(\d+)\s*>(.*?)<\s*\1\s*>)");
    const std::string input = "<1>test1<1><2>Test2<2>sfsaf<3><4>test4<4>";

    // Visit every non-overlapping match and print the content (group 2).
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end; it != end; ++it)
    {
        std::cout << (*it)[2] << '\n';
    }
    return 0;
}
