user2393256 - 1 month ago
C++ Question

Regex that matches a string that ends with the same sequence as it begins

I have a string that contains a number of unique sequences that always start and end with an underscore. I am looking for a regex that returns the part of the string between two occurrences of such a sequence. I tried to make a capture group for everything between the first two underscores, followed by some characters in between, and then match the first capture group again at the end. But it does not match anything:

std::string s = "somerandomstuff_UNIQUESEQUENCE_somemorethings_UNIQUESEQUENCE_morewords";
std::regex seq("_(.*)_.*_$1_", std::regex_constants::extended);
std::smatch m{};
std::regex_search(s, m, seq);


The problem is that I do not know what the sequences are; I only know that they start and end with an underscore (otherwise this would be fairly easy to solve...). Does anybody know a regex for this?

Answer

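Your regex has two problems. First, $1 refers to a capture group only in a replacement string; inside a pattern, $ is the end-of-line anchor and the backreference is written \1 (that is "\\1" in an ordinary C++ string literal). POSIX extended regular expressions also do not define backreferences, so use a grammar that does, such as the default ECMAScript grammar. Second, .* is greedy - it first grabs UNIQUESEQUENCE_somemorethings_UNIQUESEQUENCE, and the engine can only line up the backreference by backtracking.

The solution is actually quite simple. You know that UNIQUESEQUENCE ends at the first _, so don't let the capture group match an underscore. Use a regex of:

_([^_]*)_.*_\1_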
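
As a minimal sketch of how this could look with std::regex (not the asker's final code): the helper name text_between_markers, the second capture group, and the lazy .*? are assumptions added here so the text between the two occurrences can be read back directly. The backreference is written \1 and the default ECMAScript grammar is used, since that grammar supports backreferences.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): finds the first
// _MARKER_ that is repeated later in s and stores the text between the
// two occurrences in `between`. Returns false if no such pair exists.
bool text_between_markers(const std::string& s, std::string& between) {
    // Group 1: the marker itself; [^_]* keeps it from running past an underscore.
    // Group 2: the text between the two occurrences; lazy, so it stops at
    //          the nearest repetition of the marker.
    // \1:      backreference to group 1 (raw string, so no doubled backslash).
    static const std::regex seq(R"(_([^_]*)_(.*?)_\1_)");
    std::smatch m;
    if (!std::regex_search(s, m, seq)) {
        return false;
    }
    between = m[2].str();
    return true;
}
```

Called on the string from the question, this should return true and leave somemorethings in the output parameter, with m[1] holding UNIQUESEQUENCE inside the function.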