danno57 - 1 month ago
Java Question

regex optional capture not working as expected

I need to capture all elements of a string like this

front stuff grp2="abc" middle stuff grp4="xyz" end stuff


such that it is broken into these five groups

#1: front stuff
#2: grp2="abc"
#3: middle stuff
#4: grp4="xyz"
#5: end stuff


This expression does the trick as long as all five sections exist

([\s\S]*?)(grp2=\"\S*?\")([\s\S]*?)(grp4=\"\S*?\")([\s\S]*)


But if grp4="..." doesn't exist, for example,

front stuff grp2="abc" end stuff


it of course doesn't match at all.

So okay, I can make the 4th group optional like this, right?

([\s\S]*?)(grp2=\"\S*?\")([\s\S]*?)(grp4=\"\S*?\")?([\s\S]*)


Apparently wrong. What that produces is this (when grp4 is present)

#1: front stuff
#2: grp2="abc"
#3:
#4:
#5: middle stuff grp4="xyz" end stuff


The 4th group is no longer matched even when it exists.

FWIW, I need all the text (all groups must be capturing groups) because I'm ultimately using this to manipulate the text of groups 2 and 4 (if they exist), and reconstitute the string. Like taking that example string and turning it into this

front stuff grp2="123" middle stuff grp4="456" end stuff


The behavior is easy to see on regex101.com. I've tried every combination of "optional" I know of. I'm sure I must be doing something dumb, and I've wasted enough time trying to figure it out, so I'm finally asking for help.

Thanks!

Answer

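Making only grp4 optional fails because the lazy ([\s\S]*?) in front of it is satisfied by an empty match, the optional group is then allowed to match nothing, and the greedy final group takes the rest of the string. Instead, you could make the middle stuff and grp4 optional together, as one non-capturing group, since end stuff follows in either case. Written as a Java string literal, your new regex would be "([\\s\\S]*?)(grp2=\"\\S*?\")(?:([\\s\\S]*?)(grp4=\"\\S*?\")){0,1}([\\s\\S]*)"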

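import java.util.regex.Matcher;
import java.util.regex.Pattern;

String test = "front stuff grp2=\"abc\" middle stuff grp4=\"xyz\" end stuff";
Pattern p = Pattern.compile("([\\s\\S]*?)(grp2=\"\\S*?\")(?:([\\s\\S]*?)(grp4=\"\\S*?\")){0,1}([\\s\\S]*)");
Matcher m = p.matcher(test);

// A match must be attempted first; calling group() before that throws IllegalStateException
if (m.matches()) {
    for (int i = 1; i <= m.groupCount(); i++) {
        // Groups 3 and 4 are null when the optional part did not participate
        if (m.group(i) != null) {
            System.out.println(i + ": " + m.group(i));
        }
    }
}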
// String test = "front stuff grp2=\"abc\" middle stuff grp4=\"xyz\" end stuff";
// 1: front stuff 
// 2: grp2="abc"
// 3:  middle stuff 
// 4: grp4="xyz"
// 5:  end stuff

// String test = "front stuff grp2=\"abc\" end stuff";
// 1: front stuff 
// 2: grp2="abc"
// 5:  end stuff
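
Since the stated goal is to change the values of groups 2 and 4 and then put the string back together, here is a minimal sketch of that step using the same pattern. The class name RewriteGroups, the helper rewrite, and the replacement values "123" and "456" are assumptions for illustration only; the real replacement logic may well differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RewriteGroups {
    // Same pattern as above: "middle stuff" and grp4 are optional as one unit.
    private static final Pattern P = Pattern.compile(
        "([\\s\\S]*?)(grp2=\"\\S*?\")(?:([\\s\\S]*?)(grp4=\"\\S*?\"))?([\\s\\S]*)");

    // Hypothetical helper: rebuilds the input with new values for grp2 and grp4.
    static String rewrite(String input, String newGrp2, String newGrp4) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return input; // no grp2 present: leave the text untouched
        }
        StringBuilder out = new StringBuilder();
        out.append(m.group(1));
        out.append("grp2=\"").append(newGrp2).append('"');
        // Groups 3 and 4 are both null when grp4 is absent.
        if (m.group(4) != null) {
            out.append(m.group(3));
            out.append("grp4=\"").append(newGrp4).append('"');
        }
        out.append(m.group(5));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite(
            "front stuff grp2=\"abc\" middle stuff grp4=\"xyz\" end stuff", "123", "456"));
        System.out.println(rewrite(
            "front stuff grp2=\"abc\" end stuff", "123", "456"));
    }
}
```

This prints front stuff grp2="123" middle stuff grp4="456" end stuff for the first input and front stuff grp2="123" end stuff for the second. The null check on group 4 is what keeps the rebuild from appending the text "null" when the optional part is missing.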