estani estani - 2 months ago 13
Java Question

regex non-capturing and capturing groups and unexplained consumption

I can't follow why these two expressions are different:

^(\d+)(?:\.(\d+))?(?:\.(\d+))?$
applied to 1.0.3
group 1 =>1
group 2 =>0
group 3 =>3


That is what I expected. But if I try to generalize it to any series of \d\. it doesn't work anymore:

^(\d+)(?:\.(\d+))+$
applied to 1.0.3
group 1 =>1
group 2 =>3


And, strangely enough to me, everything but the first and last entries disappears:

^(\d+)(?:\.(\d+))+$
applied to 1.2.3.4.5.6.7.8.9
group 1 =>1
group 2 =>9


That is not exactly what I was expecting.

Answer

Try the following regex, which matches every number with at least one digit. Applied repeatedly (for example with Matcher.find()), it returns each number as a separate match in group 1:

(?<=^|\.)(\d+)(?=\.|$)

It works simply. Here are the parts of the regex:

  • (?<=^|\.) is a positive lookbehind that checks that the number \d+ follows either a dot (note that it has to be escaped as \., otherwise . means any character) or the start of the line ^.
  • (\d+) is the number to be captured; it must have at least one digit.
  • (?=\.|$) is a positive lookahead that checks that the number \d+ is followed by either a dot \. or the end of the line $.
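
As a rough illustration of how the pattern might be used from Java, here is a minimal sketch. The class name VersionNumbers and the helper extract are made up for this example; only Pattern, Matcher and the regex itself come from the answer. Note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionNumbers {
    // The lookaround pattern from the answer, with backslashes doubled for Java.
    private static final Pattern NUMBER = Pattern.compile("(?<=^|\\.)(\\d+)(?=\\.|$)");

    // Hypothetical helper: collects every dot-delimited number found in the input.
    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        // Each successful find() is one number; group 1 holds its digits.
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("1.0.3"));             // [1, 0, 3]
        System.out.println(extract("1.2.3.4.5.6.7.8.9")); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Keep in mind that this only finds the numbers; it does not check that the whole input is a well-formed dotted series.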

Try it out at Regex101, which gives a more detailed explanation.
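
As for why the pattern in the question reports only the first and last numbers: in java.util.regex a capturing group that sits inside a quantified group is overwritten on every repetition, so after the match group 2 holds only what the last repetition captured. The sketch below reproduces that and shows one possible workaround, assuming you want to keep the original pattern for validation and then split on the dots. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the question: group 2 is inside a repeated non-capturing group.
    private static final Pattern REPEATED = Pattern.compile("^(\\d+)(?:\\.(\\d+))+$");

    // Returns what group 2 holds after a full match, or null if the input does not match.
    static String lastRepetition(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    // Hypothetical workaround: validate the shape with the same pattern, then split on dots.
    static String[] validateAndSplit(String input) {
        if (!REPEATED.matcher(input).matches()) {
            throw new IllegalArgumentException("not a dotted number series: " + input);
        }
        return input.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(lastRepetition("1.0.3"));                     // 3
        System.out.println(lastRepetition("1.2.3.4.5.6.7.8.9"));         // 9
        System.out.println(Arrays.toString(validateAndSplit("1.0.3")));  // [1, 0, 3]
    }
}
```

Unlike the lookaround approach, this variant rejects inputs that are not a clean series of numbers separated by dots.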