skeggse skeggse - 4 months ago 17
Javascript Question

Javascript greedy regex appears non-greedy

I'd like to match a three-part string. The first part consists of one or more

a
characters, the second part consists of one or more
b
characters, and the third part consists either of zero or more
c
characters or zero or more
C
characters, but not a mix of
c
and
C
.

As such, I wrote the following regular expression:

/a+b+(C*|c*)/


And immediately noticed that it fails to greedily match the trailing
c
s in the following string:

aaaaabbcc


Wrapping the inner clauses of the or clause does not fix the unexpected behavior:

/a+b+((C*)|(c*))/


But interestingly both regular expressions match the following, where the
C
characters match the first clause of the or:

aaaaabbCC


The following regular expression captures the semantics accurately, but I'd like to understand why the original regular expression behaves unexpectedly.

/a+b+(([Cc])\2*)?/

Answer

I think you want

/a+b+(C+|c+)?/

That is, if it finds a C it will match as many more C as possible, if it finds a c it will match as many more c as possible. But finding C or c is optional.

Your regex doesn't work because first it tries C*, which matches the empty string, which is OK. Then it won't try to check if c* can match more characters.