www\.mysite\.com\/(.*)(\.html) // Does not capture 'www.mysite.com/cat'
www\.mysite\.com\/(.*)(\.html)? // Captures the '.html'
www.mysite.com/aadvark.html (capture group should be 'aadvark')
www.mysite.com/bird.html (capture group should be 'bird')
www.mysite.com/cat (capture group should be 'cat')
A lot of issues like this can be fixed by being more specific with your dot-match-all. If you change your .* to [^.]* (0+ non-. characters), you'll get your expected results.
This is because when you make (\.html) optional, the .* greedily continues to the end. This could also be fixed by using ? to make your repetition "lazy" (stops as soon as the next part of the expression matches); however, then you'd need to anchor the end of the expression with a $.
I'd recommend this first. But, the second is more encompassing by matching things like
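
To compare the two approaches on the three URLs from the question, here is a minimal Python sketch. The pattern names, the first_group helper, and the choice of $ as the end anchor are assumptions made for illustration; only the URLs and the base expression come from the question.

```python
import re

# Original attempt: greedy .* swallows the optional '.html'.
GREEDY = re.compile(r"www\.mysite\.com/(.*)(\.html)?")
# Option 1: restrict the capture to 0+ non-dot characters.
NEGATED = re.compile(r"www\.mysite\.com/([^.]*)(\.html)?")
# Option 2: lazy repetition, anchored at the end (assumed here to be $).
LAZY = re.compile(r"www\.mysite\.com/(.*?)(\.html)?$")
# Lazy repetition without the anchor, to show why the anchor is needed.
LAZY_UNANCHORED = re.compile(r"www\.mysite\.com/(.*?)(\.html)?")

URLS = [
    "www.mysite.com/aadvark.html",
    "www.mysite.com/bird.html",
    "www.mysite.com/cat",
]


def first_group(pattern, url):
    """Return capture group 1 for the first match, or None if no match."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(url, repr(first_group(NEGATED, url)), repr(first_group(LAZY, url)))
```

Without the end anchor, the lazy group is satisfied by matching nothing at all, so first_group(LAZY_UNANCHORED, url) returns an empty string for every URL above.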