Jon Jon - 1 month ago 5
Javascript Question

Regex JS - Create Capture Pattern Without Specified Word

I am trying to use a JavaScript regex to capture any string between my domain and .html (if present), but am having trouble doing so. Any advice?

Regex:
www\.mysite\.com\/(.*)(\.html) // Does not capture 'www.mysite.com/cat'
www\.mysite\.com\/(.*)(\.html)? // Captures the '.html'

Test Data:
www.mysite.com/aadvark.html (capture group should be 'aadvark')
www.mysite.com/bird.html (capture group should be 'bird')
www.mysite.com/cat (capture group should be 'cat')

Sam Sam
Answer

A lot of issues like this can be fixed by being more specific than a match-anything .* pattern. If you change your .* to [^.]* (zero or more non-dot characters), you'll get your expected results.

/www\.mysite\.com\/([^.]*)(\.html)?/
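As a quick check, here is a minimal sketch that runs this pattern against the test data from the question. The helper name capturePath is made up for illustration and is not part of the original answer.

```javascript
// Pattern from the answer: group 1 stops at the first dot after the domain.
const negatedClassPattern = /www\.mysite\.com\/([^.]*)(\.html)?/;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function capturePath(url) {
  const match = negatedClassPattern.exec(url);
  return match ? match[1] : null;
}

console.log(capturePath('www.mysite.com/aadvark.html')); // aadvark
console.log(capturePath('www.mysite.com/bird.html'));    // bird
console.log(capturePath('www.mysite.com/cat'));          // cat
```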

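This is because when you make (\.html) optional, the greedy .* continues to the end of the string, and the optional group then matches nothing. This could also be fixed by putting ? after the quantifier to make the repetition "lazy" (it stops as soon as the rest of the expression can match); however, you'd then need to anchor the end of the expression with $, because otherwise the lazy group is satisfied with an empty match.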

/www\.mysite\.com\/(.*?)(\.html)?$/
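To see why the $ matters, the sketch below compares the lazy pattern with and without the anchor. The function names are hypothetical and exist only for this illustration.

```javascript
// Lazy capture without an end anchor: group 1 can stop immediately.
const lazyUnanchored = /www\.mysite\.com\/(.*?)(\.html)?/;
// Lazy capture with $: group 1 must grow until the rest reaches the end.
const lazyAnchored = /www\.mysite\.com\/(.*?)(\.html)?$/;

// Hypothetical helpers returning capture group 1 (or null on no match).
function captureUnanchored(url) {
  const match = lazyUnanchored.exec(url);
  return match ? match[1] : null;
}

function captureAnchored(url) {
  const match = lazyAnchored.exec(url);
  return match ? match[1] : null;
}

console.log(JSON.stringify(captureUnanchored('www.mysite.com/bird.html'))); // ""
console.log(captureAnchored('www.mysite.com/bird.html'));                   // bird
console.log(captureAnchored('www.mysite.com/cat'));                         // cat
```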

I'd recommend the first pattern. However, the second is more encompassing: it also captures things like foo.bar in www.mysite.com/foo.bar.html.
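The difference between the two patterns on a path containing an extra dot can be checked with a short sketch like this one. The function and variable names are assumptions chosen for the example.

```javascript
// First pattern: group 1 may not contain a dot.
const noDotPattern = /www\.mysite\.com\/([^.]*)(\.html)?/;
// Second pattern: lazy group 1, anchored to the end of the string.
const lazyPattern = /www\.mysite\.com\/(.*?)(\.html)?$/;

// Hypothetical helper: returns capture group 1 for the given pattern.
function firstGroup(pattern, url) {
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

const dottedUrl = 'www.mysite.com/foo.bar.html';
console.log(firstGroup(noDotPattern, dottedUrl)); // foo
console.log(firstGroup(lazyPattern, dottedUrl));  // foo.bar
```

Which behavior is preferable depends on whether paths on the site can legitimately contain dots before the .html suffix.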