genonymous - 1 month ago
Java Question

Why is this regex not matching URLs?

I have the following regex:
^(?=\w+)(-\w+)(?!\.)

Which I'm attempting to match against the following text:

The regex should match only the -test part of the string, and only if it comes before the first . and after the start of the string. The part before it (www in my example) can be any string, but it should not be matched.

My pattern is not matching the -test part. What am I missing?


Java is one of the few regex flavors that support variable-length lookbehinds (which basically means you can use quantifiers inside them), so you can technically use the following:


This matches -test without capturing the preceding text. However, variable-length lookbehinds are generally not advisable: they are not fully reliable, they are not very efficient, and they are not portable to other languages. Having said that, this is a simple pattern, so if you don't care about portability, go for it.
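A minimal sketch of the lookbehind approach, assuming a hypothetical input of www-test.example.com. Java rejects lookbehinds that have no obvious maximum length, so this sketch uses an assumed bound of {1,50} for the leading word instead of +; the pattern is an illustration, not necessarily the one from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Assumption: the leading word is at most 50 word characters long.
    // The lookbehind requires that only word characters sit between the
    // start of the input and the "-", so a "-" after the first dot is ignored.
    private static final Pattern DASH_PART = Pattern.compile("(?<=^\\w{1,50})-\\w+");

    // Returns the "-xxx" part directly after the leading word, or null if absent.
    static String extract(String input) {
        Matcher m = DASH_PART.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("www-test.example.com")); // -test
        System.out.println(extract("www.foo-bar.com"));      // null
    }
}
```

The whole match is the desired text here, because the lookbehind checks the prefix without consuming it.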

The better solution, though, is to capture what you want in a group and then reference the captured group (in this case, group 1):
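A minimal sketch of the capture-group approach, assuming the hypothetical pattern ^\w+(-\w+) and input www-test.example.com (the exact pattern from the answer is not shown): the leading word is consumed but not captured, and group 1 holds the part you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // ^\w+ consumes the leading word; (-\w+) captures the "-xxx" part as group 1.
    private static final Pattern DASH_PART = Pattern.compile("^\\w+(-\\w+)");

    // Returns group 1, or null if the input does not start with word chars followed by "-".
    static String extract(String input) {
        Matcher m = DASH_PART.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("www-test.example.com")); // -test
        System.out.println(extract("www.foo-bar.com"));      // null
    }
}
```

This needs no lookaround at all, so it is simpler and ports to practically every regex flavor.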


p.s. \w does not match a dot, so there is no need to look ahead for one.
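A quick check of that point, using a hypothetical input: by default (without the UNICODE_CHARACTER_CLASS flag) \w in Java is [a-zA-Z_0-9], so \w+ stops at the dot on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCharDemo {
    // Finds the first "-" followed by word characters, with no lookahead at all.
    static String firstDashWord(String input) {
        Matcher m = Pattern.compile("-\\w+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "." is not a word character
        System.out.println(Pattern.matches("\\w", "."));            // false
        // \w+ stops at the dot by itself
        System.out.println(firstDashWord("www-test.example.com")); // -test
    }
}
```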

p.p.s. To answer your question about why your original pattern ^(?=\w+)(-\w+)(?!\.) doesn't match, there are two reasons:

1) You start with a start-of-string assertion and then use a lookahead to check whether what follows is one or more word characters. But lookaheads are zero-width assertions: no characters are consumed, so the match position does not move forward. The lookahead sees that "www" matches and the engine moves on to the next part of the pattern, but the position is still at the start of the string. It then tries to match your (-\w+) part, and since your string doesn't start with "-", the pattern fails.
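A small demonstration of the zero-width behavior, assuming a hypothetical input of www-test.example.com. Matcher.lookingAt() tries a match only at the start of the input, which is where the original pattern is anchored anyway.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical input modelled on the question: a word, then "-test", then a dot.
    static final String INPUT = "www-test.example.com";

    // True if the regex matches a prefix of INPUT (match must begin at index 0).
    static boolean startsWithMatch(String regex) {
        return Pattern.compile(regex).matcher(INPUT).lookingAt();
    }

    public static void main(String[] args) {
        // (?=\w+) only peeks at "www"; the position stays at 0, where "-" cannot match.
        System.out.println(startsWithMatch("^(?=\\w+)(-\\w+)")); // false
        // \w+ actually consumes "www", so the position reaches the "-".
        System.out.println(startsWithMatch("^\\w+(-\\w+)"));     // true
    }
}
```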

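2) (?!\.) is a negative lookahead, and your example string has a dot right after the "-test" part. So even if #1 didn't sink the match, this lookahead would not let "-test" match: the engine would backtrack, \w+ would give up its last character, and you would get "-tes" instead.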
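A sketch of that backtracking, again assuming the hypothetical input www-test.example.com and dropping the ^ anchor so that only the negative lookahead is in play. The possessive quantifier \w++ is shown for comparison: it never gives characters back, so the match fails outright instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookaheadDemo {
    static final String INPUT = "www-test.example.com"; // hypothetical input

    // Returns the first match of regex in INPUT, or null if there is none.
    static String find(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \w+ first takes "test"; (?!\.) rejects the "." after it, so \w+ backs off to "tes".
        System.out.println(find("-\\w+(?!\\.)"));  // -tes
        // The possessive \w++ cannot back off, so there is no match.
        System.out.println(find("-\\w++(?!\\.)")); // null
    }
}
```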