genonymous genonymous - 16 days ago 4
Java Question

Why is this regex not matching URLs?

I have the following regex:

^(?=\w+)(-\w+)(?!\.)


Which I'm attempting to match against the following text:

www-test1.examples.com


The regex should match only the -test1 part of the string, and only if it is before the first "." and after the start of the expression. "www" can be any string, but it should not be matched.

My pattern is not matching the -test1 part. What am I missing?

Answer

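Java is one of the few regex flavors that allow variable-length look-behinds (which basically means you can use quantifiers inside them), so you can technically use the following. Note that older JDKs only accept quantifiers with an upper bound, such as {1,63}, and reject an unbounded + with a PatternSyntaxException; newer JDKs accept it: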

(?<=^\w+)(-\w+)

This will match -test1 without including the preceding text in the match. However, variable-length look-behinds are generally not advisable: their behavior has rough edges, they are not very efficient, and they are not portable to most other regex flavors. Having said that, this is a simple pattern, so if you don't care about portability, sure, go for it.
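To make the look-behind variant concrete, here is a minimal sketch. It uses a bounded quantifier, {1,63}, instead of +, on the assumption that the prefix is at most 63 characters long (the DNS label limit); that bound and the class and method names are illustrative choices, not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Bounded look-behind: the prefix must run from the start of the string
    // up to the "-" and consist of 1 to 63 word characters (assumed limit).
    private static final Pattern PATTERN = Pattern.compile("(?<=^\\w{1,63})(-\\w+)");

    // Returns the whole match, or null when the input does not match.
    static String match(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The prefix "www" is asserted by the look-behind but not part of the match.
        System.out.println(match("www-test1.examples.com")); // prints "-test1"
    }
}
```

Because the prefix is only asserted, m.group() is already the "-test1" part, with no need to pick out a capture group.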

The better solution though is to group what you want to capture, and reference the captured group (in this case, group 1):

^\w+(-\w+)

p.s. - \w will not match a dot, so there is no need to look ahead for one.
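The capture-group approach can be sketched in Java roughly as follows. The class and method names are made up for illustration; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // Anchor at the start, consume the leading word characters,
    // then capture the "-label" part in group 1.
    private static final Pattern PATTERN = Pattern.compile("^\\w+(-\\w+)");

    // Returns the text of group 1, or null when the input does not match.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("www-test1.examples.com"));  // prints "-test1"
        // The "-" comes after the first dot here, so nothing is captured.
        System.out.println(extract("www.test-1.examples.com")); // prints "null"
    }
}
```

The whole match (m.group()) is "www-test1"; reading group 1 is what leaves the "www" prefix out of the result.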

p.p.s. - As for why your original pattern ^(?=\w+)(-\w+)(?!\.) doesn't match, there are 2 reasons:

1) You start with a start-of-string assertion, then use a lookahead to check that one or more word chars follow. But lookaheads are zero-width assertions: no characters are consumed, so the match position doesn't advance. The lookahead sees "www" and succeeds, but the position is still at the start of the string. The engine then tries to match your (-\w+) part there, and since your string doesn't start with "-", the pattern fails.

2) (?!\.) is a negative lookahead, and your example string has a dot as the very next thing after the "-test1" part. So even if #1 were fixed, the lookahead would reject "-test1", and the engine would backtrack and capture only "-test".
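Both reasons can be checked with a small experiment. This is only a sketch; the helper name firstGroup is hypothetical. Note that once reason 1 is fixed, the negative lookahead makes the engine backtrack, so the visible symptom is a shortened capture rather than no match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String host = "www-test1.examples.com";

        // Original pattern: the lookahead checks for word characters at index 0
        // but consumes nothing, so "-" is then required at index 0 as well.
        System.out.println(firstGroup("^(?=\\w+)(-\\w+)(?!\\.)", host)); // prints "null"

        // Consuming the prefix fixes reason 1, but the negative lookahead still
        // rejects "-test1" (a dot follows), so the engine backtracks to "-test".
        System.out.println(firstGroup("^\\w+(-\\w+)(?!\\.)", host)); // prints "-test"

        // Dropping the lookahead gives the intended capture.
        System.out.println(firstGroup("^\\w+(-\\w+)", host)); // prints "-test1"
    }
}
```

As a side observation, the original pattern cannot match any input: the lookahead demands a word character at index 0 while the group demands "-" at the same position.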
