Rebse - 25 days ago
Java Question

^[A-Z](([A-Z_0-9])*[^_])?$ wrong match

I need a regex for Java generic type parameter names, so I tried:


^[A-Z](([A-Z_0-9])*[^_])?$


It is meant to say that the type name has one or more characters, all uppercase letters and digits; '_' may be used as a separator, but not at the end, e.g. 'TT_A9'.

But to my surprise, my regex tool shows a match for 'Aa', 'AAa' and 'AA-'.

I wrote a simple test class to check:


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {

    // Input under test; "AA-" should not be a valid type parameter name
    public static final String test = "AA-";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("^[A-Z](([A-Z_0-9])*[^_])?$");
        Matcher matcher = pattern.matcher(test);
        // matches() requires the whole input to match the pattern
        System.out.println(test + " Matches ? " + matcher.matches());
    }
}


Output:


AA- Matches ? true


It's also true for AAa, but not for AA_.

It works if I use the regex
^[A-Z](([A-Z_0-9])*[^_a-z-])?$


but I don't understand why I need to exclude 'a-z' and '-' when I'm only looking for uppercase characters.

Answer

When you use a negated character class, such as the [^_] in your original pattern, you tell the regex to consume one character that is anything other than the characters listed in the class. So, whenever the optional group takes part in the match, your regex requires at least 2 chars: an uppercase ASCII letter first, any char but _ last (that includes a, - and everything else), and any number of chars from the _, 0-9 and A-Z ranges in between. That is why 'Aa', 'AAa' and 'AA-' match while 'AA_' does not.
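
To see the effect of [^_] in isolation, here is a minimal sketch that runs the pattern from the question against a few inputs. The class name NegatedClassDemo and the helper matchesOriginal are made up for this illustration; only the pattern itself comes from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the question's code.
class NegatedClassDemo {

    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^[A-Z](([A-Z_0-9])*[^_])?$");

    // True if the whole input matches the original pattern.
    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // [^_] accepts any last char except '_', so a lowercase letter
        // or '-' in the final position still matches.
        for (String s : new String[] {"A", "Aa", "AAa", "AA-", "AA_", "TT_A9"}) {
            System.out.println(s + " -> " + matchesOriginal(s));
        }
    }
}
```

Of these inputs, only AA_ is rejected; everything else, including the single char A (the group is optional), matches.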

You are looking for a negative lookbehind anchored at the end of the string:

^[A-Z][A-Z_0-9]*$(?<!_)
                 ^^^^^^

It fails every match in which _ is the last character of the string. The lookbehind does not consume the _; it only checks whether it is there. The pattern therefore accepts a string of one or more chars that starts with an uppercase ASCII letter, optionally followed by chars from the [A-Z_0-9] character class, as long as the last char is not _.
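
A minimal sketch of the lookbehind pattern in Java, checked against the inputs from the question. The class name TypeParamNameCheck and the method isValidTypeParam are assumptions made for this example; the pattern is the one given above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class TypeParamNameCheck {

    // Uppercase letter first, then letters, digits or '_';
    // the lookbehind after '$' rejects a trailing '_'.
    static final Pattern TYPE_PARAM = Pattern.compile("^[A-Z][A-Z_0-9]*$(?<!_)");

    // True if the whole name matches the pattern.
    static boolean isValidTypeParam(String name) {
        return TYPE_PARAM.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"T", "TT_A9", "AA_", "AAa", "AA-"}) {
            System.out.println(s + " Matches ? " + isValidTypeParam(s));
        }
    }
}
```

With this pattern T and TT_A9 are accepted, while AA_, AAa and AA- are rejected.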

I also suggest removing all redundant groupings (you are not using the captured subtexts anyway).
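
One possible reading of that suggestion, as a sketch: drop the inner capture group and make the outer group non-capturing with (?:...). The variant below also swaps the lookbehind for an explicit [A-Z0-9] as the last char; that swap is an alternative chosen for this example, not the pattern proposed above, and the names UngroupedTypeParamCheck and isValid are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class showing a capture-free, lookaround-free variant.
class UngroupedTypeParamCheck {

    // Uppercase letter first; if more chars follow, the last one must be
    // an uppercase letter or a digit, so a trailing '_' cannot match.
    static final Pattern NO_CAPTURE = Pattern.compile("[A-Z](?:[A-Z_0-9]*[A-Z0-9])?");

    // matches() already requires the whole input to match,
    // so the ^ and $ anchors are left out here.
    static boolean isValid(String name) {
        return NO_CAPTURE.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"T", "TT_A9", "AA_", "AAa", "AA-"}) {
            System.out.println(s + " Matches ? " + isValid(s));
        }
    }
}
```

If the pattern is used with find() instead of matches(), the ^ and $ anchors would have to be put back.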