user2533660 - 15 days ago
Java Question

Regex pattern discriminating between letters when it shouldn't?

I'm writing a regex for a simple username validation for practice. While I am sure there may be other issues with this pattern, I would like it if someone could explain this seemingly odd behavior I am getting.

import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class userRegex {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String username = in.nextLine();
            String pattern = "([[:alpha:]])[a-zA-Z_]{7,29}";
            Pattern r = Pattern.compile(pattern);
            Matcher m = r.matcher(username);

            if (m.find()) {
                System.out.println("Valid");
            } else {
                System.out.println("Invalid");
            }
            testCases--;
        }
    }
}


When I input:

2
dfhidbuffon
dfdidbuffon


the program should print:

Valid
Valid


but instead, it prints:

Valid
Invalid


Why does the result depend on whether the 3rd letter of the username is "h" or "d"?

Edit: Added @Draco18s's and @ruakh's suggestions; however, I am still getting the same strange behavior.

Answer

[:alpha:] means "any of the characters :, a, h, l, p". So dfhidbuffon contains a match for your pattern (namely h plus idbuffon), whereas dfdidbuffon does not. (Note that matcher.find() looks for any match within the string; if you want to specifically match the entire string, you should use matcher.matches(), or you can modify your pattern to use anchors such as ^ and $.)
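To see this concretely, here is a minimal sketch that runs the question's pattern, unchanged, against both inputs. The class and method names (AlphaClassDemo, classMatches, findsAnywhere, matchesWhole) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

public class AlphaClassDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("([[:alpha:]])[a-zA-Z_]{7,29}");

    // True if the single character c is accepted by Java's reading of [[:alpha:]].
    static boolean classMatches(String c) {
        return Pattern.matches("[[:alpha:]]", c);
    }

    // find(): true if any substring of s matches the pattern.
    static boolean findsAnywhere(String s) {
        return ORIGINAL.matcher(s).find();
    }

    // matches(): true only if the whole of s matches the pattern.
    static boolean matchesWhole(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Only ':', 'a', 'h', 'l' and 'p' are in the class; 'd' is not.
        System.out.println("h in class: " + classMatches("h"));
        System.out.println("d in class: " + classMatches("d"));

        for (String s : new String[] {"dfhidbuffon", "dfdidbuffon"}) {
            System.out.println(s + " find=" + findsAnywhere(s)
                    + " matches=" + matchesWhole(s));
        }
    }
}
```

With find(), the first username passes only because it happens to contain an "h" followed by at least seven more letters. With matches(), both are rejected, since neither starts with one of the five characters in the class.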

You may be thinking of the notation found in many regex implementations whereby [:alpha:] means "any alphabetic character"; but firstly, Java's Pattern class doesn't support that notation (hat-tip to ajb for pointing this out), and secondly, those languages would require [:alpha:] to appear inside a character class, e.g. as [[:alpha:]]. The Java equivalent would be \p{Alpha} or [A-Za-z] if you only want to match ASCII letters, and \p{IsAlphabetic} if you want to match any Unicode letter.
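Putting the two points together, a corrected check might look like the sketch below. It assumes the intended rule is "one letter, then 7 to 29 letters or underscores", which is only inferred from the question's pattern; the class and method names (UsernameCheck, isValidAscii, isValidUnicodeFirst) are hypothetical.

```java
import java.util.regex.Pattern;

public class UsernameCheck {
    // Assumed rule: one ASCII letter, then 7 to 29 ASCII letters or underscores.
    static final Pattern ASCII_FIRST =
            Pattern.compile("\\p{Alpha}[a-zA-Z_]{7,29}");

    // Variant: the first character may be any Unicode alphabetic character.
    static final Pattern UNICODE_FIRST =
            Pattern.compile("\\p{IsAlphabetic}[a-zA-Z_]{7,29}");

    // matches() requires the whole string to fit the pattern, so no anchors are needed.
    static boolean isValidAscii(String username) {
        return ASCII_FIRST.matcher(username).matches();
    }

    static boolean isValidUnicodeFirst(String username) {
        return UNICODE_FIRST.matcher(username).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dfhidbuffon", "dfdidbuffon", "short", "1abcdefgh"}) {
            System.out.println(s + ": " + (isValidAscii(s) ? "Valid" : "Invalid"));
        }
    }
}
```

Both usernames from the question are now accepted, while "short" (too few characters) and "1abcdefgh" (leading digit) are rejected. By default \p{Alpha} covers only ASCII letters, so a name starting with "é" passes only the \p{IsAlphabetic} variant.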
