jczch - 6 months ago
Java Question

Why does the pattern ignore the space inside a character class?

I am trying to match some codes that are short strings with simple structure:

  • 5 digits

  • Colon

  • Some letters

  • Space or underscore

  • Some digits.

I want to use the free-spacing (?x) option to format my pattern:

String pat = "(?x) ([0-9]{5}) : ([a-zA-Z]+ [_ ] [0-9]+) ";

This pattern works fine at https://regex101.com/r/oW8vQ4/1.

However, in Java, this line:

"31500:STR 200".matches(pat)

yields false.
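A minimal reproduction sketch, assuming a standard JDK (the class and method names are illustrative, not from the original post). It runs the unchanged pattern against both the space and the underscore variant of the code, which narrows the failure down to the [_ ] class:

```java
class FreeSpacingRepro {
    // Pattern copied from the question, unchanged.
    static final String PAT = "(?x) ([0-9]{5}) : ([a-zA-Z]+ [_ ] [0-9]+) ";

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean matchesCode(String code) {
        return code.matches(PAT);
    }

    public static void main(String[] args) {
        System.out.println(matchesCode("31500:STR 200")); // false
        System.out.println(matchesCode("31500:STR_200")); // true
    }
}
```

Since only the separator differs between the two inputs, the space inside the brackets is the part that is not being honored.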

Why does it return false here? Shouldn't the [_ ] match the space even when the (?x) option is enabled, since the space is inside a character class?


I think the problem is that you need to escape the space inside the character class. From http://www.regular-expressions.info/freespacing.html:

Java, however, does not treat a character class as a single token in free-spacing mode. Java does ignore whitespace and comments inside character classes. So in Java's free-spacing mode, [abc] is identical to [ a b c ]. To add a space to a character class, you'll have to escape it with a backslash. But even in free-spacing mode, the negating caret must appear immediately after the opening bracket. [ ^ a b c ] matches any of the four characters ^, a, b or c just like [abc^] would. With the negating caret in the proper place, [^ a b c ] matches any character that is not a, b or c.
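The behaviour described in the quote can be checked with a short sketch like the one below. The class and the helper name full are made up for illustration; only java.util.regex.Pattern from the JDK is used.

```java
import java.util.regex.Pattern;

class FreeSpacingCharClass {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Whitespace inside the class is ignored, so this behaves like [abc].
        System.out.println(full("(?x)[ a b c ]", "b"));     // true
        System.out.println(full("(?x)[ a b c ]", " "));     // false
        // A caret that does not directly follow '[' is a literal caret.
        System.out.println(full("(?x)[ ^ a b c ]", "^"));   // true
        // A caret directly after '[' negates the class.
        System.out.println(full("(?x)[^ a b c ]", "a"));    // false
        // An escaped space is kept as a literal space.
        System.out.println(full("(?x)[ a b c \\ ]", " "));  // true
    }
}
```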

Try the pattern below. I only added \\ before the space inside the character class, but I haven't tested it myself.

String pat = "(?x) ([0-9]{5}) : ([a-zA-Z]+ [_\\ ] [0-9]+) ";
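A quick way to check the suggested pattern is a sketch along these lines. The class name, the parse helper, and the extra test inputs are assumptions added for illustration; the pattern string is the one given above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FreeSpacingFix {
    // Pattern from the answer: the space in the class is escaped as "\\ ".
    static final Pattern CODE =
        Pattern.compile("(?x) ([0-9]{5}) : ([a-zA-Z]+ [_\\ ] [0-9]+) ");

    // Hypothetical helper: returns {group 1, group 2}, or null if there is no match.
    static String[] parse(String code) {
        Matcher m = CODE.matcher(code);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("31500:STR 200");
        System.out.println(parts[0]);                        // 31500
        System.out.println(parts[1]);                        // STR 200
        System.out.println(parse("31500:STR_200") != null);  // true
        System.out.println(parse("31500:STR-200") != null);  // false
    }
}
```

With the escaped space, both the space and the underscore separators are accepted, while any other separator is still rejected.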