Maximilian Schirm
Java Question

Issues with the Java 8 linebreak RegEx escape character in Eclipse

I want to parse a String formatted as shown below with a regular expression in my method. However, even though online regex tools like RegExr show that my expression should match, it doesn't.

The expression I'm using is

(@(\\d+))[(\r\n)\n](((0|1){"+width+"}[(\r\n)\n]){"+height+"})
where width and height are integer values for the required width and height of the text blocks.

The text blocks I want to retrieve from my file are formatted as follows:

@200
0000000000
0000011001
1100100000
0101001101
1110001110

@500
0000000000
0000011001
1100100000
0101001101
1110001110

etc.


(Here, width would be 10 and height 5.)

I wanted to use the Matcher.find() method to retrieve each of those blocks, but the expression doesn't find anything at all.

I suspect the problem is the way I'm handling line breaks, but when I try to use the new Java 8 \R linebreak matcher, Eclipse shows the error "Invalid escape sequence".
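A minimal reproduction, assuming (the question does not say so) that the file uses Windows \r\n line endings; the class `OriginalRegexRepro` and method `countMatches` are hypothetical names. It runs the original expression, unchanged, against one sample block written once with \n and once with \r\n line endings. With \n endings it finds the block; with \r\n endings it finds nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexRepro {
    // Counts the blocks found by the question's original expression, unchanged,
    // including the literal CR/LF characters inside its character classes.
    static int countMatches(String input, int width, int height) {
        String regex = "(@(\\d+))[(\r\n)\n](((0|1){" + width + "}[(\r\n)\n]){" + height + "})";
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // One sample block; '|' marks where each line ends.
        String rows = "0000000000|0000011001|1100100000|0101001101|1110001110|";
        String unix = "@200\n" + rows.replace("|", "\n");        // \n line endings
        String windows = "@200\r\n" + rows.replace("|", "\r\n"); // assumed \r\n line endings
        System.out.println(countMatches(unix, 10, 5));    // 1
        System.out.println(countMatches(windows, 10, 5)); // 0
    }
}
```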

Answer

[..] is a character class, so it matches exactly one character from the set it describes. This means that

  • [(\r\n)\n] matches a single (, \r, \n or ) character (the second \n is redundant), so it can never consume a Windows \r\n pair;

  • \R matches not only a single \r or \n (and a few other line separators) but also the two-character sequence \r\n, so you can't use it inside [..], which can match only a single character.
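A small sketch of both points, using `Pattern.matches` (which must match the whole input); `CharClassDemo`, `matchesWhole` and `compiles` are hypothetical names. It checks which inputs the question's class accepts, and that `Pattern.compile` rejects \R inside brackets with a `PatternSyntaxException`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassDemo {
    // The question's class; the Java literal puts real CR and LF characters in it.
    static final String QUESTION_CLASS = "[(\r\n)\n]";

    // True if the whole input matches the regex (Pattern.matches anchors both ends).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if Pattern accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(QUESTION_CLASS, "("));    // true: '(' is just a class member
        System.out.println(matchesWhole(QUESTION_CLASS, "\r"));   // true: one character
        System.out.println(matchesWhole(QUESTION_CLASS, "\r\n")); // false: two characters
        System.out.println(matchesWhole("\\R", "\r\n"));          // true: \R covers the pair
        System.out.println(compiles("[\\R]"));                    // false: \R not allowed in [..]
    }
}
```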

What you can do is use \R instead of [(\r\n)\n], but don't forget to escape its \ in the String literal, writing "\\R" just as you wrote "\\d". An unescaped "\R" is not a valid Java string escape, which is why Eclipse reports "Invalid escape sequence". You can also remove the outermost (...) pairs, since the entire match is already stored in group 0, so you don't need an extra group for that purpose.
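To illustrate both remarks: in Java source the literal "\\R" is the two characters \ and R, which is what Pattern needs to see, and group 0 always holds the whole match, so wrapping everything in another group adds nothing. `GroupZeroDemo` and `firstHeader` are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZeroDemo {
    // Returns {group(0), group(1)} for the first "@number" header, or null if none is found.
    static String[] firstHeader(String input) {
        Matcher m = Pattern.compile("@(\\d+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        // The source literal "\\R" is the two characters '\' and 'R' that Pattern needs.
        System.out.println("\\R".length()); // 2
        String[] g = firstHeader("@200\n0000000000");
        System.out.println(g[0]); // @200 (whole match, no extra group needed)
        System.out.println(g[1]); // 200
    }
}
```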

So one of the simplest ways to rewrite your regex would be:

String regex = "@(\\d+)\\R([01]{"+width+"}\\R){"+height+"}";
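A sketch of the Matcher.find() loop the question asks for, using the regex above with width 10 and height 5. The class `BlockFinder` and method `findBlocks` are hypothetical; the sample input deliberately mixes \r\n and \n endings to show that \R accepts both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockFinder {
    // Collects every block matched by the answer's regex, one MatchResult per find().
    static List<MatchResult> findBlocks(String input, int width, int height) {
        String regex = "@(\\d+)\\R([01]{" + width + "}\\R){" + height + "}";
        Matcher m = Pattern.compile(regex).matcher(input);
        List<MatchResult> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.toMatchResult()); // snapshot, since the Matcher is reused
        }
        return results;
    }

    public static void main(String[] args) {
        // First block uses \r\n line endings, second uses \n; \R accepts both.
        String input = "@200\r\n0000000000\r\n0000011001\r\n1100100000\r\n0101001101\r\n1110001110\r\n\r\n"
                + "@500\n0000000000\n0000011001\n1100100000\n0101001101\n1110001110\n";
        for (MatchResult r : findBlocks(input, 10, 5)) {
            System.out.println("id=" + r.group(1));
        }
    }
}
```

Use group(0) for the full block text: group 2 is a repeated group, so after a match it only holds the last row.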

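But since you may not want to include the last line separator in the match, feel free to make the \R inside the group optional with the ? quantifier, and reluctant by adding another ? after it. The engine then tries skipping each separator first and consumes it only when the next row needs it, so in practice only the last one is left out: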

String regex = "@(\\d+)\\R([01]{"+width+"}\\R??){"+height+"}";
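A sketch comparing both variants on input whose last block has no trailing line break (an assumption about the file layout); `TrailingSeparatorDemo` and `blocks` are hypothetical names. The first regex misses that final block, while the \R?? version finds it and leaves the trailing separator out of every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingSeparatorDemo {
    // reluctant == false: the answer's first regex; true: the \R?? variant.
    static List<String> blocks(String input, int width, int height, boolean reluctant) {
        String sep = reluctant ? "\\R??" : "\\R";
        String regex = "@(\\d+)\\R([01]{" + width + "}" + sep + "){" + height + "}";
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed file layout: the last block is not followed by a line break.
        String input = "@200\n0000000000\n0000011001\n1100100000\n0101001101\n1110001110\n\n"
                + "@500\n0000000000\n0000011001\n1100100000\n0101001101\n1110001110";
        System.out.println(blocks(input, 10, 5, false).size()); // 1: @500 lacks a final \R
        System.out.println(blocks(input, 10, 5, true).size());  // 2
    }
}
```

One side effect worth knowing: because every \R in the group may be skipped, this variant would also accept rows that are not separated at all, such as "@200", a line break, and then 50 digits on a single line for width 10 and height 5.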
