aProgger aProgger - 1 month ago 4
Java Question

Why does my regex not work in Java

I have to match custom (German) address strings to get the street, house number, zip code and city. I have a regex for this that works in RegExr and in Java Visual Regex Tester.

This is the regex (delivered but editable):


This is the string:

NEUE BÜHNE Senftenberg, Theaterpassage 1, 01968 Senftenberg

This is my code:

String regex = "^([^0-9]+)([0-9]+\\.*?)?(?:\\w)?([0-9]{5})(?:\\w)?(\\.*)$";
String address = "NEUE BÜHNE Senftenberg, Theaterpassage 1, 01968 Senftenberg";
Pattern pattern = Pattern.compile(regex);
String[] addrFromRegex;

// gives an array (length 1) with [0] == address
addrFromRegex = address.split(regex);

// gives an array (length 1) with [0] == address
addrFromRegex = pattern.split(address);

As for String.split(), the problem may be faulty escaping. But with a compiled Pattern I thought I did not have to care about that. What am I doing wrong?


The , in the string is not always given. Other possible address strings are:

NEUE BÜHNE Senftenberg; Theaterpassage 1; 01968 Senftenberg
NEUE BÜHNE Senftenberg Theaterpassage 1 01968 Senftenberg
NEUE BÜHNE Senftenberg|Theaterpassage|1|01968|Senftenberg
NEUE BÜHNE Senftenberg|Theaterpassage_1_01968_Senftenberg

I get the addresses via XML and have no influence on the data provided. By the way, the address shown here is an example of a faulty one. I have to deal with those too.


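The main point is that your pattern is written to match the whole string, while split() treats the pattern as a delimiter and returns the text around the matches. So, instead of split(), you need to create a Matcher, call Matcher#matches() and collect the captured group values into a list/array/etc.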
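To make the difference concrete, here is a minimal sketch contrasting the two calls. The class name SplitVsMatches, its helper methods and the shortened pattern and input are illustrative assumptions, not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitVsMatches {
    // Hypothetical, shortened pattern: non-digits, optional whitespace, digits.
    static final Pattern P = Pattern.compile("(\\D+?)\\s*(\\d+)");

    // split() treats the pattern as a delimiter and returns what lies between matches.
    static String[] splitWith(Pattern p, String input) {
        return p.split(input);
    }

    // matches() requires the whole input to match and exposes the captured groups.
    static String[] groupsOf(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i); // group 0 is the whole match, so start at 1
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "Theaterpassage 1";
        // The pattern consumes the entire input, so nothing is left between delimiters.
        System.out.println(splitWith(P, input).length); // prints 0
        // A pattern that never matches leaves the input as the single element,
        // which is the behaviour described in the question.
        System.out.println(splitWith(Pattern.compile("\\d{5}"), input)[0]);
        // The captured groups are what the question is actually after.
        System.out.println(String.join(" | ", groupsOf(P, input)));
    }
}
```

In the question, the pattern never matched the address at all, which is why both split() calls returned a one-element array containing the original string.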

The fixed regex (written as a Java string literal) is

^([^0-9]+?)\\s*([0-9]+)[\\W_]+([0-9]{5})\\s*(.*)$


  • ^ - start of string (not necessary in matches())
  • ([^0-9]+?) - Group 1: one or more chars other than digits but as few as possible
  • \\s* - 0+ whitespaces
  • ([0-9]+) - Group 2 capturing 1+ digits
  • [\\W_]+ - 1 or more chars that are either non-word or _
  • ([0-9]{5}) - Group 3 capturing 5 digits
  • \\s* - zero or more whitespaces
  • (.*) - Group 4 capturing the rest of the line
  • $ - end of string (not necessary in matches()).
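The breakdown above can also be written with named groups (available since Java 7), which keeps the pattern and its explanation in one place. This is only a sketch: the class name and the group names nameAndStreet, houseNumber, zip and city are my own labels, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsSketch {
    // Same structure as the pattern explained above, with hypothetical group names.
    static final Pattern ADDRESS = Pattern.compile(
        "(?<nameAndStreet>[^0-9]+?)\\s*"  // Group 1: lazy run of non-digits, then 0+ whitespace
        + "(?<houseNumber>[0-9]+)"        // Group 2: 1+ digits
        + "[\\W_]+"                       // separator: 1+ non-word chars or underscores
        + "(?<zip>[0-9]{5})\\s*"          // Group 3: exactly 5 digits, then 0+ whitespace
        + "(?<city>.*)");                 // Group 4: rest of the line

    public static void main(String[] args) {
        String s = "NEUE BÜHNE Senftenberg, Theaterpassage 1, 01968 Senftenberg";
        Matcher m = ADDRESS.matcher(s);
        // matches() anchors at both ends, so ^ and $ are omitted here.
        if (m.matches()) {
            System.out.println("name/street: " + m.group("nameAndStreet"));
            System.out.println("house no.:   " + m.group("houseNumber"));
            System.out.println("zip:         " + m.group("zip"));
            System.out.println("city:        " + m.group("city"));
        }
    }
}
```

Note that the first group still contains both the venue name and the street, because the pattern only relies on digits to find the boundaries.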

Java demo:

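// Needs imports from java.util (List, ArrayList) and java.util.regex (Pattern, Matcher)
List<String> lst = new ArrayList<>();
String s = "NEUE BÜHNE Senftenberg, Theaterpassage 1, 01968 Senftenberg";
Pattern pattern = Pattern.compile("([^0-9]+?)\\s*([0-9]+)[\\W_]+([0-9]{5})\\s*(.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
    // Capturing groups are 1-based; group 0 is the whole match
    for (int i = 1; i <= matcher.groupCount(); i++) {
        lst.add(matcher.group(i));
    }
    System.out.println(lst); // => [NEUE BÜHNE Senftenberg, Theaterpassage, 1, 01968, Senftenberg]
}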
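The same pattern also matches the other delimiter variants listed in the question. With | or _ as separators, though, a delimiter can end up at the edge of group 1 or group 4. The sketch below strips such leftovers from each captured part. The class and method names (AddressParser, parse) and the cleanup step are my additions under that assumption, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressParser {
    private static final Pattern ADDRESS =
        Pattern.compile("([^0-9]+?)\\s*([0-9]+)[\\W_]+([0-9]{5})\\s*(.*)");

    // Leading or trailing whitespace and ASCII punctuation left over from delimiters.
    // \p{Punct} is used instead of \W so that letters such as Ü are not stripped.
    private static final Pattern EDGE_SEPARATORS =
        Pattern.compile("^[\\s\\p{Punct}]+|[\\s\\p{Punct}]+$");

    // Returns the four captured parts, or an empty list if the input does not match.
    static List<String> parse(String address) {
        List<String> parts = new ArrayList<>();
        Matcher m = ADDRESS.matcher(address);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(EDGE_SEPARATORS.matcher(m.group(i)).replaceAll(""));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] samples = {
            "NEUE BÜHNE Senftenberg, Theaterpassage 1, 01968 Senftenberg",
            "NEUE BÜHNE Senftenberg; Theaterpassage 1; 01968 Senftenberg",
            "NEUE BÜHNE Senftenberg Theaterpassage 1 01968 Senftenberg",
            "NEUE BÜHNE Senftenberg|Theaterpassage|1|01968|Senftenberg",
            "NEUE BÜHNE Senftenberg|Theaterpassage_1_01968_Senftenberg"
        };
        for (String s : samples) {
            System.out.println(parse(s));
        }
    }
}
```

Returning an empty list for non-matching input is one possible design choice; since the feed contains faulty addresses, the caller can use it to log or skip entries that cannot be parsed.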