Stefano Bragaglia - 5 months ago
Java Question

Matching a phone number at the end of a string with `regex`, and return both parts

I have a bunch of lines like the following:

Name1 Surname1 +44 (020) 1234 5678
Name2 Name2 Surname2 +39 (051) 12.34.56
Surname3, Name3 - (555) 123-456-789
Surname4, Name4 Name4 123 - 456.78.90


and I would like to identify and return the names and the numbers that they contain. For instance, I would like to return:


  1. Name1 Surname1 +44 (020) 1234 5678



    • name:
      Name1 Surname1

    • number:
      +44 (020) 1234 5678


  2. Name2 Name2 Surname2 +39 (051) 12.34.56



    • name:
      Name2 Name2 Surname2

    • number:
      +39 (051) 12.34.56


  3. Surname3, Name3 - (555) 123-456-789



    • name:
      Surname3, Name3 -

    • number:
      (555) 123-456-789


  4. Surname4, Name4 Name4 123 - 456.78.90



    • name:
      Surname4, Name4 Name4

    • number:
      123 - 456.78.90




I'm using Java regex and, so far, I have come up with the following pattern:

\A(.*)\s+(\+?\s*\d+([.-\s]*(\d+|\(\d+\)))+)\z


If line is any of the above lines, the code to match the pattern is:

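import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^(.*)\\s+(\\+?\\s*\\d+([.-\\s]*(\\d+|\\(\\d+\\)))+)$");
// A Pattern creates its Matcher through matcher(CharSequence)
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
    // Captured groups are read from the Matcher
    System.out.println("Name: " + matcher.group(1));
    System.out.println("Number: " + matcher.group(2));
}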


Unfortunately, on any line (Name1 Surname1 +44 (020) 1234 5678, for instance) it returns the following:

Name: Name1 Surname1 +44 (020) 1234
Number: 5678


I think that the reason for this result is the regex being too greedy, but I don't understand how to modify its behaviour.

Can anyone please correct the pattern and explain the solution to me in simple terms? I have read a few tutorials without understanding what to do. Thanks in advance!

Answer

The simplest pattern I can think of right now would be:

^(.*?)\s*((?:\+|\()[-\d(). ]*)

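The first group, (.*?), is lazy: it captures as little as possible, so it stops at the first point where the rest of the pattern can match, that is, at the spaces preceding a + or a (. The second group then captures that character and everything after it that is a digit, hyphen, parenthesis, dot or space. In your pattern, (.*) is greedy: it grabs as much of the line as it can and gives back only what the number part strictly needs, which is why only 5678 ends up in the second group.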
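As a rough illustration, the pattern could be used from Java along these lines. This is only a sketch: the class name PhoneSplitDemo and the helper split are made up for the example, and returning null for a non-matching line is an arbitrary choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneSplitDemo {
    // Pattern from the answer: a lazy name group, optional spaces,
    // then a number that must start with '+' or '('.
    private static final Pattern ANSWER_PATTERN =
            Pattern.compile("^(.*?)\\s*((?:\\+|\\()[-\\d(). ]*)");

    /** Hypothetical helper: returns {name, number}, or null if the line does not match. */
    static String[] split(String line) {
        Matcher matcher = ANSWER_PATTERN.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "Name1 Surname1 +44 (020) 1234 5678",
            "Name2 Name2 Surname2 +39 (051) 12.34.56",
            "Surname3, Name3 - (555) 123-456-789",
            "Surname4, Name4 Name4 123 - 456.78.90" // no '+' or '(', so no match
        };
        for (String line : lines) {
            String[] parts = split(line);
            if (parts == null) {
                System.out.println("No match: " + line);
            } else {
                System.out.println("Name: " + parts[0]);
                System.out.println("Number: " + parts[1]);
            }
        }
    }
}
```

For the first sample line this should give Name1 Surname1 as the name and +44 (020) 1234 5678 as the number.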

Check it out here at regex101.
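Note that the pattern above only recognizes numbers that begin with a + or a (, so a line such as Surname4, Name4 Name4 123 - 456.78.90 is not matched. One possible variant, which is not part of the answer above and is only a sketch, keeps the lazy name group but requires whitespace before the number, lets the number start with a digit (optionally preceded by + or (), and anchors it to the end of the line. The class name PhoneSplitVariant and the helper split are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneSplitVariant {
    // Assumed variant (not from the answer): lazy name, at least one
    // whitespace character, then a number that starts with an optional
    // '+' or '(' followed by a digit and runs to the end of the line.
    private static final Pattern VARIANT_PATTERN =
            Pattern.compile("^(.*?)\\s+([+(]?\\d[-\\d(). ]*)$");

    /** Hypothetical helper: returns {name, number}, or null if the line does not match. */
    static String[] split(String line) {
        Matcher matcher = VARIANT_PATTERN.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Surname4, Name4 Name4 123 - 456.78.90");
        if (parts != null) {
            System.out.println("Name: " + parts[0]);
            System.out.println("Number: " + parts[1]);
        }
    }
}
```

The variant assumes that the name and the number are always separated by whitespace. A name word that begins with a digit and is followed only by number characters would be absorbed into the number.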