Dennis - 4 months ago
PHP Question

PHP - Searching a string for phone numbers and emails

I am trying to write a small script to find out whether a given string contains a phone number and/or an email address.

Here is what I have so far:

function findContactInfo($str) {
    // Find a possible email address (preg_match returns 1 if found, 0 if not)
    $pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.[a-z]{2,3}?/i';
    $emailresult = preg_match($pattern, $str);

    // Find possible phone numbers (all matches end up in $matches[0])
    preg_match_all('/[0-9]{3}[\-][0-9]{6}|[0-9]{3}[\s][0-9]{6}|[0-9]{3}[\s][0-9]{3}[\s][0-9]{4}|[0-9]{9}|[0-9]{3}[\-][0-9]{3}[\-][0-9]{4}/', $str, $matches);
    $matches = $matches[0];

    // Return both results so the caller can use them
    return array('email' => $emailresult, 'phones' => $matches);
}


The email part works fine, but I am open to improvements.
With the phone numbers I have some problems. The strings passed to the function will most likely contain German phone numbers, and those come in many different formats. It could be something like
030 - 1234567 or 030/1234567 or 02964-723689 or 01718290918
and so on. So it is almost impossible for me to know which format will be used. My idea was to simply look for a run of at least three consecutive digits. Example:

$stringOne = "My name is John and my phone number is 040-3627781";
// would be found

$stringTwo = "My name is Becky and my phone number is 0 4 0 3 2 0 5 4 3";
// would not be found


The problem is that I don't know how to find such combinations. Even after almost an hour of searching the web I can't find a solution.
Does anyone have a suggestion on how to approach this?
Thanks!

Jan
Answer Source

You could use

\b\d[- /\d]*\d\b

See a demo on regex101.com.


Long version:

\b\d      # a "word boundary" followed by a digit
[- /\d]*  # zero or more of: dash, space, slash or digit
\d\b      # a digit followed by another boundary


In PHP:

<?php
$regex = '~\b\d[- /\d]*\d\b~';

preg_match_all($regex, $your_string_here, $numbers);
print_r($numbers);
?>
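
For comparison, the idea from the question (a run of at least three consecutive digits) could be sketched as follows. The function name hasDigitRun and its $minRun parameter are made up for illustration; only preg_match() is a real PHP function here.

```php
<?php
// Hypothetical helper: true if $text contains at least $minRun digits in a row.
function hasDigitRun(string $text, int $minRun = 3): bool
{
    // With the default $minRun this builds the pattern ~\d{3,}~,
    // i.e. three or more consecutive digits.
    return preg_match('~\d{' . $minRun . ',}~', $text) === 1;
}

$stringOne = "My name is John and my phone number is 040-3627781";
$stringTwo = "My name is Becky and my phone number is 0 4 0 3 2 0 5 4 3";

var_dump(hasDigitRun($stringOne)); // bool(true)
var_dump(hasDigitRun($stringTwo)); // bool(false)
```

This only tells you that some digit run exists (a year or a house number would also trigger it), so it is best treated as a rough first check.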

The problem with this is that you may get a lot of false positives, so accuracy will certainly improve if the matches are cleaned, normalized and then tested against a database.
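
A minimal sketch of the cleaning and normalizing step, assuming that "normalized" means "digits only" and that a plausible number has between 7 and 15 digits. Both assumptions, the function name findPhoneCandidates and its parameters are made up for illustration, and the database lookup is left out.

```php
<?php
// Hypothetical helper: extracts candidates with the pattern from this answer
// and keeps only those with a plausible number of digits.
function findPhoneCandidates(string $text, int $minDigits = 7, int $maxDigits = 15): array
{
    // Digit, then any mix of digits/spaces/dashes/slashes, then a digit.
    preg_match_all('~\b\d[- /\d]*\d\b~', $text, $matches);

    $numbers = [];
    foreach ($matches[0] as $candidate) {
        // Normalize: drop every non-digit character.
        $digits = preg_replace('~\D~', '', $candidate);
        // The length bounds are an assumption, tune them for your data.
        $length = strlen($digits);
        if ($length >= $minDigits && $length <= $maxDigits) {
            $numbers[] = $digits;
        }
    }
    // Remove duplicates and reindex the array.
    return array_values(array_unique($numbers));
}

print_r(findPhoneCandidates("030 - 1234567 or 030/1234567 or 02964-723689 or 01718290918"));
```

With the formats from the question this should leave three entries, since 030 - 1234567 and 030/1234567 normalize to the same digits. Note that a spaced-out number such as 0 4 0 3 2 0 5 4 3 is matched by this pattern as well and normalizes to 040320543.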


As for your email question, I often use:

\S+@\S+
# not a whitespace, at least once
# @
# same as above

There are many different valid email formats, and the only way to check whether an actual human being is behind an address is to send an email with a link (even this could be automated).
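
If you want something in between, one option is to take the loose matches from \S+@\S+ and run each through PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL. This is only a sketch: the function name findEmailCandidates and the list of trimmed punctuation characters are assumptions, and passing the filter says nothing about whether the mailbox exists.

```php
<?php
// Hypothetical helper: loose regex first, then PHP's built-in syntax check.
function findEmailCandidates(string $text): array
{
    // Loose first pass: non-whitespace, "@", non-whitespace.
    preg_match_all('~\S+@\S+~', $text, $matches);

    $emails = [];
    foreach ($matches[0] as $candidate) {
        // Strip punctuation that often surrounds an address in running text.
        $candidate = trim($candidate, ".,;:!?()<>[]\"'");
        // filter_var() returns the address on success and false on failure.
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $emails[] = $candidate;
        }
    }
    return $emails;
}

print_r(findEmailCandidates("Write to john.doe@example.com, or ping me @ home"));
```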