chiborg chiborg - 6 months ago 23
PHP Question

Can parse_url ever detect a malformed URL when combined with a regular expression?

Consider the following code, which liberally tries to detect possible URLs (anything that looks vaguely like a domain name because it combines dots and characters) and then tries to parse the match:

    if ( preg_match( '/[a-z\.0-9]+\.[a-z]{2,6}/i', $text, $possibleUrl ) ) {
        // $possibleUrl[0] holds the full text matched by the pattern
        $urlResult = parse_url( 'http://' . $possibleUrl[0] );
        echo $urlResult === false ? 'malformed URL' : 'parseable URL';
    }

Is it possible to give that code an input value for $text that will produce the output "malformed URL"?


TL;DR: No.

Long answer: parse_url (see the php_url_parse_ex() function in ext/standard/url.c in the PHP C source) does not validate anything between the scheme (i.e. http:// here) and a subsequent @, : or /; it simply assumes that this is the host part. [Note: when there is an @, it treats the part after it as the host.]
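
A minimal sketch of that behaviour, assuming a reasonably recent PHP version. The sample URLs are made up for illustration; the point is only that the text after the scheme is taken as the host without any validity check, and that with an @ present the host is whatever follows it:

```php
<?php
// With an "@", the part before it is the user info and the part after it is the host.
$withUser = parse_url('http://user@example.com:8080/path');
var_dump($withUser['user'], $withUser['host'], $withUser['port'], $withUser['path']);

// A "host" that is clearly not a valid domain name is still accepted,
// because nothing between "http://" and the next "/" is validated.
$oddHost = parse_url('http://not_a~valid!host/');
var_dump($oddHost !== false, $oddHost['host']);
```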

Your regex only allows the characters [a-zA-Z0-9.], so none of @, : or / can ever appear in the match, and the whole match will be recognized as the host part in every case.
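
To try this out, here is a rough sketch that wraps the snippet from the question in a function and runs it against a few inputs. The function name classifyText and the sample strings are hypothetical, chosen only for this illustration. The two calls at the end are for contrast: they need characters (: and /) that the regex can never match.

```php
<?php
// Hypothetical helper: the question's snippet, wrapped so it can be called repeatedly.
// Returns null when the regex finds no candidate at all.
function classifyText(string $text): ?string
{
    if (!preg_match('/[a-z\.0-9]+\.[a-z]{2,6}/i', $text, $possibleUrl)) {
        return null;
    }
    $urlResult = parse_url('http://' . $possibleUrl[0]);
    return $urlResult === false ? 'malformed URL' : 'parseable URL';
}

// Sample inputs, including some that are obviously not real domain names.
$samples = ['example.com', '...com', '1.2.3.ab', 'see www.example.org/path?x=1'];
foreach ($samples as $text) {
    echo $text, ' => ', classifyText($text), PHP_EOL;
}

// Contrast: URLs that parse_url does reject. Both rely on ":" or "/" right
// after the scheme, which the character class [a-z\.0-9] never produces.
var_dump(parse_url('http://:80'));          // no host before the port
var_dump(parse_url('http:///example.com')); // empty host
```

Every sample input should be reported as "parseable URL", while the two contrast calls should return false.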