PHPglue - 8 months ago
PHP Question

PHP preg_match() PCRE logic issue?

Consider the following:

$lat = '89° 5'; // works
$test = 'invalid $lat format';
if(preg_match('/^(([0-8]\d|\d)°?(\s?([0-5]\d|\d))?)(N|S)?$/', $lat, $la)){
  $ck = 'DD° MM format --> ';
  $test = $ck.$la[0];
}
echo $test;

With
$lat = '89°5';
everything works fine too. What I'm trying to understand is why
$lat = '89 5';
fails. Maybe my brain isn't working, but it seems that the last one should not be an invalid format, because of the °? in the pattern. Thanks for helping me understand.


Use /(*UTF8)^(([0-8]\d|\d)°?(\s?([0-5]\d|\d))?)(N|S)?$/


In order to process UTF-8 strings, you must build PCRE's 8-bit library with UTF support, and, in addition, you must call pcre_compile() with the PCRE_UTF8 option flag, or the pattern must start with the sequence (*UTF8) or (*UTF). When either of these is the case, both the pattern and any subject strings that are matched against it are treated as UTF-8 strings instead of strings of individual 1-byte characters.

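So without UTF-8 mode, the PCRE engine was still seeing ° as two separate bytes (0xC2 0xB0 in UTF-8) rather than one character, and the ? was only making the second byte optional. That is why '89 5' fails: the first byte is still required.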
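To see the byte-level behaviour for yourself, here is a minimal sketch. It assumes the input is UTF-8, so the degree sign is written out as its two bytes explicitly; the variable names are only for illustration.

```php
<?php
// The degree sign U+00B0 encoded as UTF-8 (assumption: UTF-8 input).
$deg = "\xC2\xB0";

// Same pattern as in the question, with no UTF-8 mode enabled.
$pattern = '/^(([0-8]\d|\d)' . $deg . '?(\s?([0-5]\d|\d))?)(N|S)?$/';

$byteLength = strlen($deg);  // 2: two bytes, not one character
$hex = bin2hex($deg);        // "c2b0"

// In byte mode the ? applies only to \xB0, so \xC2 is still mandatory.
$results = [];
foreach (["89{$deg} 5", "89{$deg}5", "89 5"] as $lat) {
    $results[$lat] = preg_match($pattern, $lat);
}

var_dump($byteLength, $hex, $results);
```

The first two inputs match because both bytes are present; '89 5' does not, because the required \xC2 byte is missing.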

Note: Interestingly, I was able to get the expected results only using the (lowercase) u modifier on my install.
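For reference, a sketch of the u modifier variant, again assuming UTF-8 input. The helper name describeLat is hypothetical and only wraps the check from the question.

```php
<?php
// The degree sign U+00B0 encoded as UTF-8 (assumption: UTF-8 input).
$deg = "\xC2\xB0";

// Trailing "u" makes PCRE treat pattern and subject as UTF-8,
// so the ? now applies to the whole degree sign.
$pattern = '/^(([0-8]\d|\d)' . $deg . '?(\s?([0-5]\d|\d))?)(N|S)?$/u';

// Hypothetical helper: returns the message the question's snippet would echo.
function describeLat(string $lat, string $pattern): string
{
    if (preg_match($pattern, $lat, $la)) {
        return "DD\xC2\xB0 MM format --> " . $la[0];
    }
    return 'invalid $lat format';
}

foreach (["89{$deg} 5", "89{$deg}5", "89 5"] as $lat) {
    echo describeLat($lat, $pattern), "\n";
}
```

With the u modifier all three inputs, including '89 5', are reported as valid.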

Note 2: My original comment had two options; don't use the other one, as it breaks the test that currently works for you.