Ghanshyam K Dobariya - 19 days ago
PHP Question

PHP: Japanese character validation: why do Hiragana characters pass validation against a regular expression of Katakana characters?

I want to validate whether a user's input consists only of full-width Katakana characters.

Here is a set of Japanese characters, grouped by category:

http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml

Now look at the code below, where I try to validate different inputs.

$pattern contains all full-width Katakana characters.

header('Content-Type: text/html; charset=utf-8');
$pattern = "/^([゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ]+)$/";
$values = array(
    "ナカ",
    "ね",
    "PHP",
    "ナカPHP",
);

foreach ($values as $value) {
    echo $value . " => ";
    if (preg_match($pattern, $value)) {
        echo "valid";
    } else {
        echo "invalid";
    }
    echo "<br>";
}


The 1st value in the $values array is valid full-width Katakana, the 2nd is Hiragana, and the 3rd and 4th are invalid entries.

I am getting the following output:

ナカ => valid
ね => valid
PHP => invalid
ナカPHP => invalid


My concern is why the Hiragana character passes validation. The same thing happens with many Hiragana characters, while I need to accept only full-width Katakana.

Thanks in advance.

Answer

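As I said in my comment, you need to turn on the Unicode modifier u; it is required whenever a pattern contains multibyte Unicode characters. Without it, PCRE treats the pattern and the input as sequences of single bytes, so the character class matches any individual UTF-8 byte of the listed Katakana, and the bytes of ね (E3 81 AD) all occur among them.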

$pattern = "/^([゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ]+)$/u";
                                                                                                                                                           ^
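To make the effect of the u modifier concrete, here is a minimal sketch, assuming the source file is saved as UTF-8. The helper name isFullWidthKatakana and the use of the range \x{30A0}-\x{30FF} in place of the spelled-out character list are my own choices for illustration, not part of the original code; the range covers the same Katakana block that the list in $pattern enumerates.

```php
<?php
// Hypothetical helper (not from the original post): returns true when
// $value consists only of code points in the Katakana block U+30A0..U+30FF.
function isFullWidthKatakana(string $value): bool
{
    // With the u modifier, pattern and subject are treated as UTF-8,
    // so the class matches whole characters rather than single bytes.
    return preg_match('/^[\x{30A0}-\x{30FF}]+$/u', $value) === 1;
}

foreach (["ナカ", "ね", "PHP", "ナカPHP"] as $value) {
    echo $value, " => ", isFullWidthKatakana($value) ? "valid" : "invalid", PHP_EOL;
}

// Why the check passed without u: the three bytes of ね also occur
// in the UTF-8 encodings of the Katakana キ and チ.
echo bin2hex("ね"), PHP_EOL; // e381ad
echo bin2hex("キ"), PHP_EOL; // e382ad
echo bin2hex("チ"), PHP_EOL; // e38381

// Byte mode: the class is the byte set {E3, 82, AD, 83, 81}, so ね matches.
var_dump(preg_match('/^[キチ]+$/', 'ね'));  // int(1)
// UTF-8 mode: the class is the two characters キ and チ, so ね does not match.
var_dump(preg_match('/^[キチ]+$/u', 'ね')); // int(0)
```

With the u modifier in place, the loop should report only ナカ as valid. Note that $ also matches just before a trailing newline; if that matters for your input, adding the D modifier (/uD) makes $ match only at the very end of the string.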