bonaca bonaca - 3 years ago 142
PHP Question

preg_match with excluding comma and point

if (!preg_match('/[^a-z ćčžđš\0-9]/i', $_POST['a'])) {
    echo 'error';
}

I use this to allow only alphanumeric characters plus the specific local characters of the Croatian language.

It works, but it also allows the comma, the period, and maybe other characters.
How can I exclude every character except a-z, my local characters, spaces, and digits?

For example, these should not be allowed:

  • abc,

  • abc.

  • abc+

and this should be allowed:

  • abc

Answer Source

There are 3 problems in your code:

1) To check if your string contains forbidden characters and display the error message, you use a double negation:

if (!preg_match('/[^allowed characters]/i', ...
#   ^--------------^

This means "if the string doesn't contain a forbidden character, then display 'error'".

The logic you want is the opposite, without the negation operator:

if ( preg_match('/[^allowed characters]/i', ...

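2) If you escape the 0 in 0-9 inside a character class, \0 is read as the null character, so you define a range from the null character (0x00) to the character 9 (0x39); see the ASCII table. That range contains the comma (0x2C), the period (0x2E), and the plus sign (0x2B), which is why they are accepted.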
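To see the effect of the escaped 0, the two ranges can be compared character by character. This is only a small sketch; the helper name in_class is made up for the illustration and is not part of the original code.

```php
<?php
// Hypothetical helper for illustration: does the single character $char
// match a character class whose body is $classBody?
function in_class(string $classBody, string $char): bool
{
    return preg_match('/^[' . $classBody . ']$/', $char) === 1;
}

// Characters from the question, plus a digit for comparison.
foreach ([',', '.', '+', '5'] as $char) {
    printf(
        "%s (0x%02X)  [\\0-9]: %s  [0-9]: %s\n",
        $char,
        ord($char),
        // '\0-9' is the range 0x00..0x39, '0-9' is the range 0x30..0x39.
        in_class('\0-9', $char) ? 'match' : 'no match',
        in_class('0-9', $char) ? 'match' : 'no match'
    );
}
```

The comma, period, and plus sign match only the escaped range, while the digit matches both.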

3) You are dealing with Unicode characters, so you have to use the u modifier; otherwise the regex engine reads your string byte by byte and returns false positives:

if ( preg_match('/[^a-z ćčžđš0-9]/iu', $_POST['a']) )
    echo 'error!';
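
A quick way to observe the byte-by-byte problem is to run the same class with and without the u modifier. The sketch below assumes the source file is saved as UTF-8; the function name has_forbidden_char and the sample characters are chosen for the illustration only.

```php
<?php
// Hypothetical helper: returns true when $input contains a character
// outside the allowed list. $unicode toggles the u modifier.
// Assumes this file is saved as UTF-8.
function has_forbidden_char(string $input, bool $unicode): bool
{
    $pattern = '/[^a-z ćčžđš0-9]/i' . ($unicode ? 'u' : '');
    return preg_match($pattern, $input) === 1;
}

// ő (U+0151) is encoded as bytes C5 91. Neither byte is forbidden in
// byte mode, because C5 occurs in š and 91 occurs in đ.
$notAllowed = "\u{0151}";
var_dump(has_forbidden_char($notAllowed, false)); // bool(false): wrongly accepted
var_dump(has_forbidden_char($notAllowed, true));  // bool(true): correctly rejected

// Č (U+010C) is the uppercase of č. The i modifier only folds the case
// of non-ASCII letters when the u modifier is present.
$upperCase = "\u{010C}";
var_dump(has_forbidden_char($upperCase, false)); // bool(true): wrongly rejected
var_dump(has_forbidden_char($upperCase, true));  // bool(false): correctly accepted
```

Note that with the u modifier preg_match returns false (not 0) when the subject is not valid UTF-8, so real code may want to treat that case as an error as well.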

Finally, accented characters can be written in two ways in Unicode. For example, č can be:

  • the single codepoint U+010D (LATIN SMALL LETTER C WITH CARON)
  • the combination of the codepoint U+0063 (LATIN SMALL LETTER C) and the codepoint U+030C (COMBINING CARON)

Your pattern will not handle the second case. To avoid the problem, normalize your string to the composed form (NFC) first with the Normalizer class from the intl extension.
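
Putting the pieces together, a check with normalization might look like the sketch below. It assumes the intl extension is loaded and the file is saved as UTF-8; the function name is_allowed_name is invented for the example, and the class_exists fallback is only there so the sketch still runs without intl (in that case decomposed input is rejected).

```php
<?php
// Hypothetical validation helper, not the original poster's code.
// Assumes UTF-8 source and, ideally, the intl extension.
function is_allowed_name(string $input): bool
{
    if (class_exists('Normalizer')) {
        // Compose "c" + U+030C into the single code point U+010D (NFC).
        $normalized = Normalizer::normalize($input, Normalizer::FORM_C);
        if ($normalized === false) {
            return false; // normalization failed, e.g. invalid UTF-8
        }
        $input = $normalized;
    }

    // 0 means no forbidden character was found. preg_match returns
    // false on invalid UTF-8, which is also treated as "not allowed".
    return preg_match('/[^a-z ćčžđš0-9]/iu', $input) === 0;
}

foreach (['abc', 'abc,', 'abc.', 'abc+', "c\u{030C}"] as $value) {
    echo $value, ' => ', is_allowed_name($value) ? 'ok' : 'error', "\n";
}
```

With intl available, the decomposed form of č in the last sample is composed before the check and is accepted like the precomposed one.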
