mallix - 1 year ago
PHP Question

Regular expression - preg_match Latin and Greek characters

I am trying to create a regular expression for any given string.

Goal: remove ALL characters that are not Latin letters, lowercase Greek letters, or digits.

What I have done so far:


This works perfectly for Latin characters.

When I try this:
no luck. Works BUT leaves out any other symbol like !!#$%@%#$@,`

My knowledge is limited when it comes to regexp. Any help would be much appreciated!


Below is the function that keeps only the specified characters and creates a slug from the string, with a dash as the separator:

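// The function name and signature are assumed; the post showed only the body.
function make_slug($str, $separator = '-', $lowercase = TRUE) {
    // Escape the separator so it is safe inside a pattern delimited by "#".
    $q_separator = preg_quote($separator, '#');
    $trans = array(
        '&.+?;' => '',                              // drop HTML entities
        '[^a-z0-9 -]' => '',                        // drop everything but Latin letters, digits, space, dash
        '\s+' => $separator,                        // whitespace runs become one separator
        '('.$q_separator.')+' => $separator         // collapse repeated separators
    );

    $str = strip_tags($str);

    foreach ($trans as $key => $val){
        $str = preg_replace("#".$key."#i", $val, $str);
    }

    if ($lowercase === TRUE){
        $str = strtolower($str);
    }

    return trim($str, $separator);
}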

So if the string is: OnCE upon a tIME !#% @$$ in MEXIco

Using the function the output will be: once-upon-a-time-in-mexico

This works fine, but I also want the pattern to exclude Greek characters.

Answer Source

Ok, can this replace your function?

$subject = 'OnCEΨΩ é-+@àupon</span> aαθ tIME !#%@$ in MEXIco in the year 1874 <or 1875';

function format($str, $excludeRE = '/[^a-z0-9]+/u', $separator = '-') {
    $str = strip_tags($str);
    $str = strtolower($str);
    $str = preg_replace($excludeRE, $separator, $str);
    $str = trim($str, $separator);
    return $str;
}

echo format($subject);

Note that you will lose all characters after a < (because of strip_tags) until the next >.
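To illustrate the strip_tags caveat, here is a small sketch. The sample strings are made up for the demonstration; only strip_tags itself comes from the answer.

```php
<?php
// An unclosed "<" followed by a letter is treated as the start of a tag,
// so the text after it is discarded along with it.
$unclosed = strip_tags('in the year 1874 <or 1875');

// A properly closed tag is removed and the surrounding text is kept.
$closed = strip_tags('a <b>bold</b> word');

var_dump($unclosed);
var_dump($closed);
```

If the input may contain a literal < that is not markup, it is probably safer to encode or replace it before calling strip_tags.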

// Old answer, written when I believed you wanted to preserve Greek characters

You can specify a range such as α-ω, or any other unusual characters you want! The reason your pattern doesn't work is that you don't tell the regex engine to treat your string as a Unicode string. To do that, you must add the u modifier at the end of the pattern, like this:
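A minimal sketch of such a pattern, assuming the goal is to keep ASCII letters, digits and lowercase Greek letters. The exact character class and the sample input are assumptions, not the answerer's original snippet.

```php
<?php
// Assumed pattern: anything that is not a-z, 0-9 or α-ω becomes a separator.
// The trailing "u" makes PCRE read both pattern and subject as UTF-8,
// so α-ω is a range of code points rather than of raw bytes.
$rangePattern = '/[^a-z0-9α-ω]+/u';

$rangeInput = 'once αθ upon ΨΩ a time!';
$rangeSlug = trim(preg_replace($rangePattern, '-', $rangeInput), '-');

echo $rangeSlug, PHP_EOL; // once-αθ-upon-a-time
```

The uppercase ΨΩ is dropped here because it falls outside the α-ω range. The source file itself must be saved as UTF-8 for the literal Greek letters to work.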


You can use the characters' hexadecimal codes too:
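As a sketch of the hexadecimal form, the same assumed character set can be written with code points: U+03B1 is α and U+03C9 is ω. The sample input is made up.

```php
<?php
// \x{...} escapes above 0xFF are only accepted when the "u" modifier is set.
// Single quotes keep the backslashes intact so PCRE sees the escapes.
$hexPattern = '/[^a-z0-9\x{03B1}-\x{03C9}]+/u';

$hexInput = 'time αθ ΨΩ 1874';
$hexSlug = trim(preg_replace($hexPattern, '-', $hexInput), '-');

echo $hexSlug, PHP_EOL; // time-αθ-1874
```

This form avoids depending on the encoding of the source file, at the cost of being harder to read.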


Note that if you are sure your string contains no uppercase Greek characters, or if you want to preserve them as well, you can use the Unicode character class \p{Greek}, like this:
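A sketch of the \p{Greek} variant, again with an assumed character class and a made-up input. It shows that uppercase Greek letters survive, since \p{Greek} matches the whole Greek script.

```php
<?php
// \p{Greek} is a Unicode script property; it needs the "u" modifier so that
// the subject is interpreted as UTF-8.
$scriptPattern = '/[^a-z0-9\p{Greek}]+/u';

$scriptInput = 'once ΨΩ αθ é upon';
$scriptSlug = trim(preg_replace($scriptPattern, '-', $scriptInput), '-');

echo $scriptSlug, PHP_EOL; // once-ΨΩ-αθ-upon
```

The accented Latin é is still removed, because it is neither in a-z nor in the Greek script.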


(It is a little longer but more explicit)

// End of old answer