Pavel Němec - 1 year ago
PHP Question

Wrong Unicode diacritics in WordPress links

I made a mirror copy of my website, which runs on WordPress, by exporting and importing the database with phpMyAdmin. The filenames are in Czech, and images whose filenames contain diacritics are not displayed. For example, the word "hruška" should be encoded as hru%C5%A1ka (C5 A1 is the UTF-8 encoding of 'š', U+0161), but it is actually encoded as hrus%CC%8Cka. CC 8C is the UTF-8 encoding of the combining caron (U+030C), the mark above the s, so the name is effectively "hrusˇka" (an s followed by a separate caron) instead of "hruška". What did I do wrong, and how can I fix it?
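To make the difference concrete, here is a minimal sketch that builds both spellings of the word and percent-encodes them. The variable names are only illustrative, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Two spellings of "hruška" that look identical on screen.
$precomposed = "hru\u{0161}ka";   // š as a single code point, U+0161
$decomposed  = "hrus\u{030C}ka";  // plain s followed by U+030C COMBINING CARON

echo rawurlencode($precomposed), "\n"; // hru%C5%A1ka
echo rawurlencode($decomposed), "\n";  // hrus%CC%8Cka

// The byte sequences differ, so the two strings name different files.
var_dump($precomposed === $decomposed); // bool(false)
```

Because the two strings are different byte sequences, a URL built from one form will not match a file stored under the other form.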

Answer Source

I described it the wrong way round. I actually needed the reverse: a base letter followed by a combining caron instead of a single precomposed character with a caron. In the end I solved it with this function:

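function to_combining_caron($html){
   // Precomposed letter => base letter + U+030C COMBINING CARON.
   $replace_ar = array(
      "\u{010D}" => "c\u{030C}", "\u{0161}" => "s\u{030C}", "\u{011B}" => "e\u{030C}",
      "\u{0159}" => "r\u{030C}", "\u{017E}" => "z\u{030C}", "\u{0148}" => "n\u{030C}",
   ); // č, š, ě, ř, ž, ň
   foreach($replace_ar as $original => $replace){
      $html = str_replace($original,$replace,$html);
   }
   return $html;
}

add_filter('wp_get_attachment_url', 'to_combining_caron');

Note: The keys are the precomposed letters and the values are the same letters written as a base letter plus the combining caron (U+030C). The two forms look identical in most text editors, which is why the code spells them out with "\u{...}" escapes (these need PHP 7.0 or later). If it helps someone, just copy the code above.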
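A more general route, if PHP's intl extension is available, might be to let the Normalizer class do the decomposition instead of listing letters by hand. This is only a sketch under that assumption: the helper name to_nfd_url is hypothetical and not part of the original answer, and NFD decomposes every accented letter (for example á as well), not just the six letters with carons.

```php
<?php
// Hypothetical helper: convert a string to Unicode NFD (decomposed form).
// Assumes the intl extension; the input is returned unchanged if the
// extension is missing or normalization fails (e.g. invalid UTF-8).
function to_nfd_url($url) {
    if (!class_exists('Normalizer')) {
        return $url;
    }
    $decomposed = Normalizer::normalize($url, Normalizer::FORM_D);
    return $decomposed === false ? $url : $decomposed;
}

// In WordPress this could be hooked to the same filter as the function above:
// add_filter('wp_get_attachment_url', 'to_nfd_url');
```

Whether the decomposed form is the right target depends on how the filenames are actually stored on the server, so it is worth checking one affected file before applying this to every attachment URL.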