Gags Gags - 2 months ago 10
PHP Question

Generate a permalink for a blog post with a Hindi title in PHP

I have a form that takes the following inputs from the user:


  • Blog title

  • Blog Description

  • Permalink to access blog



I convert the blog title to lower case, replace whitespace with dashes (-) and store the result in the "Permalink to access blog" field. Below is the code that handles this:

setlocale(LC_ALL, 'en_US.UTF8');

function toAscii($str, $replace=array(), $delimiter='-') {
    // Optionally turn extra characters into spaces first
    if (!empty($replace)) {
        $str = str_replace((array)$replace, ' ', $str);
    }
    // Transliterate UTF-8 to plain ASCII
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    // Drop everything except ASCII letters, digits and separators
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
    $clean = strtolower(trim($clean, '-'));
    // Collapse runs of separators into a single delimiter
    $clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);
    return $clean;
}

$prmlkn = toAscii($blog_headline, array(), '-');


This code works fine as long as the blog headline is in English. But if the user types it in Hindi, I only get "-" as the permalink, which means the Hindi POST values are not being recognized.

Answer

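This happens because Hindi is written in Devanagari, whose characters lie outside the ASCII range (each one takes three bytes in UTF-8), and you are converting to ASCII, which only provides basic Latin characters. iconv has no ASCII transliteration for them, so they are lost: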

$str = "नमस्ते";
$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str); // no Hindi letters survive: you get placeholders such as "?" or nothing at all, depending on the iconv implementation
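You can reproduce the "-" from the question without iconv, whose output for Devanagari differs between platforms. The sketch below feeds a Hindi title straight into the same two preg_replace() steps that toAscii() uses. Because the whitelist only contains ASCII characters and the pattern has no /u modifier, every byte of every Devanagari character is removed and only the space survives, which then becomes the dash.

```php
<?php
// Demonstrates why the question's filter step cannot keep Hindi text:
// the whitelist [a-zA-Z0-9/_|+ -] only matches ASCII characters.
$title = "नमस्ते दुनिया"; // Hindi for "Hello world"

// Same filter as in toAscii(), applied directly to the UTF-8 input.
$filtered = preg_replace('/[^a-zA-Z0-9\/_|+ -]/', '', $title);
var_dump($filtered); // string(1) " " : only the space survives

// Same trim/lowercase/delimiter steps as toAscii(): the lone space becomes "-".
$slug = preg_replace('/[\/_|+ -]+/', '-', strtolower(trim($filtered, '-')));
var_dump($slug);     // string(1) "-"
```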

According to RFC 3986:

  2. Characters

...

The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII]. Because a URI is a sequence of characters, we must invert that relation in order to understand the URI syntax. Therefore, the integer values used by the ABNF must be mapped back to their corresponding characters via US-ASCII in order to complete the syntax rules.

A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.
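Since a URI can only carry US-ASCII characters, a character such as न cannot appear in it literally. The usual convention is to encode it as UTF-8 and percent-encode each byte as one %XX triplet. A small sketch using PHP's built-in strlen(), bin2hex() and rawurlencode(), assuming the source file itself is saved as UTF-8:

```php
<?php
// "न" (U+0928) is three bytes in UTF-8: E0 A4 A8.
$char = "न";

echo strlen($char), "\n";       // 3 (bytes, not characters)
echo bin2hex($char), "\n";      // e0a4a8
echo rawurlencode($char), "\n"; // %E0%A4%A8 (one %XX triplet per byte)

// Decoding the triplets gives back the original UTF-8 bytes.
echo rawurldecode(rawurlencode($char)) === $char ? "round-trip ok\n" : "mismatch\n";
```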

You might be better off using urlencode(), but note that this can produce a really long and ugly permalink:

$str = "नमस्ते hello";
$clean = urlencode($str);
echo $clean;

This results in a valid but ugly permalink:

%E0%A4%A8%E0%A4%AE%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A5%87+hello
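If you would rather keep the Hindi text readable than transliterate it, one option is to build the slug with Unicode-aware regular expressions and only percent-encode it when writing the link. This is a sketch under stated assumptions, not the question's toAscii(): unicodeSlug() is a hypothetical helper, it assumes UTF-8 input, and it uses mbstring's mb_strtolower() when that extension is installed. Keeping \p{M} in the character class matters for Hindi, because vowel signs and the virama (्) are combining marks; dropping them would break words apart.

```php
<?php
// Hypothetical helper (not from the question): keeps letters (\p{L}),
// combining marks (\p{M}, needed for Devanagari vowel signs and virama)
// and digits (\p{N}); every other run of characters becomes one delimiter.
function unicodeSlug(string $str, string $delimiter = '-'): string
{
    // Assumption: mbstring may be missing; fall back to ASCII-only lowercasing.
    $str = function_exists('mb_strtolower')
        ? mb_strtolower($str, 'UTF-8')
        : strtolower($str);
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    $str = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', $delimiter, $str);
    // Remove delimiters left at either end (e.g. from trailing "!").
    return trim($str, $delimiter);
}

$slug = unicodeSlug("नमस्ते Hello World!");
echo $slug, "\n";               // नमस्ते-hello-world
echo rawurlencode($slug), "\n"; // %E0%A4%A8%E0%A4%AE%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A5%87-hello-world
```

rawurlencode() is used here instead of urlencode() because it follows RFC 3986 and encodes a space as %20, whereas urlencode() uses form encoding, in which a space becomes +.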