Maliks Maliks - 11 months ago 71
PHP Question

Using mb_substr still breaks accent character at the end

Logic: I fetch the username from the DB, and if it is longer than 30 characters, I show the first 30 characters with "..." appended at the end.
The code is:

$username = htmlspecialchars($username);
if (mb_strlen($username, 'utf-8') > 30) {
    $username_trimmed = mb_substr($username, 0, 30, 'utf-8') . '...';
} else {
    $username_trimmed = $username; // short names are printed unchanged
}

and in my navigation I am just printing this:

<class="userName">Hello, <?php echo $username_trimmed; ?>

My encoding is set to UTF-8, and the mbstring extension is enabled in PHP.

Output of the above code: it still breaks the accented character,
apparently because it is a multi-byte character and it is getting cut in the middle.
Actual word is
and output is:

Erroneous output

Question: what am I missing?
Shouldn't mb_substr treat it as a single character and keep it from being split in the middle?

Answer

Your string is actually "&Eacute;", not "É". mb_substr handles your characters just fine, it does not handle HTML entities. Don't store HTML entities in your database, store actual Unicode characters. At the very least, decode from HTML entities to actual characters using html_entity_decode($str, ENT_COMPAT, 'UTF-8') before applying mb_substr (and then apply htmlspecialchars again afterwards to preserve HTML syntax).
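
The decode, truncate, re-escape order described above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the helper name truncate_username and its $limit and $ellipsis parameters are hypothetical, and it assumes the stored value may contain HTML entities such as "&Eacute;".

```php
<?php
// Hypothetical helper: the name and the $limit/$ellipsis parameters are
// illustrative assumptions, not part of the original question.
function truncate_username(string $raw, int $limit = 30, string $ellipsis = '...'): string
{
    // Turn stored entities (e.g. "&Eacute;") back into real characters first,
    // so that each accented letter counts as exactly one character.
    $decoded = html_entity_decode($raw, ENT_COMPAT, 'UTF-8');

    // Truncate by characters, now that no entity can be cut in half.
    if (mb_strlen($decoded, 'UTF-8') > $limit) {
        $decoded = mb_substr($decoded, 0, $limit, 'UTF-8') . $ellipsis;
    }

    // Escape once, at output time, to keep the HTML well-formed.
    return htmlspecialchars($decoded, ENT_COMPAT, 'UTF-8');
}

// Usage: the entity sits at position 30, where a naive mb_substr on the
// undecoded string would slice it into a fragment like "&".
echo truncate_username(str_repeat('a', 29) . '&Eacute;cole'), "\n";
```

Escaping only at the very end means the length check always sees real characters. If the database is later cleaned up to hold plain Unicode, as the answer recommends, the html_entity_decode step simply leaves such values unchanged.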