Chazy Chaz Chazy Chaz - 7 months ago 152
PHP Question

mysqli not setting charset to utf8mb4

Problem found

The problem seems to be that

$mysqli->set_charset()
is not accepting `utf8mb4' as a valid encoding (just as I "speculated" in the first update). MySQL version is 5.5.41 and PHP version is 5.4.41 (no problem with that).




Sorry for the title, I've been searching/reading about what/where can the problem be and I'm already too confused about this...

I recently started using utf8mb4 in mysql. I'm using utf8mb4 as charset and utf8mb4_unicode_ci as collation for all tables/columns.

So I first I changed:

$mysqli->set_charset('utf8');


to

$mysqli->set_charset('utf8mb4');


made sure my php files are utf8 (I'm using Visual Studio Code so the files are created in UTF-8 by default), and php/html headers are set to utf8:

index.php

header('Content-type: Text/HTML; Charset=UTF-8');


main.php (included at the end of index.php)

<meta http-equiv="Content-Type" content="Text/HTML" />
<meta charset="UTF-8" />


The problem is that for some tables I have to manually insert the data, and this data is stored as is: with special characters, with accents, ñ, etc... And when I display this data in my website I can see that these characters
have replaced the special/accented characters.

So my question is: is there any way to store data as is (without replacing/converting special/accented characters) in mysql and be able to display it fine (as is)?

If I revert to
$mysqli->set_charset('utf8');
the data is displayed fine... So this keeps me wondering that there should be no problem with storing utf-8 characters as they are and there is some codification problem somewhere...

I'm using sqlyog community (with wine) and I read somewhere that sometimes the gui does not work correctly when you change some db/table configuration and the only way is the old way (running yourself the query), but I didn't tried this yet. I ran queries to set the charset and collation of all tables/columns.

What do you think?

UPDATE

I'm starting to think that mysqli does not accept utf8mb4 as a valid character encoding and uses utf8 from php and not from mysql... I also think mysql fckd up creating utf8mb4 instead of updating the existing utf8 to support 4 bytes....

As I'm testing with mysqli charset utf8, everything is stored as is and displayed as is (with mysql charset and collation set to utf8mb4...).

UPDATE 2

SELECT name, HEX(name) FROM person LIMIT 1


This is what it outputs:

New Person has name Altaïr 416C7461C3AF72


But as I already said, this is using:

$mysqli->set_charset('utf8');


to insert and to select. If I use utf8mb4 instead this is what it gets stored:

Altaïr


But it's displayed ok. What it's not displayed ok is if the name is stored as is, the displayed name will be
Alta�r
.

So the question is: Why is mysqli/mysql storing
ï
as
ï
using utf8mb4? And why is php displaying special characters like
ï
as
when utf8mb4 is set in mysqli?

Can someone please confirm that
mysqli::set_charset
accepts utf8mb4 as a valid encoding?

UPDATE 3

I have a class function that selects a string from a table "es", for example:
Iniciar Sesión
(this is what's stored) and if mysqli charset is utf8, what is being selected/displayed is
Iniciar Sesión
.

This maybe a completely different problem but it's clearly another codification problem. From my understanding, if the tables/columns are utf8mb4 and mysqli is set to utf8, mysql has to encode from utf8 (3bytes) to ut8mb4 (full byte support). So this means that
mysqli
does not use utf8 from php but from mysql. Is this correct, right?

My application is currently having a rough time with encodings... (but maybe is some server configuration problem...)

UPDATE 4

Can the problem be here? I really have no idea about this kind of configurations:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | latin1_swedish_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

Answer

The problem might stem from the fact that you're not using utf8mb4 in your MySQL column definition (at least you did not say what encoding you're using).

Here is an example of a MySQL table definition with a column that uses utfmb4:

CREATE TABLE `person` (
  `name` varchar(255) CHARACTER SET utf8mb4
)

UPDATE

Using the following table definition:

CREATE TABLE `person` (
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

and the following PHP script:

<?php
$mysqli = new mysqli('localhost', 'username', 'password', 'database');
$mysqli->set_charset('utf8mb4');

$mysqli->query("INSERT INTO `person` VALUES ('Altaïr Ibn-La\'Ahad')");

$result = $mysqli->query("SELECT * FROM `person` LIMIT 1");

$person = $result->fetch_object();

if($person)
    printf ("New Person has name %s.\n", $person->name);

$result->close();
$mysqli->close();

when I insert "Altaïr Ibn-La'Ahad" into the database, the name is stored as is without changes. The script also prints the name without changes: "New Person has name Altaïr Ibn-La'Ahad."

I hope this helps you resolve your problem. Let me know if it does or doesn't.