themeparkfocus - 4 months ago
MySQL Question

utf8 not showing hyphens correctly in echoed text

My MySQL database is set to utf8_unicode_ci and I have $pdo->exec('SET NAMES "utf8"') as part of the following PHP code, yet when I echo text from the query a hyphen - looks like this: â€“. What am I doing wrong? Why is the hyphen not displaying correctly?

<?php

try
{
$pdo = new PDO('mysql:host=localhost;dbname=danville_tpf', 'danville_dan', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('SET NAMES "utf8"');
}
catch (PDOException $e)
{
$output = 'Unable to connect to the database server.';
include 'output.html.php';
exit();
}


$output = 'Theme Park Database initialized';
//include 'output.html.php';//

try
{
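$park_id = $_GET['park_id'];
// Bind park_id as a parameter rather than interpolating it into the SQL string
$result = $pdo->prepare('SELECT * FROM tpf_parks WHERE park_id = :park_id');
$result->execute(array('park_id' => $park_id));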
}
catch (PDOException $e)
{
$output = 'Unable to fetch the park from the database.';
//include 'output.html.php';//
}

$output = 'Successfully pulled park';
//include 'output.html.php';//

foreach ($result as $row)
{
$parkdetails[] = array(
'name' => $row['name'],
'blurb' => $row['blurb'],
'website' => $row['website'],
'address' => $row['address'],
'logo' => $row['logo']
);

}

?>


Please help.

Answer

â€“ is common mojibake for an en dash (–), which is a different character from a hyphen.

It is the result of taking the UTF-8–encoded form of the dash (0xe2 0x80 0x93) and incorrectly assuming that it is actually encoded using Windows-1252.

Interpreting those three bytes as Windows-1252: 0xe2, 0x80 and 0x93 separately represent â, € and “.
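The misreading described above can be reproduced in a few lines of PHP. This is only a minimal sketch: the variable names are made up for illustration, and it assumes the iconv extension is available (it is enabled by default in most PHP builds).

```php
<?php
// UTF-8 bytes of U+2013 EN DASH, as they would arrive over a utf8 connection.
$enDash = "\xE2\x80\x93";

// Misread those three bytes as Windows-1252 and re-encode the result as UTF-8.
// This mimics a browser or tool that assumes the wrong charset.
$mojibake = iconv('Windows-1252', 'UTF-8', $enDash);

echo bin2hex($enDash), "\n";   // e28093
echo bin2hex($mojibake), "\n"; // c3a2e282ace2809c
echo $mojibake, "\n";          // shows â€“ when the output is viewed as UTF-8
```

The second hex string is the same byte sequence the HEX() check looks for when the data has been stored doubly encoded.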

Assuming the offending character is in the blurb field, if you query SELECT HEX(blurb) FROM tpf_parks (with a suitable WHERE clause), you will see the hex encoding of the offending bytes.

If you see E28093 in there, then the database value is correctly encoded as UTF-8 and there will be a character encoding mismatch in your client or server configuration.
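The HEX() check could also be run from PHP rather than from a MySQL client. The sketch below is an assumption-laden illustration, not the asker's code: fetchBlurbHex and classifyBlurbHex are hypothetical helper names, the offending text is assumed to be in the blurb column, and the second pattern it looks for is the doubly encoded case discussed next.

```php
<?php
// Hypothetical helper: returns HEX(blurb) for one park, or null if there is
// no matching row or the column is NULL.
function fetchBlurbHex(PDO $pdo, $parkId)
{
    $stmt = $pdo->prepare('SELECT HEX(blurb) FROM tpf_parks WHERE park_id = :park_id');
    $stmt->execute(array('park_id' => $parkId));
    $hex = $stmt->fetchColumn();
    return ($hex === false || $hex === null) ? null : $hex;
}

// Hypothetical helper: reports which form of the en dash the stored bytes contain.
function classifyBlurbHex($hex)
{
    // Decode to raw bytes first so matches always fall on byte boundaries.
    $bytes = hex2bin($hex);
    if (strpos($bytes, "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C") !== false) {
        return 'double-encoded'; // C3A2E282ACE2809C
    }
    if (strpos($bytes, "\xE2\x80\x93") !== false) {
        return 'utf8';           // E28093
    }
    return 'no-en-dash';
}
```

If the result is 'utf8', the stored data is fine and the mismatch lies elsewhere. One common place, offered here only as a possibility, is the charset the page declares to the browser, which can be set with header('Content-Type: text/html; charset=utf-8') before any output. Passing charset=utf8 in the PDO DSN (supported since PHP 5.3.6) is an alternative to the SET NAMES statement.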

If, however, you see C3A2E282ACE2809C, then the character has already been stored incorrectly in the database: its three bytes were misinterpreted as Windows-1252 and then saved as the UTF-8 representation of those three characters. If this is the case, you'll need to update the data to fix the issue. You could do this using iconv:

$fixedData = iconv("utf-8", "windows-1252", $badData);

This converts the doubly encoded bytes back to the original UTF-8 encoding, so C3A2E282ACE2809C becomes E28093 again.
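Running that conversion over data that is already correct would damage it, so a guarded version may be safer. The following is a rough sketch under stated assumptions: fixDoubleEncodedUtf8 is a hypothetical name, and the heuristic (accept the converted value only if it is still valid UTF-8 and actually changed) is my own addition, not part of the answer above. It can still misjudge unusual input, so test it on a copy of the data first.

```php
<?php
// Hypothetical helper: undoes one round of UTF-8 -> Windows-1252 -> UTF-8
// double encoding, leaving strings that do not look doubly encoded untouched.
function fixDoubleEncodedUtf8($value)
{
    // Returns false when the text contains characters that Windows-1252
    // cannot represent; @ suppresses the accompanying notice.
    $candidate = @iconv('UTF-8', 'Windows-1252', $value);
    if ($candidate === false || $candidate === $value) {
        return $value;
    }

    // preg_match with the /u modifier only succeeds on valid UTF-8 input.
    // Correctly encoded text usually turns into invalid UTF-8 here.
    if (preg_match('//u', $candidate) !== 1) {
        return $value;
    }

    return $candidate;
}

echo bin2hex(fixDoubleEncodedUtf8("\xC3\xA2\xE2\x82\xAC\xE2\x80\x9C")), "\n"; // e28093
```

The repaired value would then be written back with an ordinary UPDATE through a prepared statement.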