Rob Evans - 2 months ago

MySQL Unicode Data

I have data in a table that looks like this when viewed in SQLyog:

(1) µéÁÂÓ ·Óᡧ

But when the forum software that reads the data displays it on screen, it looks like this:

(2) ต้มยำ ทำแกง

The second output is the correct one (Thai language).

I'm writing a script that pulls all of this data and imports it into a new database (MongoDB), but when I pull the data and echo it to the browser, I get output like (1) above.

How do I go about converting this so that when I insert it (or output it to a browser), it is saved and displayed correctly, like (2)?

I haven't been able to output the text like (2), but I WAS able to get the output to look like (1) by including this in my HTML:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

and then echoing the data with:

echo iconv('latin1', 'utf-8', $string);

I'm sure it's something really simple, but I'm not familiar enough with Unicode and character sets to work this out! Thanks, dudes!


I'm now one step closer. I called:

mysql_query("SET NAMES 'utf8'");

And was then able to output (1) using just:

echo $string;

So I guess MySQL is now converting latin1 to utf8 for me over the connection, instead of me having to do it in PHP with iconv.

I still can't get Thai characters to display in the browser, though!


I managed to solve this.

The text I was getting from the database was actually encoded as windows-874 (the Windows codepage for Thai). Googling the Thai codepage put me on the right path for converting it to UTF-8. Once I switched the header to:

header('Content-type: text/html; charset=windows-874');

I could see the Thai characters correctly, so I removed that header again and used:

iconv('windows-874', 'UTF-8', $string);

This converted the windows-874 text to UTF-8, and the page still displayed correctly even without the header or meta tag.
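The working conversion decodes the raw bytes with the Thai codepage and re-encodes them as UTF-8. A rough Python analogue of the `iconv('windows-874', 'UTF-8', ...)` call (the PHP line is the actual fix; this only illustrates it), assuming the bytes reach the script unconverted, i.e. without `SET NAMES 'utf8'` having already turned them into UTF-8 mojibake:

```python
# Assumed raw windows-874 bytes as stored in the table.
raw = b"\xb5\xe9\xc1\xc2\xd3 \xb7\xd3\xe1\xa1\xa7"

# Rough equivalent of PHP's iconv('windows-874', 'UTF-8', $string)
utf8_bytes = raw.decode("cp874").encode("utf-8")

print(utf8_bytes.decode("utf-8"))   # ต้มยำ ทำแกง
print(len(utf8_bytes))              # 31
```

Each Thai character takes three bytes in UTF-8, so the 11 stored bytes become 31. Converting twice (server-side latin1 to utf8, then windows-874 to UTF-8 in the script) would mangle the text again, so only one of the two conversions should be active.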

So... a lesson for character set newbies: find out which codepage your text is likely encoded in, then convert from that to UTF-8 :)