Kokox Kokox - 3 years ago 225
PHP Question

PHP - Latin Characters from Mysql

I'm having a rather strange problem.
I am working with a database that I did not design. It is multilingual: there are titles in English, Spanish, Russian, Vietnamese, etc.

From what I have seen, titles with characters such as "ñ", "á", "é", "ë" have been stored in the database like this: "&ntilde_". I know that in HTML the way to write these characters is "&ntilde;". In my PHP code, when I read these titles (without any kind of conversion), the following happens:

Title in database: Se&ntilde_ora // Señora
Title obtained by PHP: Señ_ora // Señora


I tried using utf8_decode and html_entity_decode, but neither worked.
I then tried using str_replace to remove the "_" from the title "Señ_ora", but I got "Se&ntildeora".

Answer Source

characters of type "ñ", "á", "é", "ë", have been stored in the database in this way: "&ntilde_"

This is bizarre.

First of all, make sure your database actually contains these _ characters, and make sure you're not seeing some sort of substitution character being rendered. Whatever program you're using to show the data might have some character set option set incorrectly.

You might say SELECT field, HEX(field) FROM table WHERE field LIKE '%' ORDER BY CHAR_LENGTH(field) LIMIT 10 to find a few relatively short examples. Then pore over the hex output looking for 3B (hex for ;) and 5F (hex for _).

For example, SELECT HEX('Se&ntilde;ora'), HEX('Se&ntilde_ora') on my UTF8 setup gives these two strings:

5365266E74696C64653B6F7261
                  xx
5365266E74696C64655F6F7261

See the difference?
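The same check can be done from PHP once the row has been fetched. The following is only a minimal sketch: `describe_title` is a hypothetical helper name, and it assumes the title arrives as the raw string from the database with no conversion applied.

```php
<?php
// Hypothetical helper: dump the bytes of a title and report whether it
// contains entity-like sequences ending in "_" (hex 5F) or ";" (hex 3B).
function describe_title(string $raw): array
{
    return [
        // Uppercase hex, comparable to the output of MySQL's HEX().
        'hex' => strtoupper(bin2hex($raw)),
        // "&name_" : an entity name terminated by an underscore.
        'underscore_entity' => preg_match('/&[A-Za-z][A-Za-z0-9]*_/', $raw) === 1,
        // "&name;" : a normally terminated entity.
        'semicolon_entity' => preg_match('/&[A-Za-z][A-Za-z0-9]*;/', $raw) === 1,
    ];
}

$info = describe_title('Se&ntilde_ora');
echo $info['hex'], PHP_EOL;  // 5365266E74696C64655F6F7261
var_dump($info['underscore_entity'], $info['semicolon_entity']);
```

If `underscore_entity` comes back true for rows read straight from the database, the _ characters really are in the stored data and are not a display artifact.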

If the _ characters are definitely in your data, you have some cyber-spelunking to do. Do you have access to the person who set this up, so you can ask about it? If so, do. It will save you some reverse-engineering time.

If you have to fix this without help, you can try using php like this

 $my_data = str_replace('_',';', $my_data);

That should turn the malformed entities back into well-formed ones. But it will also change every standalone _ character to ;. To fix this properly, you'll need a list of all the entitized characters in your data, and you'll need to change them individually.
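One way to avoid touching standalone underscores is to rewrite only sequences that look like `&name_` and whose name PHP recognizes as an HTML entity. This is a sketch under assumptions, not a tested fix for your data: `repair_entities` is a hypothetical function name, the data is assumed to be UTF-8, and the set of recognized names is whatever `html_entity_decode` accepts with `ENT_HTML5`.

```php
<?php
// Hypothetical helper: replace "&name_" with the decoded character, but only
// when "&name;" is an entity that html_entity_decode() actually knows.
function repair_entities(string $text): string
{
    $fixed = preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*)_/',
        function (array $m): string {
            $candidate = '&' . $m[1] . ';';
            $decoded = html_entity_decode($candidate, ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Unknown names come back unchanged; keep the original text then.
            return $decoded === $candidate ? $m[0] : $decoded;
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $fixed ?? $text;
}

echo repair_entities('Se&ntilde_ora'), PHP_EOL;   // Señora
echo repair_entities('my_file_name'), PHP_EOL;    // my_file_name (untouched)
```

A string that legitimately contains a known entity name followed by an underscore would still be rewritten, so it is worth running this against a copy of the data and reviewing the changed rows before updating anything.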
