I'm having a rather strange problem:
I am working with a database (which I did not design). The database is multilingual: it holds titles in English, Spanish, Russian, Vietnamese, etc.
From what I have seen, titles with characters such as "ñ", "á", "é", "ë" have been stored in the database in this way: "&ntilde_". I know that in HTML the way to write these characters is "&ntilde;".
Title in database: Se&ntilde_ora // Señora
Title obtained by PHP: Se&ntilde_ora // Señora
characters of type "ñ", "á", "é", "ë", have been stored in the database in this way: "&ntilde_"
This is bizarre.
First of all, make sure your database actually contains these _ characters, and make sure you're not seeing some sort of substitution character being rendered. Whatever program you're using to show the data might have some character set option set incorrectly.
You might say
SELECT field, HEX(field) FROM table WHERE field LIKE '%' ORDER BY CHAR_LENGTH(field) LIMIT 10
to find a few relatively short examples. Then pore over the hex output looking for 3B (hex for ;) or 5F (hex for _).
SELECT HEX('Se&ntilde;ora'), HEX('Se&ntilde_ora')
on my UTF8 setup gives these two strings:
5365266E74696C64653B6F7261
5365266E74696C64655F6F7261
See the difference? The first has 3B where the second has 5F.
If the _ characters are definitely in your data, you have some cyber-spelunking to do. Do you have access to the person who set this up, so you can ask about it? If so, do. It will save you some reverse-engineering time.
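If it is easier to check from PHP than from a SQL client, the same byte-level comparison can be sketched as below. This is only an illustration: the helper name entity_terminators is made up for this example, and the sample strings are the two from the HEX() query above, not rows from your table.

```php
<?php
// Hypothetical helper: report which byte follows each "&name" sequence.
// 3b is a proper ";" terminator, 5f is the stray "_".
function entity_terminators(string $value): array
{
    $found = [];
    // Match "&", an entity-like name, and the single character after it.
    if (preg_match_all('/&([A-Za-z][A-Za-z0-9]*)([;_])/', $value, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $found[] = $match[1] . ' => ' . bin2hex($match[2]);
        }
    }
    return $found;
}

print_r(entity_terminators('Se&ntilde;ora')); // one entry: "ntilde => 3b"
print_r(entity_terminators('Se&ntilde_ora')); // one entry: "ntilde => 5f"
```

Running this over values fetched from the table should show whether the underscore is really stored or is only a display problem.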
If you have to fix this without help, you can try using PHP like this:
$my_data = str_replace('_',';', $my_data);
That should get the entitized characters to be formatted correctly. But it will also change standalone _ characters to ;. To fix this right, you'll need a list of all the entitized characters in your data, and you'll need to change them individually.
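A more targeted repair might look like the sketch below. It assumes every broken sequence has the form &name_ where name is a standard HTML 4.01 entity name. Instead of a hand-made list, it takes the names from PHP's get_html_translation_table(), which is a substitution for the per-data list described above, so check it against what your data really contains. The function names known_entity_names and repair_entities are invented for this example. Try it on a copy of the data first.

```php
<?php
// Build a lookup of known HTML entity names, e.g. "ntilde", "aacute".
function known_entity_names(): array
{
    $names = [];
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    foreach ($table as $entity) {
        // Each $entity looks like "&ntilde;"; strip the "&" and the ";".
        $names[substr($entity, 1, -1)] = true;
    }
    return $names;
}

// Replace "&name_" with "&name;" only when name is a known entity,
// leaving every other underscore untouched.
function repair_entities(string $value, array $names): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*)_/',
        function (array $m) use ($names): string {
            return isset($names[$m[1]]) ? '&' . $m[1] . ';' : $m[0];
        },
        $value
    );
}

$names = known_entity_names();
echo repair_entities('Se&ntilde_ora my_file', $names), "\n"; // Se&ntilde;ora my_file
// Decode to the real character once the entity is well formed.
echo html_entity_decode(repair_entities('Se&ntilde_ora', $names), ENT_QUOTES, 'UTF-8'), "\n"; // Señora
```

Unlike the plain str_replace, this leaves an underscore such as the one in my_file alone, because it is not preceded by an entity name.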