Roy Roy - 3 months ago 31
PHP Question

Codeigniter and charsets

I haven't been using CodeIgniter for long, but I've run into some charset problems. I've asked on the CI forum, but there is still no global solution, so I want to take it further: http://codeigniter.com/forums/viewthread/204409/

The problem was database error 1064. I got a solution: use iconv! It works fine, but I don't think it should be necessary. I've searched the internet a lot about charsets in general, but I'm using CI now, so how do charsets and CI fit together?

So I have a lot of questions about it; I hope someone can clear them up for me:

What's the best way to set the charset globally? And what should it be set to?


  • In the head

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

  • In config/config.php

    $config['charset'] = 'UTF-8';

  • In config/database.php

    $db['default']['char_set'] = 'utf8';
    $db['default']['dbcollat'] = 'utf8_general_ci';

  • In .htaccess, alongside my rewrite rules:

    php_value magic_quotes_gpc Off
    AddDefaultCharset UTF-8

  • Do I also need to send a header? Where should it go? Something like this?

    header('Content-Type: text/html; charset=UTF-8');

  • In my editor (Notepad++), should I save files as UTF-8? Or UTF-8 (without BOM)? Or is ANSI fine (this is what I'm using now)?

  • Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?

  • How about reading RSS feeds: how do I handle multiple charsets? The project I'm working on has two feeds, one encoded as UTF-8 and the other as ISO-8859-1. The items are stored in the database and compared from time to time to see if there are new ones. The comparison fails on special characters.



I'm working with:
- CI 2.0.3
- PHP 5.2.17
- MySQL 5.1.58

More information added:

Model:

function update_favorite($data)
{
    $this->db->where('id', $data['id']);
    $this->db->where('user_id', $data['user_id']);
    $this->db->update('favorites', $data);
    return;
}


Controller:

$this->favorites_model->update_favorite(array(
    'id' => $id,
    'rss_last' => $rss_last,
    'user_id' => $this->session->userdata('user_id')
));


When $rss_last is a "normal" value like "test" (without the quotes), it works fine.
When it's a longer value such as this one (in Dutch): F-Secure vindt malware met certificaat van Maleisische overheid

I get this error:


Error Number: 1064

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'vindt malware met certificaat van Maleisische overheid, `user_id` = '1' WHERE `i' at line 1

UPDATE `favorites` SET `id` = '15', `rss_last` = F-Secure vindt malware met certificaat van Maleisische overheid, `user_id` = '1' WHERE `id` = '15' AND `user_id` = '1'

Filename:
/home/.../domains/....nl/public_html/new/models/favorites_model.php

Line Number: 35


Someone on the CI forum told me to use this:

'rss_last' => iconv("UTF-8", "UTF-8//TRANSLIT", $rss_last)


This works fine, but I don't think it should be necessary.

The value of $rss_last comes from an RSS feed which, as mentioned before, is sometimes UTF-8 and sometimes ISO-8859-1 encoded:

$rss = file_get_contents('http://www.website.com/rss.xml');
$feed = new SimpleXmlElement($rss);
$rss_last = $feed->channel->item[0]->title;


It looks like this last part is the problem; when $rss_last is set to the value directly, it works fine:

$rss_last = 'F-Secure vindt malware met certificaat van Maleisische overheid';


When the value comes from the RSS feed, it causes problems...

Some more questions:

Just found this: PHP: Detect encoding and make everything UTF-8

Is that the best solution? Or isn't iconv simpler, with something like this:

$encoding = some_function_to_get_encoding_from_feed($feed);
$rss_last = iconv($encoding, "UTF-8//TRANSLIT", $feed->channel->item[0]->title);


But what should I use for "some_function_to_get_encoding_from_feed"? mb_detect_encoding?

And how does mb_convert_encoding compare with iconv?

Answer

1) There is no global solution.

2)

AddDefaultCharset UTF-8

This makes Apache send the charset in its response so that the client uses the right encoding. Set it.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Not strictly necessary, but recommended by the W3C.

$config['charset'] = 'UTF-8';

This is desirable.

$db['default']['char_set'] = 'utf8';
$db['default']['dbcollat'] = 'utf8_general_ci';

This is the encoding for CI's connection to the database. If your database is encoded as UTF-8, this setting is mandatory.

header('Content-Type: text/html; charset=UTF-8');

Do not do this unless necessary. The charset is already indicated in the HTML and in .htaccess.

Use utf8_unicode_ci or utf8_general_ci for the MySQL database? And why?

For my own language (Russian), I use utf8_general_ci.

In my editor (Notepad++) save files as UTF-8?

Absolutely! All code that Apache serves as UTF-8 should be saved as UTF-8.
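On the BOM part of the question: a byte order mark at the start of a PHP file is sent to the browser as output before any header() call, so "UTF-8 without BOM" is usually the safer choice in Notepad++. A minimal sketch for spotting a stray BOM follows; the helper name has_utf8_bom is made up for illustration and is not part of PHP or CI:

```php
<?php
// Hypothetical helper (not part of PHP or CodeIgniter): reports whether
// a string of raw file bytes starts with the UTF-8 byte order mark.
function has_utf8_bom($bytes)
{
    // The UTF-8 BOM is the three bytes EF BB BF.
    return substr($bytes, 0, 3) === "\xEF\xBB\xBF";
}

// Sample byte strings standing in for file contents read with file_get_contents().
$with_bom    = "\xEF\xBB\xBF<?php echo 'hi';";
$without_bom = "<?php echo 'hi';";

var_dump(has_utf8_bom($with_bom));    // bool(true)
var_dump(has_utf8_bom($without_bom)); // bool(false)
```

To check a real file, pass it the result of file_get_contents() for that file.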

How about reading RSS feeds, how to handle multiple charsets?

If you store each RSS feed in its own table, you can specify the charset for each table and set the right encoding with each SQL query. And yes, Cyrillic characters, for example, will fail on non-UTF-8 encodings.
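Two details may help with the feed case. First, SimpleXML returns text as UTF-8 whatever encoding the feed declares, so a feed with a correct XML declaration should not need per-feed conversion. Second, $feed->channel->item[0]->title is a SimpleXMLElement object rather than a string. The failing query in the question shows the rss_last value without quotes, which would be consistent with the object slipping past CI's string escaping (an assumption about CI 2.0.3, not verified here), and it would also explain why iconv, which returns a string, makes the error go away. A small sketch with an invented inline feed:

```php
<?php
// Invented sample feed, declared and encoded as ISO-8859-1.
// "\xE9" is the single ISO-8859-1 byte for "é".
$rss = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
     . "<rss><channel><item><title>Caf\xE9 nieuws</title></item></channel></rss>";

$feed  = new SimpleXMLElement($rss);
$title = $feed->channel->item[0]->title;

// The node is an object, not a string.
var_dump(is_string($title));                  // bool(false)

// Casting gives a plain string that can be passed to the model.
$rss_last = (string) $title;
var_dump(is_string($rss_last));               // bool(true)

// The text comes back as UTF-8: "é" is now the two bytes C3 A9.
var_dump($rss_last === "Caf\xC3\xA9 nieuws"); // bool(true)
```

In the controller from the question this would mean passing 'rss_last' => (string) $rss_last. If a feed declares the wrong encoding, any conversion with iconv or mb_convert_encoding would have to happen on the raw bytes before parsing.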
