Shashank Agarwal - 2 months ago
MySQL Question

Parsing XML and inserting into MySQL: some characters give java.sql.SQLException: Incorrect string value

I am parsing a bunch of XML files and inserting the values obtained from them into a MySQL database. The character set of the MySQL tables is set to utf8. I'm connecting to the database using the following connection URL:


Most of the string values with Unicode characters (Greek letters, etc.) are inserted fine, except for some that contain a math symbol. In particular, when I try to insert a string containing the mathematical script capital G (U+1D4A2), the insert fails with java.sql.SQLException: Incorrect string value.


MySQL up to version 5.1 seems to support only Unicode characters in the Basic Multilingual Plane (BMP), which take no more than 3 bytes when encoded as UTF-8. The mathematical script capital G lies outside the BMP and needs 4 bytes. From the manual on Unicode support in version 5.1:

MySQL 5.1 supports two character sets for storing Unicode data:

  • ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character
  • utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character

In version 5.5 some new character sets were added:

  • utf8mb4, a UTF-8 encoding of the Unicode character set using one to four bytes per character

ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters.
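To see why the script capital G is rejected while Greek letters go through, you can compare UTF-8 byte lengths in Java. This is a small sketch; the class and method names are hypothetical, and the math character is assumed to be U+1D4A2 (MATHEMATICAL SCRIPT CAPITAL G). Anything that reports 4 bytes / supplementary cannot be stored in MySQL's 3-byte utf8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the Basic Multilingual Plane,
    // i.e. needs 4 bytes in UTF-8 and does not fit MySQL's 3-byte utf8.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1";                                   // Greek small letter alpha (BMP)
        String sum = "\u2211";                                     // N-ary summation (BMP)
        String scriptG = new String(Character.toChars(0x1D4A2));  // mathematical script capital G

        for (String s : new String[] {alpha, sum, scriptG}) {
            System.out.printf("U+%04X: %d UTF-8 bytes, supplementary=%b%n",
                    s.codePointAt(0), utf8Bytes(s), hasSupplementary(s));
        }
    }
}
```

Running it should report 2 bytes for alpha, 3 for the summation sign and 4 (supplementary) for the script G, which is exactly the BMP / non-BMP split the manual describes.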

So if you are on MySQL 5.1, you would first have to upgrade. On 5.5 and later, you have to change the character set to utf8mb4 to store these supplementary characters.
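On a 5.5+ server the charset change can be scripted. Below is a minimal sketch that only builds the ALTER statements; the database and table names are hypothetical, and the utf8mb4_unicode_ci collation is an assumption (any utf8mb4 collation would work). You could run the output with `Statement.execute` or paste it into the mysql client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8mb4Migration {

    // Quote an identifier with backticks, doubling any embedded backtick.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Build statements that change the database default and convert each table.
    // CONVERT TO CHARACTER SET converts existing text columns as well as the table default.
    static List<String> statements(String database, List<String> tables) {
        List<String> sql = new ArrayList<>();
        sql.add("ALTER DATABASE " + quote(database)
                + " CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci");
        for (String table : tables) {
            sql.add("ALTER TABLE " + quote(table)
                    + " CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci");
        }
        return sql;
    }

    public static void main(String[] args) {
        // "xmldata" and "documents" are hypothetical names.
        for (String s : statements("xmldata", Arrays.asList("documents"))) {
            System.out.println(s + ";");
        }
    }
}
```

CONVERT TO CHARACTER SET rebuilds the table, so it is worth taking a backup before running it on real data.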

It seems the JDBC connector also requires some further configuration (from Connector/J Notes and Tips):

To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.
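Since the note says Connector/J only autodetects the server's utf8mb4 setting when `characterEncoding` is absent, one quick check is to strip that parameter from the connection URL. This is a sketch with a hypothetical helper and a hypothetical example URL (the URL from the question is not shown); it does not change anything on the server, which still needs `character_set_server=utf8mb4`.

```java
import java.util.Locale;

public class ConnectorUrlCheck {

    // Returns the URL with any characterEncoding=... parameter removed, so that
    // Connector/J falls back to autodetecting the server's character set.
    static String withoutCharacterEncoding(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to strip
        }
        StringBuilder kept = new StringBuilder();
        for (String param : url.substring(q + 1).split("&")) {
            String lower = param.toLowerCase(Locale.ROOT);
            if (param.isEmpty() || lower.startsWith("characterencoding=")) {
                continue; // drop the explicit encoding (and empty fragments)
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(param);
        }
        String base = url.substring(0, q);
        return kept.length() == 0 ? base : base + "?" + kept;
    }

    public static void main(String[] args) {
        // Hypothetical URL; the one used in the question is not shown.
        String url = "jdbc:mysql://localhost:3306/xmldata?useUnicode=true&characterEncoding=UTF-8";
        System.out.println(withoutCharacterEncoding(url));
    }
}
```

For the example URL this leaves `jdbc:mysql://localhost:3306/xmldata?useUnicode=true`, keeping every other parameter intact.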