We've got a situation where several months of data collected via ODK Aggregate into a MySQL database are unreadable.
The data is in Georgian characters, but was sent to a database with a latin1 character set/collation.
The data managers didn't catch this issue until a few days ago, and I was never made aware that they were running surveys with these characters... so now the obvious questions are
1) can we recover existing data? and
2) how to make sure future data is readable?
I can do a
SELECT HEX(column) FROM table
and get the hex output, but here's what that looks like:
As you can see, the last line looks right, but the others do not. When I created a test table with latin1 and tried to insert Georgian characters, I got
Warning: #1366 Incorrect string value: '\xE1\x83\x93\xE1\x83\x93...' for column 'georgian_text' at row 1
I don't see anything in Tomcat logs, but I'm assuming that Aggregate would have gotten the same error each time a record was submitted.
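To convince myself of what happened at the byte level, here's a quick Python sketch (my own illustration, with a hypothetical sample word, not anything from Aggregate). Georgian letters are three-byte UTF-8 sequences that latin1 has no mapping for, which is exactly why the insert raised warning #1366. If the server had stored the raw bytes unchanged as latin1 mojibake, a round trip would recover them; the #1366 warning suggests it instead tried to convert and discarded them.

```python
georgian = "დედა"  # hypothetical sample text ("mother" in Georgian)
utf8_bytes = georgian.encode("utf-8")
# The UTF-8 bytes match the \xE1\x83\x93... pattern from the warning:
print(utf8_bytes.hex())  # e18393e18394e18393e18390

# If MySQL had stored these raw bytes in a latin1 column unchanged
# (classic mojibake), reversing the round trip would recover the text:
mojibake = utf8_bytes.decode("latin-1")
recovered = mojibake.encode("latin-1").decode("utf-8")
assert recovered == georgian

# But warning #1366 means MySQL tried to *convert* UTF-8 to latin1,
# failed, and typically stored '?' instead -- which cannot be reversed.
```

So the round trip only helps when the raw UTF-8 bytes actually made it to disk.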
My question is: can the hex in the first rows be converted to anything useful?
I was unable to recover what was lost. For the sake of documentation, though, the answer is to have the following in your /etc/my.cnf from the start so these issues don't happen in the first place:
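For anyone else triaging the same problem, here's a rough way to classify each HEX(column) dump before giving up on it (a sketch of my own, assuming you paste the hex strings in by hand): if the bytes decode as valid UTF-8, the row is recoverable mojibake; if they are all question marks, MySQL discarded the originals on insert.

```python
def diagnose(hex_dump: str) -> str:
    """Classify a HEX(column) dump pulled from a latin1 table."""
    raw = bytes.fromhex(hex_dump)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 -- bytes were mangled, unlikely to recover"
    if set(text) <= set("? "):
        return "only question marks -- MySQL discarded the original bytes"
    return f"valid UTF-8, recoverable: {text}"

print(diagnose("3F203F3F"))      # only question marks -- ...
print(diagnose("E18393E18394"))  # valid UTF-8, recoverable: დე
```

In my case every affected row fell into the second category, which is why nothing could be recovered.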
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'