Sunday, May 28, 2006

Supporting CJK characters in MySQL

Recently I had to write some CJK/unicode characters from Java into MySQL. I use MySQL 5.1 and the connector mysql-connector-java-3.1.10-bin.jar .

The connector appears to be able to read CJK/unicode characters in text retrieved from queries correctly, if they were entered via MySQL query browser into the database properly as unicode. But writing CJK/unicode to the database (insert, update) using the connector using either a normal SQL statement or setString() with a PreparedStatement appeared to translate the CJK characters to a single ? (code 3F).

It appeared that I had to write the text data as bytes instead to get them written properly as unicode. I chose the encoding UTF-8. Then I changed the column specific encoding and charset for the text field to hold the CJK text as "utf8_ci" and "utf8_charset". Then to write the text in an insert operation using a PreparedStatement:

here we assume we already have the database Connection conn established, and we are trying to insert text from String cjkword:


PreparedStatement stmt = conn.prepareStatement("INSERT INTO cjktable (cjktext) VALUES ( ? )");
try {
stmt.setBytes(1, cjkword.getBytes("utf8"));
} catch(UnsupportedEncodingException e) {
stmt.setBytes(1, cjkword.getBytes());
}


the try block is to catch the exception (java.io.UnsupportedEncodingException), which occurs if you have specified an unsupported encoding. UTF8 should be supported (by Sun JDK anyway) so it should not occur.

--Edited per comments

Tuesday, May 09, 2006

Morning!

And after three months of non-blogging :), here is a view of my window at 5.20am in the morning.