Sunday, May 28, 2006

Supporting CJK characters in MySQL

Recently I had to write some CJK/unicode characters from Java into MySQL. I use MySQL 5.1 and the connector mysql-connector-java-3.1.10-bin.jar .

The connector appears to be able to read CJK/unicode characters in text retrieved from queries correctly, if they were entered via MySQL query browser into the database properly as unicode. But writing CJK/unicode to the database (insert, update) using the connector using either a normal SQL statement or setString() with a PreparedStatement appeared to translate the CJK characters to a single ? (code 3F).

It appeared that I had to write the text data as bytes instead to get them written properly as unicode. I chose the encoding UTF-8. Then I changed the column specific encoding and charset for the text field to hold the CJK text as "utf8_ci" and "utf8_charset". Then to write the text in an insert operation using a PreparedStatement:

here we assume we already have the database Connection conn established, and we are trying to insert text from String cjkword:


PreparedStatement stmt = conn.prepareStatement("INSERT INTO cjktable (cjktext) VALUES ( ? )");
try {
stmt.setBytes(1, cjkword.getBytes("utf8"));
} catch(UnsupportedEncodingException e) {
stmt.setBytes(1, cjkword.getBytes());
}


the try block is to catch the exception (java.io.UnsupportedEncodingException), which occurs if you have specified an unsupported encoding. UTF8 should be supported (by Sun JDK anyway) so it should not occur.

--Edited per comments

2 comments:

gábor said...

some ideas/comments:

1. it would be better to catch UnsupportedEncodingException and not Exception. it's always recommended to catch the most specific exception.

2. in the catch-part perhaps you could log this issue. because as you said, it should never happen. so if it happens, it means some kind of (fatal?) error.

cwho said...

Thanks!

(1) is put in. Its a good practice to catch only exceptions you are looking for, and have a catch-all outside for component exception isolation.

(2) Logging is application specific (depending on what is used), I'll leave it to whoever uses this snippet in their app.