musicsquad - 4 months ago
Java Question

Converting Japanese characters to hex not working

My code is very simple (using commons-codec-1.10.jar):

System.out.println(Hex.encodeHex("三菱グループ".getBytes(StandardCharsets.UTF_8), true));


It yields e4b889e88fb1e382b0e383abe383bce38397 on my PC, but according to http://codebeautify.org/string-hex-converter, it should be 4e0983f130b030eb30fc30d7. Am I missing anything?

Thanks

Answer

Hex.encodeHex is working fine, but the result is the hex of the UTF-8 encoded bytes, whereas codebeautify.org appears to be using UTF-16.

Let's take 三 to start with. That's U+4E09. In UTF-16 (big-endian) that's encoded as 4E 09, which matches the start of your codebeautify output. In UTF-8 it's encoded as E4 B8 89, which matches the start of your Java output.
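To see the difference for that one character, here is a minimal sketch that uses only the JDK. The class name EncodingComparison and the helper toHex are made up for illustration and stand in for Hex.encodeHex, so no commons-codec is needed to try it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Hypothetical helper (not part of commons-codec): encodes the text
    // in the given charset and returns the bytes as lowercase hex.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            // Mask with 0xff so negative bytes format as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "三"; // U+4E09
        System.out.println(toHex(s, StandardCharsets.UTF_8));    // e4b889
        System.out.println(toHex(s, StandardCharsets.UTF_16BE)); // 4e09
    }
}
```

Same character, same code point, but three bytes in UTF-8 and two in UTF-16.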

If you want UTF-16, just use StandardCharsets.UTF_16BE instead of StandardCharsets.UTF_8. (But only do it if you really want UTF-16. UTF-8 is a better encoding to use in most cases, IMO.)
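As a rough sketch of that change applied to the full string, again with a made-up class name (Utf16Variants) and a hypothetical toHex helper in place of Hex.encodeHex. It also shows why UTF_16BE is the one to pick rather than plain StandardCharsets.UTF_16: when encoding, Java's UTF_16 charset writes a byte-order mark (FE FF) in front of the data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Variants {
    // Hypothetical helper: lowercase hex of the text's bytes in the given charset.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "三菱グループ";

        // Big-endian, no byte-order mark: matches the codebeautify output.
        System.out.println(toHex(s, StandardCharsets.UTF_16BE));
        // 4e0983f130b030eb30fc30d7

        // Plain UTF_16 prepends the BOM "feff" when encoding.
        System.out.println(toHex(s, StandardCharsets.UTF_16));
        // feff4e0983f130b030eb30fc30d7
    }
}
```

With commons-codec on the classpath, the same swap in the original one-liner would be "三菱グループ".getBytes(StandardCharsets.UTF_16BE).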
