musicsquad musicsquad - 1 year ago 69
Java Question

Converting Japanese characters to hex not working

My code is very simple (using commons-codec-1.10.jar):

System.out.println(Hex.encodeHex("三菱グループ".getBytes(StandardCharsets.UTF_8), true));

It yields e4b889e88fb1e382b0e383abe383bce38397 on my PC, but according to codebeautify it should be 4e0983f130b030eb30fc30d7. Am I missing anything?


Answer Source

Hex.encodeHex is working fine. Your Java output is the UTF-8 encoding of the string, whereas codebeautify appears to be using UTF-16.

Let's take 三 to start with. That's U+4E09. In UTF-16 that's encoded as 4E 09, which matches the start of your codebeautify output. In UTF-8 it's encoded as E4 B8 89, which matches your Java output.
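To check this yourself, here is a minimal sketch that encodes 三 both ways using only the JDK. The class name EncodingComparison and the toHex helper are made up for illustration and are not part of commons-codec. The character is written as a \u escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Hypothetical helper: encode the text with the given charset and
    // return the resulting bytes as lowercase hex.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String san = "\u4e09"; // 三, code point U+4E09
        System.out.println(toHex(san, StandardCharsets.UTF_8));    // e4b889
        System.out.println(toHex(san, StandardCharsets.UTF_16BE)); // 4e09
    }
}
```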

If you want UTF-16, just use StandardCharsets.UTF_16BE instead of StandardCharsets.UTF_8. (But only do it if you really want UTF-16. UTF-8 is a better encoding to use in most cases, IMO.)
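As a rough illustration of why UTF_16BE specifically is the right choice here, the sketch below encodes the full string with the three UTF-16 charsets the JDK provides. In Java, StandardCharsets.UTF_16 writes a big-endian byte-order mark (FE FF) before the data when encoding, and UTF_16LE swaps the bytes of each code unit, so only UTF_16BE reproduces the expected output exactly. The class name Utf16Variants and the toHex helper are hypothetical, and the JDK is used instead of commons-codec to keep the example self-contained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Variants {
    // Hypothetical helper: encode the text with the given charset and
    // return the resulting bytes as lowercase hex.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 三菱グループ written as escapes to avoid source-encoding issues
        String text = "\u4e09\u83f1\u30b0\u30eb\u30fc\u30d7";

        // Big-endian, no byte-order mark: 4e0983f130b030eb30fc30d7
        System.out.println(toHex(text, StandardCharsets.UTF_16BE));

        // Big-endian with a leading BOM: feff4e0983f130b030eb30fc30d7
        System.out.println(toHex(text, StandardCharsets.UTF_16));

        // Little-endian, no BOM: 094ef183b030eb30fc30d730
        System.out.println(toHex(text, StandardCharsets.UTF_16LE));
    }
}
```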