gknauth - 3 months ago
Java Question

Why does Log4J2 output differ on two systems when I am writing the same UTF-8?

I'm writing Unicode characters to a Log4J2 log. On one machine (Windows 8) I see this in the log:

2016-08-30 16:44:00.958|English: The quick brown fox jumped over the lazy dog.
2016-08-30 16:44:00.960|German: Falsches Üben von Xylophonmusik quält jeden größeren Zwerg.
2016-08-30 16:44:00.960|Russian 1: В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!
2016-08-30 16:44:00.960|Russian 2: Съешь же ещё этих мягких французских булок да выпей чаю.
2016-08-30 16:44:00.960|Chinese: 中国智造,慧及全球
2016-08-30 16:44:00.960|Japanese: いろはにほへと ちりぬるを わかよたれそ つねならむ うゐのおくやま けふこえて あさきゆめみし ゑひもせす
2016-08-30 16:44:00.960|Korean: 다람쥐 헌 쳇바퀴에 타고파


On another machine (Windows Server 2012R2) I see this:

2016-08-30 16:50:41.676|English: The quick brown fox jumped over the lazy dog.
2016-08-30 16:50:41.676|German: Falsches Üben von Xylophonmusik quält jeden größeren Zwerg.
2016-08-30 16:50:41.676|Russian 1: ? ????? ??? ??? ?? ??????? ??, ?? ????????? ?????????!
2016-08-30 16:50:41.676|Russian 2: ????? ?? ??? ???? ?????? ??????????? ????? ?? ????? ???.
2016-08-30 16:50:41.676|Chinese: ?????????
2016-08-30 16:50:41.676|Japanese: ??????? ????? ?????? ????? ??????? ????? ??????? ?????
2016-08-30 16:50:41.676|Korean: ??? ? ???? ???


If Log4J2 writes UTF-8 by default, why does the log file on the second system contain only question marks? The second system may well be missing fonts, but that would only affect rendering. The log file itself contains literal question marks: when I inspect it with a hexdump tool, I expect to see at least the UTF-8 byte sequences for those characters, and they are not there. Put another way, I can understand why an unknown character might render incorrectly. What I don't understand is why the correct bytes were not written to the file, given that the process doing the writing is the JVM, which uses Unicode for characters internally.

xav
Answer

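Have you tried setting the UTF-8 charset explicitly on your Log4j layout in the configuration file? If no charset is configured, the layout may fall back to the platform default encoding, and any character that encoding cannot represent is written as a literal question mark. That would fit your output: the German umlauts survive on the second machine, but the Cyrillic and CJK text does not. For example, using PatternLayout: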

<Configuration ...>
    ...
    <PatternLayout pattern="..." charset="UTF-8"/>
    ...
</Configuration>

See https://logging.apache.org/log4j/2.x/manual/layouts.html for more information on layouts and their charset setting.
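
To see how real question marks can end up in the file, here is a minimal sketch in plain Java (no Log4j involved). It assumes the second machine's default encoding is a Western single-byte charset. ISO-8859-1 stands in for it here because every JVM ships it, and the class and method names are made up for illustration. When a String is encoded with a charset that has no mapping for a character, the JVM substitutes the charset's replacement byte, which is `?` for this charset, so the original character is lost before anything reaches the disk.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Encodes the text to bytes with the given charset and decodes it again,
    // roughly what happens when a log line is written and later read back.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement bytes instead of throwing an exception.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String german = "\u00dcben";                        // "Üben"
        String russian = "\u0431\u0443\u043b\u043e\u043a";  // "булок"

        // Assumption: a stand-in for a Western single-byte platform default.
        Charset legacy = StandardCharsets.ISO_8859_1;

        // What this JVM would use when no charset is given explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        System.out.println("German survives legacy charset:  "
                + roundTrip(german, legacy).equals(german));
        System.out.println("Russian becomes question marks: "
                + roundTrip(russian, legacy).equals("?????"));
        System.out.println("Russian survives UTF-8:          "
                + roundTrip(russian, StandardCharsets.UTF_8).equals(russian));
    }
}
```

Comparing the `JVM default charset` line on both machines is a quick way to check whether the platform encoding is what differs between them.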
