Freewind - 6 months ago
Java Question

Unicode chars are converted to broken symbols when I use wkhtmltopdf

I have an HTML file that contains Unicode characters and is saved to disk as UTF-8. When I view it with less, all characters display correctly:

<h1>什么是Action?</h1>
<p>Play程序接收到的大部分请求,都是由<code>Action</code>来处理的。


But when I use wkhtmltopdf to convert it to PDF, the characters come out broken:

[Screenshot: broken Unicode characters in the generated PDF]

My command is:

wkhtmltopdf --encoding utf-8 book.html book.pdf


How can I fix this?

Answer

I finally found the cause: my Ubuntu server had no Unicode fonts installed.

I uploaded some TrueType fonts from my local Ubuntu machine to the server, and now everything works fine.

freewind@freewind:/usr/share/fonts$ cd truetype/
freewind@freewind:/usr/share/fonts/truetype$ ls
arphic             ttf-dejavu               ttf-lao
freefont           ttf-devanagari-fonts     ttf-liberation
kochi              ttf-gujarati-fonts       ttf-malayalam-fonts
msttcorefonts      ttf-indic-fonts-core     ttf-oriya-fonts
openoffice         ttf-japanese-gothic.ttf  ttf-punjabi-fonts
sazanami           ttf-japanese-mincho.ttf  ttf-tamil-fonts
takao              ttf-kacst-one            ttf-telugu-fonts
thai               ttf-kannada-fonts        unfonts
ttf-bengali-fonts  ttf-khmeros-core         wqy

I simply uploaded them all, which fixed the problem, although I don't know which font was the key one.
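
To narrow down which font actually matters, one option is to ask fontconfig which installed families claim coverage for Chinese. This is only a minimal sketch, assuming the fc-list tool from fontconfig is available on the server. The helper name fonts_for_lang is made up for illustration, and zh is the language tag for Chinese.

```shell
#!/bin/sh
# Sketch: list installed font families that fontconfig reports as covering
# a given language. fonts_for_lang is a hypothetical helper name.

fonts_for_lang() {
    lang="$1"
    # Bail out early if fontconfig's fc-list is not installed.
    if ! command -v fc-list >/dev/null 2>&1; then
        echo "fc-list not found; install fontconfig first" >&2
        return 1
    fi
    # ":lang=zh" is a fontconfig pattern; "family" limits output to family names.
    fc-list ":lang=$lang" family | sort -u
}

# Show the families that cover Chinese (prints nothing if there are none).
fonts_for_lang zh || true
```

If this prints nothing on the server, no installed font covers Chinese, which would match the broken output. After copying fonts into /usr/share/fonts, running fc-cache -f should refresh the font cache. Judging by the sample text, the CJK directories in the listing above (for example wqy or arphic) are the likely candidates, though I have not verified which one is strictly required.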