Ran Deloun - 9 months ago
Java Question

Write Arabic characters with PDFBox

  1. Update 1

I'm trying to write some Arabic characters to a PDF document using PDFBox, but I get strange characters in the output. The code snippet I used for my test is below. Note that the same code prints Latin characters without any problem.

public static void main(String[] args) throws Exception {
    PDDocument document = new PDDocument();
    PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
    document.addPage(page);

    PDPageContentStream stream = new PDPageContentStream(document, page, true, true);

    // Use a Unicode font
    PDFont font = PDTrueTypeFont.loadTTF(document, "C:/arialuni.ttf");
    font.setFontEncoding(new WinAnsiEncoding());

    stream.beginText();
    stream.setFont(font, 12);
    stream.moveTextPositionByAmount(40, 600);
    stream.drawString("سي ججس ححسيب حسججسيبنم حح ");
    stream.endText();
    stream.close();
    // ... save and close the document
}


Thanks for your help. I tried a Unicode font downloaded from the Microsoft website, but I still get the same result.
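One plausible cause, offered as an assumption rather than a confirmed diagnosis: WinAnsiEncoding is a single-byte encoding closely related to Windows code page 1252, which contains no Arabic letters, so Arabic text cannot pass through it whatever font is loaded. The sketch below uses only the JDK and makes no PDFBox calls; the class name WinAnsiCheck and the helper fitsInCp1252 are made up for illustration. It checks whether a string can be represented in windows-1252 at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // Returns true if every character of text has a code in windows-1252.
    // (Hypothetical helper, used here as a stand-in for WinAnsiEncoding.)
    static boolean fitsInCp1252(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Latin text is representable in the single-byte code page.
        System.out.println(fitsInCp1252("Hello"));
        // Arabic SEEN (U+0633) and YEH (U+064A) are not.
        System.out.println(fitsInCp1252("\u0633\u064A"));
    }
}
```

If this is indeed the cause, it would explain why the same code works for Latin text and why switching to another Unicode font changes nothing.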

  1. Update 2

By using the 'drawUnicodeString' and 'loadTTF' methods that I got from PDFBOX-922, I was able to write Arabic characters, but they are disconnected and ordered from left to right. Here are the two methods:

public void drawUnicodeString(String text) throws IOException {
    COSString string = new COSString();
    // Encode each UTF-16 code unit as a two-byte code: high byte, then low byte
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        string.append(c >> 8);
        string.append(c & 0xff);
    }
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // ... (rest of the method as in the PDFBOX-922 patch)
}

public static PDType0Font loadTTF(PDDocument doc, InputStream is)
        throws IOException {
    /* Load the font which we will convert to Type0 font. */
    PDTrueTypeFont pdTtf = PDTrueTypeFont.loadTTF(doc, is);

    /* Find the Windows-platform Unicode cmap of the font. */
    TrueTypeFont ttf = pdTtf.getTTFFont();
    CMAPEncodingEntry unicodeMap = null;
    for (CMAPEncodingEntry candidate : ttf.getCMAP().getCmaps()) {
        if (candidate.getPlatformId() == CMAPTable.PLATFORM_WINDOWS
                && candidate.getPlatformEncodingId() == CMAPTable.ENCODING_UNICODE) {
            unicodeMap = candidate;
        }
    }
    if (unicodeMap == null) {
        throw new RuntimeException(
                "To use as CIDFont, the TTF must have a Windows platform Unicode encoding");
    }

    /* Scale font units to the 1000-unit glyph space used for PDF widths. */
    float scaling = 1000f / ttf.getHeader().getUnitsPerEm();

    MyPDCIDFontType2Font pdCidFont2 = new MyPDCIDFontType2Font();
    pdCidFont2.setFontDescriptor((PDFontDescriptorDictionary) pdTtf
            .getFontDescriptor());

    /* Fixme -- should determine the minimum and maximum charcode in the map */
    int[] cid2gid = new int[65536];
    List<Float> widths = new ArrayList<Float>();
    int[] widthValues = ttf.getHorizontalMetrics().getAdvanceWidth();
    for (int i = 0; i < cid2gid.length; i++) {
        int glyph = unicodeMap.getGlyphId(i);
        cid2gid[i] = glyph;
        /* One width entry per code: first code, last code, scaled width */
        widths.add((float) i);
        widths.add((float) i);
        widths.add(widthValues[glyph] * scaling);
    }

    /* Now construct the type0 font that we actually return */
    myType0Font pdFont0 = new myType0Font();
    pdFont0.setDescendantFonts(new COSObject(pdCidFont2.getCOSObject()));

    /* pdfont0.setToUnicode(COSName.IDENTITY_H); XXX how to express identity
     * mapping as ToUnicode program? */
    return pdFont0;
}
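
To make the byte-splitting loop in drawUnicodeString concrete: writing c >> 8 followed by c & 0xff for every char produces the same bytes as encoding the string as big-endian UTF-16. That matches the cid2gid table in loadTTF, which is indexed by character code from 0 to 65535. The sketch below is plain JDK code with no PDFBox calls, and the names TwoByteCodes and toTwoByteCodes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TwoByteCodes {
    // Splits each UTF-16 code unit into high byte then low byte,
    // mirroring the c >> 8 / c & 0xff loop in drawUnicodeString.
    static byte[] toTwoByteCodes(String text) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out[2 * i] = (byte) (c >> 8);
            out[2 * i + 1] = (byte) (c & 0xff);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\u0633\u064A"; // SEEN, YEH
        byte[] manual = toTwoByteCodes(text);
        byte[] utf16be = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(manual, utf16be)); // prints true
    }
}
```

Note that the bytes follow the order of the chars in the String, so nothing in this step reorders the text or selects joined letter forms.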

Here are the characters that were printed:

(image: disconnected Arabic letters)

I don't know why these characters are disconnected.

Answer

Arabic can be written by applying both PDFBOX-922 and PDFBOX-1287 (the diff files are attached to the issue descriptions). I hope the patches will be applied in version 2.0.
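
As background on the disconnected, left-to-right output, here is a likely explanation, stated as an assumption since neither the question nor the issue titles spell it out. A Java String stores Arabic in logical (typing) order using base letters. The joined appearance comes from choosing a contextual form for each letter and reordering the run for display, which does not happen when each char is simply mapped to one glyph and emitted in sequence. The JDK-only sketch below illustrates both facts; the class and method names are made up for illustration.

```java
import java.text.Bidi;
import java.text.Normalizer;

public class ArabicTextFacts {
    // True if the text contains right-to-left characters and therefore
    // needs bidirectional reordering before left-to-right layout.
    static boolean needsReordering(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Maps a contextual presentation form back to its base letter
    // using Unicode compatibility normalization.
    static String baseLetter(String presentationForm) {
        return Normalizer.normalize(presentationForm, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String word = "\u062D\u0628"; // HAH, BEH in logical order
        System.out.println(needsReordering(word));
        System.out.println(needsReordering("abc"));
        // U+FE91 is BEH in its initial (joined) form; the String above
        // holds only the base letter U+0628.
        System.out.println(baseLetter("\uFE91").equals("\u0628"));
    }
}
```

Under that assumption, a complete solution needs a shaping and reordering step before the text reaches the content stream, in addition to a font that can address Unicode code points.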