Maddin - 5 months ago
Android Question

Split words with Tesseract tess-two on Android

I am trying to use Tesseract (tess-two) to read questions and answers from images on Android. At the moment I get a single String containing every word in the image.
My problem is that I can't split the answers.
Is it possible to split the answers with TessBaseAPI? A solution in Java/Android would also be fine ;)

public String detectText(Bitmap bitmap) {
    Log.d(TAG, "Initialization of TessBaseApi");
    TessDataManager.initTessTrainedData(context);
    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    String path = TessDataManager.getTesseractFolder();
    Log.d(TAG, "Tess folder: " + path);
    tessBaseAPI.setDebug(true);
    tessBaseAPI.init(path, "eng");
    // Restrict recognition to these characters.
    tessBaseAPI.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "abcdefghijklnmopqrstuvwxyzäüößÄÖÜ!?@#$%^&*+=-;()/");
    // Note: this passes an OCR engine mode constant where a page segmentation mode is expected.
    tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);

    Log.d(TAG, "Ended initialization of TessEngine");
    Log.d(TAG, "Running inspection on bitmap");
    tessBaseAPI.setImage(bitmap);

    String inspection = tessBaseAPI.getUTF8Text();
    Log.d(TAG, "Got data: " + inspection);
    // Release the native resources held by the engine.
    tessBaseAPI.end();
    System.gc();
    return inspection;
}


Here is an example of what the image looks like:

Answer

This is how it works. Set the page segmentation mode to sparse text:

tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SPARSE_TEXT);
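Once the text comes back from getUTF8Text(), the individual entries still have to be separated on the Java side. The following is only a minimal sketch, assuming that each question or answer ends up on its own line of the returned String (this depends on the image layout and is not guaranteed). The class name OcrTextSplitter and the method splitIntoEntries are hypothetical helpers, not part of tess-two.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of tess-two: splits raw OCR output into entries.
public class OcrTextSplitter {

    // Assumption: every question/answer sits on its own line of the OCR text.
    // Returns the non-empty lines, trimmed, in the order they appear.
    public static List<String> splitIntoEntries(String ocrText) {
        List<String> entries = new ArrayList<>();
        if (ocrText == null) {
            return entries;
        }
        // Split on Unix or Windows line breaks.
        for (String line : ocrText.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Blank lines between text blocks carry no content, so skip them.
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }
}
```

In detectText() this could be called as OcrTextSplitter.splitIntoEntries(inspection) before returning. If the answers do not come out one per line for your images, this line-based split will not be enough on its own.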