vineetrok - 2 months ago
Python Question

"Adding" new fonts to Tesseract eng.traineddata

As far as I know, Tesseract 3.x comes with 6 English fonts (correct me if I'm wrong). I need to train Tesseract on 5 more fonts. I need only capital letters and digits (no special characters or symbols).

I followed various processes, for example:
Adding New Fonts to Tesseract 3 OCR Engine

and also used tools to automate the process like
Serak Tesseract Trainer for Tesseract 3.02

For generating box files I used QT Box Editor

After using the above tools I get an eng.traineddata file. All tutorials tell me to add this eng.traineddata file to the Tesseract-OCR\tessdata folder, but doing so will replace the original eng.traineddata file. After doing this, will I lose the default fonts that come with Tesseract 3.x?

How can I add new fonts? It's still not clear to me. I hope someone can help me here. Thanks.

Answer

You should use a different name, e.g., eng1.traineddata. That way the original eng.traineddata is not overwritten, and you can use the new data together with the original by specifying the language option -l eng+eng1.
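
Since the question is tagged Python, here is a minimal sketch of that workflow. It assumes the tesseract executable is on your PATH and that your newly trained file works under the name eng1 (ideally it was trained with that language code). The helper names install_traineddata and build_command, and the file names sample.png and result, are hypothetical and only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def install_traineddata(new_file, tessdata_dir, lang_name="eng1"):
    """Copy a custom traineddata file into tessdata under its own name.

    Refuses to overwrite an existing file, so the stock eng.traineddata
    (or an earlier custom file) is never replaced by accident.
    """
    target = Path(tessdata_dir) / f"{lang_name}.traineddata"
    if target.exists():
        raise FileExistsError(f"{target} already exists; pick another name")
    shutil.copy(new_file, target)
    return target


def build_command(image, output_base, langs=("eng", "eng1")):
    """Build the tesseract call; languages are joined with '+' for -l."""
    return ["tesseract", str(image), str(output_base), "-l", "+".join(langs)]


if __name__ == "__main__":
    cmd = build_command("sample.png", "result")  # hypothetical file names
    print(" ".join(cmd))
    # Uncomment once tesseract and both traineddata files are installed:
    # subprocess.run(cmd, check=True)
```

With -l eng+eng1 Tesseract loads both language files, so the default fonts remain available next to the new ones.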
