NigelJL NigelJL - 2 months ago 33
Python Question

Google Cloud Vision - Numbers and Numerals OCR

I've been trying to implement an OCR program with Python that reads numbers with a specific format, XXX-XXX. I used Google's Cloud Vision API Text Recognition, but the results were unreliable. Out of 30 high-contrast 1280 x 1024 bmp images, only a handful resulted in the correct output, or at least included the correct output in the results. The program tends to omit some numbers, output in non-English languages or sneak in a few special characters.

The goal is to at least output the correct numbers consecutively, doesn't matter if the results are sprinkled with other junk. Is there a way to help the program recognize numbers better, for example limit the results to a specific format, or to numbers only?

Answer

At this moment it is not possible to add constraints or to give a specific expected number format to Vision API requests, as mentioned here (by the Project Manager of Cloud Vision API).

You can also check all the possible request parameters (in the API reference), none indicating anything to specify number format. Currently only options to:

  • latLongRect: specify location of the image
  • languageHints: indicating the expected language for text_detection (list of supported languages here)

I assume you already checked out the multiple responses (with different included image regions) to see if you could reconstruct the text using the location of different digits?

Note that the Vision API and text_detection is not optimized for your data specifically, if you would have a lot of annotated data, it is also an option to actually build your own model using Tensorflow. This blogpost explains a system setup to detect number plates (with a specific number format). All the code is available on Github and the problem seems very related to yours.