I started off writing a simple script to read data from an image. Here is my Ruby code that uses RTesseract to read it:
require "rtesseract"
RTesseract.configure do |config|
  config.processor = "mini_magick"
end
image = RTesseract.new("myImage.jpg")
puts image.to_s
I tested your script on my Linux Mint 17 machine with tesseract 3.03, Ruby 2.1.5 and MiniMagick 4.5.1.
It also returns
If you're sure that the image only contains digits, you could try:
image = RTesseract.new("myImage.jpg", options: :digits)
Running tesseract without any parameters prints a list of the possible options. "pagesegmode 7" looks interesting:
7 = Treat the image as a single text line.
image = RTesseract.new("myImage.jpg", options: :digits, psm: 7)
13223 4 3 21 8.
With your second image, it returns
3 21 8.
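To check whether the options behave the same outside RTesseract, you could call the tesseract binary directly. This is only a sketch: `tesseract_args` and `read_line` are helper names I made up, and it assumes the tesseract 3.x command-line syntax (`-psm`; newer 4.x releases spell it `--psm`) and that the `digits` config file is installed with your tessdata.

```ruby
require "open3"

# Build the argument list for a direct tesseract call.
# Assumption: tesseract 3.x syntax ("-psm"); tesseract 4+ uses "--psm" instead.
def tesseract_args(image_path, psm: 7, digits_only: true)
  # "stdout" as the output base makes tesseract print the text instead of writing a file.
  args = ["tesseract", image_path, "stdout", "-psm", psm.to_s]
  # "digits" is the name of a config file that restricts recognition to digits.
  args << "digits" if digits_only
  args
end

# Hypothetical helper: run tesseract and return the recognized text, stripped.
def read_line(image_path, **opts)
  output, status = Open3.capture2(*tesseract_args(image_path, **opts))
  raise "tesseract failed" unless status.success?
  output.strip
end
```

Calling `read_line("myImage.jpg")` should then be roughly comparable to the RTesseract call with `options: :digits, psm: 7`, which makes it easier to tell whether a bad result comes from the wrapper or from tesseract itself.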
I think the biggest problem now is that the JPG artifacts are pretty strong and the contrast is relatively low between digits and background. A PNG image would probably yield better results.
With GIMP, I resized the image to 200px height, cropped close to the digits to remove some artifacts, used Colors/Threshold at 150, inverted the image and saved it as PNG:
RTesseract returns:
1320 4 3 0 8
With ImageMagick, this command achieved the same result:
convert myImage.jpg -geometry x200 -threshold 13% -negate myImage.png
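If you'd rather run that preprocessing step from the Ruby script itself, a minimal sketch could shell out to the same `convert` command. The helper names `preprocess_args` and `preprocess` are mine, not part of any library, and the defaults just mirror the values from the command above; you would probably need to tune the threshold for other images.

```ruby
# Build the ImageMagick argument list: scale to a fixed height, binarize
# with a threshold, invert the result, and write a PNG next to the source.
def preprocess_args(input_path, height: 200, threshold: "13%")
  # Swap the file extension for .png (myImage.jpg -> myImage.png).
  output_path = input_path.sub(/\.\w+\z/, "") + ".png"
  ["convert", input_path,
   "-geometry", "x#{height}",
   "-threshold", threshold,
   "-negate",
   output_path]
end

# Hypothetical helper: run the conversion and return the path of the PNG.
# Passing the arguments as an array avoids shell quoting problems.
def preprocess(input_path, **opts)
  args = preprocess_args(input_path, **opts)
  system(*args) or raise "convert failed"
  args.last
end
```

The returned path can then be passed to `RTesseract.new` instead of the original JPG. Note that if the input is already a `.png`, this sketch would overwrite it.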