tesseract didn't get the little labels

Question

I have installed tesseract in my Linux environment.

When I run something like this, it works:

# tesseract myPic.jpg /output

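But my picture has some small labels, and tesseract does not see them.

Is there an option to set a pitch or something like that?

Example of text labels:

With this picture, tesseract does not recognize any values...

But with this picture:

I get the following output: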

J8

J7A-J7B P7 \

2
40 50 0 180 190

200

P1 P2 7

110 110
\ l

For example, in this case tesseract does not see the 90 (top left)...

I think it is just a matter of setting an option or something like that, isn't it?

Thanks

Answer 1

To get accurate results from Tesseract (and from any OCR engine), you need to follow some guidelines, as described in my answer to this post: Junk results when using Tesseract OCR and tess-two

The key points are:

Use a high-resolution image (resample if needed); 300 DPI is the minimum

Make sure there are no shadows or bends in the image

If there is any skew, fix the image in code prior to OCR

Use a dictionary to help get good results

Adjust the text size (a 12 pt font is ideal)

Binarize the image and use image processing algorithms to remove noise
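To make the 300 DPI guideline concrete, here is a minimal Python sketch of the arithmetic for picking a resample size. The function name `size_for_dpi` and the 800x600 image tagged at 96 DPI are assumptions made up for illustration. The sketch only computes the target dimensions; the resampling itself would be done by whatever imaging library you use.

```python
def size_for_dpi(width_px, height_px, x_dpi, y_dpi, target_dpi=300):
    """Return the (width, height) in pixels that keeps the physical size
    of the image while bringing it to target_dpi."""
    if x_dpi <= 0 or y_dpi <= 0:
        raise ValueError("resolution must be positive")
    # Physical size in inches is pixels / dpi; multiply by the target DPI.
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = width_px * target_dpi // x_dpi
    new_height = height_px * target_dpi // y_dpi
    return new_width, new_height


# Hypothetical example: an 800x600 image tagged as 96 DPI
print(size_for_dpi(800, 600, 96, 96))  # (2500, 1875)
```

Small labels benefit the most from this step, since upscaling brings their glyphs closer to the size the engine handles well.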

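It is also recommended to spend some time training the OCR engine to get better results, as shown in this link: Training Tesseract

I took the 2 images you shared and ran some image processing on them using the LEADTOOLS SDK (disclaimer: I am an employee of this company). With the processed images I was able to get better results than you were getting, but since the original images are not the best, it is still not 100%. Here is the code I used to try to fix the images: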

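//namespaces needed by this snippet
using Leadtools;
using Leadtools.Codecs;
using Leadtools.ImageProcessing;
using Leadtools.ImageProcessing.Core;

//filename and outputFile are the input and output paths supplied by the caller

//initialize the codecs class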
using (RasterCodecs codecs = new RasterCodecs())
{
   //load the file
   using (RasterImage img = codecs.Load(filename))
   {
      //Run the image processing sequence starting by resizing the image
      double newWidth = (img.Width / (double)img.XResolution) * 300;
      double newHeight = (img.Height / (double)img.YResolution) * 300;
      SizeCommand sizeCommand = new SizeCommand((int)newWidth, (int)newHeight, RasterSizeFlags.Resample);
      sizeCommand.Run(img);

      //binarize the image
      AutoBinarizeCommand autoBinarize = new AutoBinarizeCommand();
      autoBinarize.Run(img);

      //change it to 1BPP
      ColorResolutionCommand colorResolution = new ColorResolutionCommand();
      colorResolution.BitsPerPixel = 1;
      colorResolution.Run(img);

      //save the image as PNG
      codecs.Save(img, outputFile, RasterImageFormat.Png, 0);
   }
}
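For readers without LEADTOOLS, the binarization step can be illustrated with a dependency-free Python sketch. It uses Otsu's method, which is a substitute chosen for illustration: the algorithm behind `AutoBinarizeCommand` is not stated here, so this is not a reimplementation of it. The names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes the between-class
    variance of the two groups it separates (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # every pixel is already background
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (black) or 255 (white)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Hypothetical data: dark text pixels (10) on a light background (200)
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
black_and_white = binarize(sample, t)
```

A global threshold like this works when lighting is even. With shadows or gradients, an adaptive (local) threshold is usually the better choice.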

Here are the output images from the LEADTOOLS processing:
