Performing OCR of Seven Segment Display images

Question

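I am performing OCR on energy meter displays: example 1, example 2, example 3.

I tried using tesseract-ocr with the letsgodigital trained data, but the performance is very poor.

I am fairly new to this topic, and this is what I have done so far: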

import cv2
from skimage import exposure
from pytesseract import image_to_string
# cv2_imshow is the notebook replacement for cv2.imshow (assumes Google Colab)
from google.colab.patches import cv2_imshow


def process_image(orig_image_arr):

  gry_disp_arr = cv2.cvtColor(orig_image_arr, cv2.COLOR_BGR2GRAY)
  gry_disp_arr = exposure.rescale_intensity(gry_disp_arr, out_range= (0,255))

  #thresholding
  ret, thresh = cv2.threshold(gry_disp_arr,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
  
  return thresh

def ocr_image(orig_image_arr):
  otsu_thresh_image = process_image(orig_image_arr)
  cv2_imshow(otsu_thresh_image)
  return image_to_string(otsu_thresh_image, lang="letsgodigital", config="--psm 8 -c tessedit_char_whitelist=.0123456789")

img = cv2.imread('test2.jpg')
cnv = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
text = ocr_image(cnv)

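The results on the example images are very poor. I have a few questions:
How can I identify the four corners of the display? (Edge detection doesn't seem to work very well.)
Is there any further preprocessing I can do to improve the performance?

Thanks for any help.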

Answer 1

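Notice how your power meter uses blue or green LEDs to light up the display; I suggest you use this colored display to your advantage. What I'd do is select only one RGB channel, based on the LED color. Then I can threshold it based on some algorithm or assumption. After that, you can do the downstream steps of cropping, resizing, transformation, OCR, etc.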
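As a minimal sketch of the channel selection step: `cv2.imread` returns channels in B, G, R order, so the green channel sits at index 1. The helper name `extract_channel` and the `BGR_INDEX` table are hypothetical, introduced only for illustration, and a small synthetic array stands in for a real photo.

```python
import numpy as np

# cv2.imread returns channels in B, G, R order, not R, G, B.
BGR_INDEX = {"blue": 0, "green": 1, "red": 2}

def extract_channel(img_bgr, color):
    """Return a copy of one color channel from an H x W x 3 BGR array."""
    # .copy() so that later in-place thresholding does not alter the source image
    return img_bgr[:, :, BGR_INDEX[color]].copy()

# Synthetic stand-in for a loaded photo: a uniformly green 2 x 2 image
demo = np.zeros((2, 2, 3), dtype=np.uint8)
demo[:, :, 1] = 180
green = extract_channel(demo, "green")
```

Returning a copy is a deliberate choice here: plain slicing gives a view, so thresholding the slice in place would also modify the original image.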

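For example, using your example image 1, look at its histogram here. Notice that there is a small green peak to the right of the 150 mark.

I take advantage of this by setting anything below 150 to zero. My assumption is that the green peak corresponds to the bright green LEDs in the image.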
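To inspect such a histogram yourself, something along these lines should work. It is a sketch using NumPy only; the helper names `channel_histogram` and `bright_fraction` are hypothetical, and the synthetic array stands in for the real photo, so the cutoff of 150 is specific to example image 1 and will likely need adjusting for other images.

```python
import numpy as np

def channel_histogram(img_bgr, channel):
    """Return a 256-bin histogram of one channel of an 8-bit BGR image."""
    values = img_bgr[:, :, channel].ravel()
    return np.bincount(values, minlength=256)

def bright_fraction(hist, cutoff=150):
    """Fraction of pixels whose value is at or above the cutoff."""
    return hist[cutoff:].sum() / hist.sum()

# Synthetic stand-in: dim background with a small bright green patch
demo = np.zeros((10, 10, 3), dtype=np.uint8)
demo[:, :, 1] = 40
demo[2:4, 2:7, 1] = 200
hist_g = channel_histogram(demo, 1)
frac = bright_fraction(hist_g)
```

A small but clearly separated bright fraction in one channel is the kind of peak described above; if no channel shows one, a fixed cutoff is unlikely to isolate the digits.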

import cv2

# Flag 1 loads the image as 3-channel color (same as cv2.IMREAD_COLOR)
img = cv2.imread('example_1.jpg', 1)

# Get only green channel
img_g = img[:,:,1]
# Set threshold for green value, anything less than 150 becomes zero
img_g[img_g < 150] = 0

This is what I get. The downstream OCR should be much easier now.

# You should also set anything >= 150 to max value as well, but I didn't in this example
img_g[img_g >= 150] = 255

The steps above should replace this step:

_ret, thresh = cv2.threshold(img_g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

Here's the output of this step.
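The two thresholding assignments and a simple cropping step can be combined as in the sketch below. It uses NumPy only; `led_mask` and `crop_to_mask` are hypothetical helper names, the cutoff of 150 comes from the histogram of example image 1, and the axis-aligned bounding-box crop is my own simplification that does not correct for perspective.

```python
import numpy as np

def led_mask(channel, cutoff=150):
    """Binary mask: 255 where channel >= cutoff, else 0. The input is not modified."""
    return np.where(channel >= cutoff, 255, 0).astype(np.uint8)

def crop_to_mask(mask, pad=0):
    """Crop to the bounding box of the nonzero pixels, with optional padding."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return mask  # nothing above the cutoff, so there is nothing to crop to
    top = max(rows[0] - pad, 0)
    bottom = min(rows[-1] + pad + 1, mask.shape[0])
    left = max(cols[0] - pad, 0)
    right = min(cols[-1] + pad + 1, mask.shape[1])
    return mask[top:bottom, left:right]

# Synthetic stand-in for the green channel: dim background, bright 3 x 5 patch
demo = np.full((10, 12), 40, dtype=np.uint8)
demo[3:6, 4:9] = 200
mask = led_mask(demo)
cropped = crop_to_mask(mask)
```

For an 8-bit channel, `cv2.threshold(channel, 149, 255, cv2.THRESH_BINARY)` should give the same mask, since `THRESH_BINARY` keeps pixels strictly greater than the threshold.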

Tags: python, ocr, opencv, tesseract, image-processing