How to remove noise artifacts from an image for OCR with Python OpenCV?

I have subsets of images containing digits. Each subset is read by Tesseract for OCR. Unfortunately, for some images the cropping from the original image is not optimal.

As a result, some artifacts/remains at the top and bottom of the image prevent Tesseract from recognizing the characters. I would like to get rid of these artifacts and obtain a result like this:

First I considered a simple approach: I take the first row of pixels as a reference: if an artifact is found along the x-axis (i.e. a white pixel, since the image is binarized), I remove it along the y-axis until the next black pixel. The code for this approach is below:

import cv2

# Read image, convert to grayscale, and binarize (inverted Otsu threshold)
inp = cv2.imread("testing_file.tif")
inp = cv2.cvtColor(inp, cv2.COLOR_BGR2GRAY)
_, inp = cv2.threshold(inp, 150, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

ax = inp.shape[1]  # width
ay = inp.shape[0]  # height

# For each column, erase white pixels from the top down until the first black pixel
out = inp.copy()
for i in range(ax):
    j = 0
    while j < ay:
        if out[j, i] == 255:
            out[j, i] = 0
        else:
            break
        j += 1

out = cv2.bitwise_not(out)
cv2.imwrite('output.png', out)

But the result is not good at all:

Then I stumbled upon the flood_fill function in scipy (here) but found it too time consuming and still not efficient. A similar question was asked on SO but didn't help much. Maybe a k-nearest neighbor approach could be considered? I also found out that methods consisting of merging neighboring pixels under some criteria are called region-growing methods, among which single linkage is the most common (here).
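For reference, this is roughly what the connected-component variant of that idea looks like. It is only a minimal sketch using scipy.ndimage.label (an assumption on my part, not the exact flood_fill code I tried): every white blob that touches the first or last row is treated as a cropping artifact and erased.

import cv2
import numpy as np
from scipy import ndimage

# Binarize as before (inverted Otsu threshold)
inp = cv2.imread("testing_file.tif")
inp = cv2.cvtColor(inp, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(inp, 150, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Label connected white blobs, then erase any blob touching the top or bottom row
labels, num = ndimage.label(binary)
border_labels = np.unique(np.concatenate((labels[0, :], labels[-1, :])))
for lab in border_labels:
    if lab != 0:                      # label 0 is the background
        binary[labels == lab] = 0

out = cv2.bitwise_not(binary)
cv2.imwrite('output_components.png', out)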

What would you recommend to remove the top and bottom artifacts?

Here's a simple approach:

  • Convert image to grayscale
  • Otsu's threshold to obtain a binary image
  • Create a special horizontal kernel and dilate
  • Detect horizontal lines, sort for the largest contour, and draw it onto a mask
  • Bitwise-and

After converting to grayscale, we apply Otsu's threshold to obtain a binary image

# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Next we create a long horizontal kernel and dilate to connect the digits together

# Create special horizontal kernel and dilate 
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)

From here we detect horizontal lines and sort for the largest contour. The idea is that the largest contour will be the middle section of the digits, where the digits are all "complete". Any smaller contours will be partial or cut-off digits, so we filter them out here. We draw this largest contour onto a mask

# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break

Now that we have the contour of the desired digits, we simply bitwise-and with our original image and color the background white to get our result

# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)

Full code for completeness

import cv2
import numpy as np

# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Create special horizontal kernel and dilate 
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)

# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break

# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('result', result)
cv2.waitKey()
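From here, the cleaned result can be passed straight to Tesseract. A minimal sketch, assuming pytesseract is installed and that --psm 7 (treat the image as a single text line) plus a digit whitelist suit your crops:

import pytesseract

# OCR the cleaned image; single text line and digits-only are assumptions about your data
text = pytesseract.image_to_string(
    result,
    config='--psm 7 -c tessedit_char_whitelist=0123456789')
print(text)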