Local contrast enhancement for digit recognition with cv2 / pytesseract

Question

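I want to use pytesseract to read digits from images. The images look as follows:

The digits are dotted, and to be able to use pytesseract I need black, connected digits on a white background. To get there, I thought about using erode and dilate as preprocessing techniques. As you can see, the images are similar, yet quite different in certain respects. For example, the dots in the first image are darker than the background, while the dots in the second image are brighter. That means that in the first image I can use erode to get black connected lines, and in the second image I can use dilate to get white connected lines and then invert the colors. This leads to the following results:

With an appropriate threshold, the first image can easily be read with pytesseract. The second image, however, is trickier. The problem is that, for example, parts of the "4" are darker than the background around the "3", so a simple global threshold is not going to work. I need something like a local threshold or local contrast enhancement. Does anybody have an idea here?

Edit:

Otsu, mean and Gaussian thresholding lead to the following results: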

Answer 1

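Your images are pretty low-res, but you can try a method called gain division. The idea is to build a model of the background and then weight each input pixel by that model. The resulting gain should be relatively constant across most of the image.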
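
To make the idea concrete, here is a tiny numeric sketch on a made-up 1-D scanline. The pixel values are hypothetical, and the background model is assumed to be known exactly; in practice it has to be estimated from the image.

```python
import numpy as np

# Hypothetical scanline: the illumination rises from left to right.
background = np.array([100.0, 150.0, 200.0, 250.0])
# 1.0 means plain background; ink reflects half of the light (assumption).
reflectance = np.array([1.0, 0.5, 1.0, 0.5])
observed = background * reflectance  # [100, 75, 200, 125]

# No single global threshold separates ink from background here: the
# background pixel at index 0 (100) is darker than the ink pixel at index 3 (125).

# Gain division: weight each observed pixel by 255 / background.
normalized = np.clip(255.0 * observed / background, 0, 255).astype(np.uint8)
print(normalized)  # [255 127 255 127]

# After normalization, one global threshold is enough.
inkMask = normalized < 200
```

After the division the background sits at 255 everywhere and the ink at a constant darker level, which is why a plain threshold works afterwards.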

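After the gain division is performed, you can try to improve the image by applying an area filter and morphology. I only tried it on your first image, because it is the "least bad" one.

These are the steps to get the gain-divided image:

Apply a soft median blur filter to get rid of high-frequency noise.
Get the background model via the local maximum. Apply a very strong close operation with a big structuring element (I'm using a rectangular kernel of size 15).
Perform the gain adjustment by dividing 255 by each local-maximum pixel, then weight (multiply) this value by the corresponding input image pixel.
You should get a nice image in which the background illumination is pretty much normalized. Threshold this image to get a binary mask of the characters.

Now, you can improve the quality of the image with the following additional steps:

Threshold via Otsu, but add a little bit of bias. (This, unfortunately, is a manual step that depends on the input.)
Apply an area filter to remove the smaller noise blobs.

Let's see the code: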

import numpy as np
import cv2

# image path
path = "C:/opencvImages/"
fileName = "iA904.png"

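# Read the image in default (BGR color) mode:
inputImage = cv2.imread(path + fileName)
if inputImage is None:
    raise FileNotFoundError("Could not read image: " + path + fileName)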

# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)

# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

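# Perform gain division (pixels whose local maximum is 0 are set to 0,
# which also avoids NumPy's divide-by-zero warning):
localMaxFloat = localMax.astype(np.float32)
gainDivision = np.divide(inputImage, localMaxFloat, out=np.zeros_like(localMaxFloat), where=localMaxFloat != 0)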

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8") 

# Convert BGR (OpenCV's default channel order) to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

This is what gain division gets you:

Note that the lighting is more balanced. Now, let's apply a little bit of contrast enhancement:

# Contrast Enhancement:
grayscaleImage = cv2.normalize(grayscaleImage, None, 0, 255, cv2.NORM_MINMAX)

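You get this, which creates a little more contrast between the foreground and the background:

Now, let's try to threshold this image to get a nice binary mask. As I suggested, try Otsu's thresholding, but add (or subtract) a little bit of bias to the result. As mentioned, this step depends on the quality of your input: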

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)

You end up with this binary mask:

Invert it and filter out the small blobs. I set the area threshold to 10 pixels:

# Invert image:
binaryImage = 255 - binaryImage

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 10

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i, cv2.CC_STAT_AREA] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")

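And this is the final binary mask:

If you plan to send this image to an OCR engine, you might want to apply some morphology first, perhaps a closing to try to join the dots that make up the characters. Also, be sure to train your OCR classifier with a font that is close to the one you are actually trying to recognize. This is the (inverted) mask after 3 iterations of a closing with a 3 x 3 rectangular kernel:

Edit:

To get the last image, process the filtered output as follows: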

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 3

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)

# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage
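
From here the image could be handed to pytesseract. The helpers below are only a sketch and are not part of the pipeline above: the function names are hypothetical, and the choice of page segmentation mode 7 (single text line) plus a digit whitelist is an assumption that may need tuning for your Tesseract version and images.

```python
def build_digit_config(psm=7):
    """Build a Tesseract config string for a single line of digits (assumed settings)."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_digits(image, psm=7):
    """Run Tesseract on a black-on-white image and return the recognized text."""
    # Imported lazily so build_digit_config can be used without pytesseract installed.
    import pytesseract
    return pytesseract.image_to_string(image, config=build_digit_config(psm)).strip()


# Usage (assumes closingImage from the step above and a local Tesseract install):
# print(read_digits(closingImage))
```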
