How to detect the boundaries of records in an image?

Question

I have a large number of high-resolution (2500 x 3500 pixel) JPEG images that roughly look like this:

Each number designates a separate record, and my goal is to convert these records to text.

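I'm aware of various OCR solutions, such as OpenCV or Tesseract, but my problem is detecting the boundaries of each record (so that I can later feed each record to the OCR). How can I achieve something like this: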

Answer 1

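Since each record starts with a blue number, you can threshold the blue color of those numbers in the HSV color space to get a mask. On that mask, apply morphological closing to merge the blue text into solid "boxes". Find the contours in the modified mask and determine their top y coordinates. Then extract the individual records from the original image by slicing from one y coordinate to the next (+/- a few pixels), using the full image width.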
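Choosing the HSV bounds is easier once you know that OpenCV stores 8-bit HSV with the hue halved (0 to 179) and saturation and value scaled to 0 to 255, which is why blue sits around H = 120 rather than 240°. The sketch below approximates that conversion with Python's standard `colorsys` module (an assumption for illustration: OpenCV's own rounding may differ by one unit) and checks a pixel against the `blue_lower`/`blue_upper` bounds from the code. The helper names `bgr_to_opencv_hsv` and `in_blue_range` are hypothetical.

```python
import colorsys

# Bounds copied from the answer's code (OpenCV 8-bit HSV scale)
BLUE_LOWER = (90, 128, 64)
BLUE_UPPER = (135, 255, 192)


def bgr_to_opencv_hsv(b, g, r):
    """Approximate cv2.COLOR_BGR2HSV for one 8-bit pixel (hypothetical helper)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # colorsys returns H, S, V in [0, 1]; OpenCV uses H in [0, 180), S and V in [0, 255]
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def in_blue_range(hsv, lower=BLUE_LOWER, upper=BLUE_UPPER):
    """Mimic cv2.inRange for a single pixel: every channel within its inclusive bounds."""
    return all(lo <= c <= hi for c, lo, hi in zip(hsv, lower, upper))


dark_blue_ink = bgr_to_opencv_hsv(160, 80, 40)
pure_blue = bgr_to_opencv_hsv(255, 0, 0)
print(dark_blue_ink, in_blue_range(dark_blue_ink))  # (110, 191, 160) True
print(pure_blue, in_blue_range(pure_blue))  # (120, 255, 255) False
```

With these bounds a bright, fully saturated blue (V = 255) falls outside the mask: the V ceiling of 192 only admits darker blue, so it is worth sampling a few pixels from the actual scans before settling on the range.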

Here's some code:

import cv2
import numpy as np

# Read image
img = cv2.imread('CfOBO.png')
if img is None:
    raise FileNotFoundError('Could not read CfOBO.png')

# Thresholding blue-ish colors using HSV color space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
blue_lower = (90, 128, 64)
blue_upper = (135, 255, 192)
blue_mask = cv2.inRange(hsv, blue_lower, blue_upper)

# Morphological closing
blue_mask = cv2.morphologyEx(blue_mask, cv2.MORPH_CLOSE, np.ones((11, 11), np.uint8))

# Find contours; findContours returns 3 values in OpenCV 3.x and 2 in OpenCV 2.x/4.x
cnts = cv2.findContours(blue_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Get y coordinate from bounding rectangle for each contour
y = sorted([cv2.boundingRect(cnt)[1] for cnt in cnts])

# Manually add end of last record
y.append(img.shape[0])

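# Extract records: from 5 px above each top to 5 px above the next top
# (start clamped at 0, otherwise a negative index would yield an empty first record)
records = [img[max(y[i] - 5, 0):y[i + 1] - 5, ...] for i in range(len(y) - 1)]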

# Show records
for record in records:
    cv2.imshow('Record', record)
    cv2.waitKey(0)
cv2.destroyAllWindows()
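The slicing step can also be factored into small pure-Python helpers, which makes its edge cases easy to test in isolation. This sketch (hypothetical names `merge_close_tops` and `record_bands`, with an assumed `min_gap` of 20 px) drops contour tops that lie within `min_gap` pixels of the previously kept one, e.g. when the closing fails to join the digits of a multi-digit number, and clamps the padded start at 0, since a negative start index counts from the bottom of the array. As in the answer, the last record runs to the bottom of the image.

```python
def merge_close_tops(tops, min_gap=20):
    """Keep only tops at least min_gap pixels below the previously kept top."""
    merged = []
    for t in sorted(tops):
        if not merged or t - merged[-1] >= min_gap:
            merged.append(t)
    return merged


def record_bands(tops, height, pad=5, min_gap=20):
    """Return (start, end) row ranges, one per record, down to the image height."""
    starts = [max(t - pad, 0) for t in merge_close_tops(tops, min_gap)]
    # Each record ends where the next one starts; the last one ends at the image bottom
    ends = starts[1:] + [height]
    return list(zip(starts, ends))


bands = record_bands([400, 3, 102, 100], height=1000)
print(bands)  # [(0, 95), (95, 395), (395, 1000)]
```

With the sorted contour tops from the answer's code (i.e. `y` before `img.shape[0]` is appended), the records would then be `[img[start:end] for start, end in record_bands(y, img.shape[0])]`.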

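There's still a lot of room for optimization, e.g. when there's a large amount of white space after the last record: I simply used the bottom of the image as the lower end of the last record. Still, the general workflow should do what's desired. (I left out the subsequent pytesseract part.)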
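One way to handle trailing white space is to cut a record (at least the last one) just below its lowest row that contains ink. A minimal NumPy sketch, assuming dark text on a near-white background; the helper name `trim_bottom_whitespace`, the threshold of 240 and the 5 px margin are illustrative choices, not values from the answer:

```python
import numpy as np


def trim_bottom_whitespace(record, white_thresh=240, margin=5):
    """Crop trailing rows that contain no pixel darker than white_thresh."""
    # For color input, use the darkest channel per pixel as a cheap ink indicator
    gray = record if record.ndim == 2 else record.min(axis=2)
    ink_rows = np.flatnonzero((gray < white_thresh).any(axis=1))
    if ink_rows.size == 0:
        return record  # blank record: leave it unchanged
    bottom = min(int(ink_rows[-1]) + 1 + margin, record.shape[0])
    return record[:bottom]


record = np.full((100, 50, 3), 255, dtype=np.uint8)
record[10:20, 5:15] = 0  # a block of "ink" on rows 10 to 19
print(trim_bottom_whitespace(record).shape)  # (25, 50, 3)
```

Applied as `records[-1] = trim_bottom_whitespace(records[-1])`, this removes the blank area below the last record; scan noise or page borders near the bottom would count as ink and reduce the effect.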

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.16299-SP0
Python:        3.9.1
NumPy:         1.20.1
OpenCV:        4.5.1
----------------------------------------
