聚类边界框并在其上画线（OpenCV，Python）

Question

使用这段代码，我在下图中的字符周围创建了一些边界框：

import csv
import cv2
from pytesseract import pytesseract as pt

pt.run_tesseract('bb.png', 'output', lang=None, boxes=True, config="hocr")

# To read the coordinates
boxes = []
with open('output.box', 'rt') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        if len(row) == 6:
            boxes.append(row)

# Draw the bounding box
img = cv2.imread('bb.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img, (int(b[1]), h-int(b[2])), (int(b[3]), h-int(b[4])), (0, 255, 0), 2)

cv2.imshow('output', img)
cv2.waitKey(0)

输出

我想要的是：

程序要在bounding box的X轴上画一条垂线（只针对第一个和第三个text-area。中间的一定不要对这个过程感兴趣）

目标是这样的（还有另一种实现方式，请解释）：一旦我有了这两条线（或者，更好的是，一组坐标），使用遮罩来覆盖这两个区域。

可能吗？

源图片：

CSV 请求：打印（盒子）

[['l', '56', '328', '63', '365', '0'], ['i', '69', '328', '76', '365', '0'], ['n', '81', '328', '104', '354', '0'], ['e', '108', '328', '130', '354', '0'], ['1', '147', '328', '161', '362', '0'], ['m', '102', '193', '151', '227', '0'], ['i', '158', '193', '167', '242', '0'], ['d', '173', '192', '204', '242', '0'], ['d', '209', '192', '240', '242', '0'], ['l', '247', '193', '256', '242', '0'], ['e', '262', '192', '292', '227', '0'], ['t', '310', '192', '331', '235', '0'], ['e', '334', '192', '364', '227', '0'], ['x', '367', '193', '398', '227', '0'], ['t', '399', '192', '420', '235', '0'], ['-', '440', '209', '458', '216', '0'], ['n', '481', '193', '511', '227', '0'], ['o', '516', '192', '548', '227', '0'], ['n', '553', '193', '583', '227', '0'], ['t', '602', '192', '623', '235', '0'], ['o', '626', '192', '658', '227', '0'], ['t', '676', '192', '697', '235', '0'], ['o', '700', '192', '732', '227', '0'], ['u', '737', '192', '767', '227', '0'], ['c', '772', '192', '802', '227', '0'], ['h', '806', '193', '836', '242', '0'], ['l', '597', '49', '604', '86', '0'], ['i', '610', '49', '617', '86', '0'], ['n', '622', '49', '645', '75', '0'], ['e', '649', '49', '671', '75', '0'], ['2', '686', '49', '710', '83', '0']]

编辑：

要使用 zindarod 答案，您需要 tesserocr。通过 pip install tesserocr 安装可能会给您带来各种错误。我找到了它的 wheel 版本（在尝试安装和解决错误的几个小时后，请参阅我在答案下方的评论...）：here you can find/download it.

希望这对您有所帮助..

Answer 1

Google 的 tesseract-ocr 已在 page segmentation method(psm). You just need to use a better python wrapper, which exposes more of tesseract's functionalities than pytesseract does. One of the better ones is tesserocr 中提供此功能。

一个简单的图片示例：

  import cv2
  import numpy as np
  import tesserocr as tr
  from PIL import Image

  cv_img = cv2.imread('text.png', cv2.IMREAD_UNCHANGED)

  # since tesserocr accepts PIL images, converting opencv image to pil
  pil_img = Image.fromarray(cv2.cvtColor(cv_img,cv2.COLOR_BGR2RGB))

  #initialize api
  api = tr.PyTessBaseAPI()
  try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation methos(psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE,True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im,box,_,_) in boxes:
      x,y,w,h = box['x'],box['y'],box['w'],box['h']
      cv2.rectangle(cv_img, (x,y), (x+w,y+h), color=(0,0,255))
  finally:
    api.End()

  cv2.imshow('output', cv_img)
  cv2.waitKey(0)
  cv2.destroyAllWindows()

Answer 2

我来晚了，正在寻找其他东西。我从来没有使用过 tesser 包装器，它们似乎只是妨碍了我，并没有真正的好处。他们所做的只是抽象出对子进程的调用？

这就是我通过传递给子进程的参数访问 psm 配置的方式。为了完整起见，我也包含了 oem、pdf 和 hocr 参数，但这不是必需的，您可以只传递 psm 参数。请务必在终端拨打帮助电话，因为有 13 个 psm 选项和 4 个用于 oem。根据您的工作，质量可能高度依赖于 psm。

可以使用 subprocess.Popen() 进行管道输入和输出，或者如果您喜欢冒险，您可以使用 asyncio.create_subprocess_exec() 异步进行，方法大致相同。

import subprocess

# args 
# 'tesseract' - the executable name
# path to the image file
# output file name - no extension tesser will add .txt .pdf .hocr etc etc
# optional params
# -psm x to set the page segmentation mode see more with tesseract --help-psm at the cli
# -oem x to set ocr engine mode see more with tesseract --help-osm
# can add a mode parameter to the end of the args list to get output in :
# searchable pdf - just add a parameter 'pdf' as below
# hOCR output (html) - just add 'hocr' as below

args = ['tesseract', 'Im1.tiff', 'Im1', '-psm 1', '-oem 2']

# args = ['tesseract', 'Im1.tiff', 'Im1', '-psm 1', '-oem 2', 'pdf']
# args = ['tesseract', 'Im1.tiff', 'Im1', '-psm 1', '-oem 2', 'hocr']

try:
    proc = subprocess.check_call(args)
    print('subprocess retcode {r}'.format(r=proc))
except subprocess.CalledProcessError as exp:
    print('subprocess.CalledProcessError : ', exp)

聚类边界框并在其上画线（OpenCV，Python）

Cluster bounding boxes and draw line on them (OpenCV, Python)

python

opencv

bounding-box