Cluster bounding boxes and draw line on them (OpenCV, Python)
With this code, I created some bounding boxes around the characters in the image below:
import csv
import cv2
from pytesseract import pytesseract as pt

pt.run_tesseract('bb.png', 'output', lang=None, boxes=True, config="hocr")

# To read the coordinates
boxes = []
with open('output.box', 'rt') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        if len(row) == 6:
            boxes.append(row)

# Draw the bounding box
# (box-file y values are measured from the bottom of the image, hence h - y)
img = cv2.imread('bb.png')
h, w, _ = img.shape
for b in boxes:
    img = cv2.rectangle(img, (int(b[1]), h-int(b[2])), (int(b[3]), h-int(b[4])), (0, 255, 0), 2)

cv2.imshow('output', img)
cv2.waitKey(0)
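Output:
What I want:
The program should draw a vertical line, perpendicular to the X axis, on the bounding boxes (only for the first and third text areas; the one in the middle must be left out of this process).
The goal is this (and if there is another way to achieve it, please explain): once I have these two lines (or, better, a group of coordinates), use a mask to cover these two areas.
Is it possible?
Source image:
CSV as requested, the output of print(boxes):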
[['l', '56', '328', '63', '365', '0'], ['i', '69', '328', '76', '365', '0'], ['n', '81', '328', '104', '354', '0'], ['e', '108', '328', '130', '354', '0'], ['1', '147', '328', '161', '362', '0'], ['m', '102', '193', '151', '227', '0'], ['i', '158', '193', '167', '242', '0'], ['d', '173', '192', '204', '242', '0'], ['d', '209', '192', '240', '242', '0'], ['l', '247', '193', '256', '242', '0'], ['e', '262', '192', '292', '227', '0'], ['t', '310', '192', '331', '235', '0'], ['e', '334', '192', '364', '227', '0'], ['x', '367', '193', '398', '227', '0'], ['t', '399', '192', '420', '235', '0'], ['-', '440', '209', '458', '216', '0'], ['n', '481', '193', '511', '227', '0'], ['o', '516', '192', '548', '227', '0'], ['n', '553', '193', '583', '227', '0'], ['t', '602', '192', '623', '235', '0'], ['o', '626', '192', '658', '227', '0'], ['t', '676', '192', '697', '235', '0'], ['o', '700', '192', '732', '227', '0'], ['u', '737', '192', '767', '227', '0'], ['c', '772', '192', '802', '227', '0'], ['h', '806', '193', '836', '242', '0'], ['l', '597', '49', '604', '86', '0'], ['i', '610', '49', '617', '86', '0'], ['n', '622', '49', '645', '75', '0'], ['e', '649', '49', '671', '75', '0'], ['2', '686', '49', '710', '83', '0']]
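One way to get the "group of coordinates" asked for above is to merge the character boxes into one rectangle per text line. The following is only a minimal sketch, not a method taken from any answer: the helper names `cluster_into_lines` and `to_image_rect` are hypothetical, the image height of 400 is an assumed value, and it relies on the box-file convention that y is measured from the bottom of the image. It uses a few rows from the list above.

```python
def cluster_into_lines(boxes):
    """Group box-file rows into text lines by vertical overlap.

    Each row is [char, left, bottom, right, top, page].
    Returns one (left, bottom, right, top) tuple per line, top of page first.
    """
    lines = []  # each entry: [left, bottom, right, top]
    for _, left, bottom, right, top, _ in boxes:
        left, bottom, right, top = int(left), int(bottom), int(right), int(top)
        for line in lines:
            # Same text line if the vertical ranges overlap.
            if bottom < line[3] and top > line[1]:
                line[0] = min(line[0], left)
                line[1] = min(line[1], bottom)
                line[2] = max(line[2], right)
                line[3] = max(line[3], top)
                break
        else:
            lines.append([left, bottom, right, top])
    # A larger y is higher on the page in box-file coordinates.
    return [tuple(line) for line in sorted(lines, key=lambda l: -l[3])]


def to_image_rect(line, img_height):
    """Convert a bottom-left-origin rectangle to OpenCV's top-left origin."""
    left, bottom, right, top = line
    return (left, img_height - top, right, img_height - bottom)


# A few rows from the list above, two characters per text area.
sample = [
    ['l', '56', '328', '63', '365', '0'], ['1', '147', '328', '161', '362', '0'],
    ['m', '102', '193', '151', '227', '0'], ['h', '806', '193', '836', '242', '0'],
    ['l', '597', '49', '604', '86', '0'], ['2', '686', '49', '710', '83', '0'],
]
lines = cluster_into_lines(sample)
first, _, third = lines  # skip the middle text area
rects = [to_image_rect(r, 400) for r in (first, third)]  # 400 = assumed image height
```

Each entry in `rects` is `(x1, y1, x2, y2)`, so the left and right edges give the x positions for vertical lines, and the whole rectangle can be filled to build a mask. The greedy overlap test is enough for well-separated lines like these; skewed or tightly packed text would need a more careful grouping.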
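EDIT:
To use zindarod's answer, you need tesserocr. Installing it via pip install tesserocr may give you various errors.
I found a wheel version of it (after hours of trying to install it and work through the errors, see my comment below the answer...): here you can find/download it.
Hope this helps.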
Google's tesseract-ocr already provides this functionality through its page segmentation method (psm). You just need to use a better Python wrapper, one which exposes more of tesseract's functionality than pytesseract does. One of the better ones is tesserocr.
A simple example with an image:
import cv2
import numpy as np
import tesserocr as tr
from PIL import Image

cv_img = cv2.imread('text.png', cv2.IMREAD_UNCHANGED)

# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))

# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method (psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
    for (im, box, _, _) in boxes:
        x, y, w, h = box['x'], box['y'], box['w'], box['h']
        cv2.rectangle(cv_img, (x, y), (x+w, y+h), color=(0, 0, 255))
finally:
    api.End()

cv2.imshow('output', cv_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
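To get from text-line boxes to the mask the question asks for, something along these lines might work. It is a sketch under assumptions, not part of the answer above: `mask_from_boxes` is a hypothetical helper, the box values are made-up stand-ins for the x, y, w, h dicts that GetComponentImages returns, and the boxes are assumed to arrive in top-to-bottom order.

```python
import numpy as np


def mask_from_boxes(shape, boxes, keep=(0, 2)):
    """Return a uint8 mask: 255 inside the selected boxes, 0 elsewhere.

    shape: (height, width) of the image.
    boxes: dicts with x, y, w, h keys (top-left origin).
    keep:  indices of the boxes to cover; (0, 2) means first and third.
    """
    mask = np.zeros(shape[:2], dtype=np.uint8)
    for i in keep:
        b = boxes[i]
        mask[b['y']:b['y'] + b['h'], b['x']:b['x'] + b['w']] = 255
    return mask


# Hypothetical boxes standing in for the dicts from GetComponentImages.
line_boxes = [
    {'x': 56, 'y': 35, 'w': 105, 'h': 37},
    {'x': 102, 'y': 158, 'w': 734, 'h': 50},
    {'x': 597, 'y': 314, 'w': 113, 'h': 37},
]
mask = mask_from_boxes((400, 900), line_boxes)
```

With a mask like this, the two areas can be blanked out with NumPy indexing, for example `cv_img[mask == 255] = 0`, while the middle text area stays untouched.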
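I'm late to this; I got here while looking for something else. I have never used the tesser wrappers. They just seem to get in my way without any real benefit. All they do is abstract away the call to a subprocess?
This is how I access the psm configuration through the args passed to subprocess. For completeness I have also included the oem, pdf and hocr parameters, but they are not required; you can pass just the psm parameter. Do run the help commands at the terminal, as there are 13 psm options and 4 for oem. Depending on your work, the quality may depend heavily on the psm.
You can pipe input and output with subprocess.Popen(), or if you are feeling adventurous you can do it asynchronously with asyncio.create_subprocess_exec() in much the same way.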
import subprocess

# args
# 'tesseract' - the executable name
# path to the image file
# output file name - no extension; tesseract will add .txt, .pdf, .hocr etc.
# optional params
# --psm x to set the page segmentation mode; see more with tesseract --help-psm at the cli
# --oem x to set the ocr engine mode; see more with tesseract --help-oem
# (older 3.x builds spell these flags -psm and -oem)
# each flag and its value must be separate items in the list
# can add a mode parameter to the end of the args list to get output in:
# searchable pdf - just add a parameter 'pdf' as below
# hOCR output (html) - just add 'hocr' as below
args = ['tesseract', 'Im1.tiff', 'Im1', '--psm', '1', '--oem', '2']
# args = ['tesseract', 'Im1.tiff', 'Im1', '--psm', '1', '--oem', '2', 'pdf']
# args = ['tesseract', 'Im1.tiff', 'Im1', '--psm', '1', '--oem', '2', 'hocr']
try:
    # check_call returns 0 on success and raises on a non-zero exit code
    retcode = subprocess.check_call(args)
    print('subprocess retcode {r}'.format(r=retcode))
except subprocess.CalledProcessError as exp:
    print('subprocess.CalledProcessError : ', exp)
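For the piping variant mentioned above, a rough sketch could look like the following. It uses subprocess.run (a convenience layer over subprocess.Popen) rather than Popen directly, and rests on a few assumptions: the helper names `tesseract_args` and `ocr_to_string` are hypothetical, the tesseract CLI is assumed to accept `stdout` as the output base name, the flags are written in the `--psm`/`--oem` form, and `capture_output` needs Python 3.7 or newer.

```python
import subprocess


def tesseract_args(image_path, psm=None, oem=None, output_base='stdout'):
    """Build the argument list; every flag and value is its own item."""
    args = ['tesseract', image_path, output_base]
    if psm is not None:
        args += ['--psm', str(psm)]
    if oem is not None:
        args += ['--oem', str(oem)]
    return args


def ocr_to_string(image_path, psm=None):
    """Run tesseract and return the recognized text without writing a file."""
    result = subprocess.run(
        tesseract_args(image_path, psm=psm),
        capture_output=True,  # collect stdout and stderr through pipes
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `ocr_to_string('Im1.tiff', psm=1)` would then return the text directly, which avoids reading the generated .txt file back from disk.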