Why is Tesseract number recognition not working properly?
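I have been using pytesseract for the past few days, and I have noticed that the library performs very poorly at recognizing digits. I don't know whether I'm doing something wrong, but I keep getting ♀ as output.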
import cv2
import pytesseract
from PIL import ImageGrab


class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706,226,1200,726))
        screen.save(r'tmp\tmp.png')
        # read the image file
        img = cv2.imread(r'tmp\tmp.png', 2)
        # convert to binary image
        [ret, bw_img] = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)
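Input:
(Converted to a binary image to make the digits easier to recognize.)
Output:
♀
Please let me know if something is wrong, or simply suggest other ways to handle text recognition.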
First, please read the article Improving the quality of the output, especially the section regarding the page segmentation method. Also, you can limit the characters to be found to the digits 0-9.
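As a small illustration of these two hints: the ♀ in the question is most likely the form feed character (\f) that Tesseract appends at the end of a page, which some Windows consoles display as ♀. The sketch below builds the digit-only config string and strips that trailing character. The helper names digit_config and clean_ocr_text are made up for this example, and the code does not call Tesseract itself.

```python
def digit_config(psm=6):
    """Build a Tesseract config string restricted to the digits 0-9."""
    return f'--psm {psm} -c tessedit_char_whitelist=0123456789'


def clean_ocr_text(text):
    """Drop newline and form feed characters from raw OCR output."""
    return text.replace('\n', '').replace('\f', '').strip()


if __name__ == '__main__':
    print(digit_config())
    print(repr(clean_ocr_text('16\n\f')))
```

The config string would then be passed as pytesseract.image_to_string(img, config=digit_config()), and the returned text run through clean_ocr_text before converting it to an integer.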
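Your image is quite small, which makes extracting all the numbers at once very challenging, especially with the mix of bright text on a dark background and vice versa. However, you can quite easily crop all the single tiles and extract the numbers one by one. That way, there is no need to differentiate between the two kinds of tiles.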
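The cropping step can be sketched without any image library by computing the pixel bounds of each tile. This assumes the image is cropped exactly to the 4 x 4 board with no outer margin; tile_boxes is a hypothetical helper name, and the 500 x 494 size is simply the screenshot size implied by the bbox in the question.

```python
def tile_boxes(height, width, rows=4, cols=4):
    """Return (row, col, y1, y2, x1, x2) for each tile of a rows x cols grid."""
    h, w = height // rows, width // cols
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((r, c, r * h, (r + 1) * h, c * w, (c + 1) * w))
    return boxes


if __name__ == '__main__':
    # Print the bounds of the first row of tiles
    for box in tile_boxes(500, 494)[:4]:
        print(box)
```

With an OpenCV image, each tile would then be cut out as img[y1:y2, x1:x2]. Integer division means a few pixels at the right and bottom edges are ignored when the size is not divisible by four.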
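Furthermore, you know that the numbers must be powers of two (I guess most people know 2048). So, if no such number is found, try upscaling the cropped tile and repeat. (Eventually, give up after a few attempts.)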
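The retry idea can be sketched independently of Tesseract. In the example below, recognize and upscale are caller-supplied callables standing in for the OCR call and the image resize; they, read_tile, and max_tries are assumptions made for illustration. The power-of-two test uses integer bit arithmetic rather than logarithms, which avoids floating-point rounding and handles 0 without an error.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses integer bit arithmetic, no logarithms."""
    return n > 0 and (n & (n - 1)) == 0


def read_tile(recognize, upscale, tile, max_tries=4):
    """Call recognize(tile) until it yields a power of two, upscaling in between.

    recognize and upscale are hypothetical stand-ins for the OCR call and
    for an image resize. Returns the number, or -1 after max_tries failures.
    """
    for _ in range(max_tries):
        text = recognize(tile).strip()
        if text.isdecimal() and is_power_of_two(int(text)):
            return int(text)
        tile = upscale(tile)
    return -1


if __name__ == '__main__':
    # Fake recognizer: only "reads" the tile once its scale factor reaches 4
    fake_ocr = lambda scale: '32' if scale >= 4 else ''
    double = lambda scale: scale * 2
    print(read_tile(fake_ocr, double, 1))
```

In a real run, recognize would wrap pytesseract.image_to_string with the digit-only config, and upscale would wrap cv2.resize with fx=2, fy=2.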
That's my full code:
import cv2
import math
import pytesseract


# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def log2(x):
    return math.log10(x) / math.log10(2)


# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def is_power_of_2(n):
    # Guard against n <= 0, for which the logarithm is undefined
    return n > 0 and math.ceil(log2(n)) == math.floor(log2(n))


# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]

# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)

# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
config = '--psm 6 -c tessedit_char_whitelist=0123456789'

# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):

        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]

        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
            text = text.replace('\n', '').replace('\f', '')
            if (text == '') or (not is_power_of_2(int(text))):
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break

print(a)
For the given image, I get the following output:
[[ 8 16 4 2]
[ 2 8 32 8]
[ 2 4 16 4]
[ 4 2 4 2]]
For another, similar image, I get:
[[ 4 -1 -1 -1]
[ 2 2 -1 -1]
[-1 -1 -1 -1]
[ 2 -1 -1 -1]]
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.3
OpenCV: 4.5.3
pytesseract: 5.0.0-alpha.20201127
----------------------------------------