PyTesseract works great with this code, but not my code (with minimal differences)
I'm trying to set up PyTesseract OCR on my desktop by following this tutorial. It works when I run the tutorial's script, as you can see in this image:
Code from the tutorial:
import argparse

import cv2
import pytesseract
from pytesseract import Output

# Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
# '--image' is the path of the input image that will be OCR'd
ap.add_argument("-i", "--image", required=True, help="path to input image to be OCR'd")
# '--min-conf' sets a minimum confidence to filter weak detections
ap.add_argument("-c", "--min-conf", type=int, default=0, help="min conf value to filter weak text detection")
args = vars(ap.parse_args())

# Load the input image, convert from BGR to RGB channel ordering, and
# use Tesseract to localize each area of text in the input image
image = cv2.imread(args["image"])
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# 'image_to_data' detects and localizes text
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)

# Loop over each individual text localization
for i in range(0, len(results["text"])):
    # Extract the bounding box coordinates of the text region from the current result
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]
    # Extract the OCR text itself along with the confidence of the text localization
    text = results["text"][i]
    print(results["conf"][i])
    conf = int(results["conf"][i])
    # Filter out weak-confidence text localizations
    if conf > args["min_conf"]:
        # Display the confidence and text in the terminal
        print("Confidence: {}".format(conf))
        print("Text: {}".format(text))
        print("")
        # Remove non-ASCII text so we can draw the text on the image using OpenCV,
        # then draw a bounding box around the text along with the text itself
        text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

# Show the output image
cv2.imshow("Image", image)
cv2.waitKey(0)  # wait for a key press before continuing
But when I try to implement it in another project, it doesn't work. Here is my code:
import cv2
import pyautogui
import pytesseract
from pytesseract import Output

screenshotOfDesktop = pyautogui.screenshot('screenshotOfDesktop.png')
# Load the saved screenshot with OpenCV
readDesktop_SAP = cv2.imread('screenshotOfDesktop.png')
# Convert from BGR to RGB and have Tesseract read it
rgb = cv2.cvtColor(readDesktop_SAP, cv2.COLOR_BGR2RGB)
# config='--psm 7' makes Tesseract treat the image as a single line of text
results = pytesseract.image_to_data(rgb, config='--psm 7', output_type=Output.DICT)
print(results)
# Iterating through the list of results
for i in range(0, len(results["text"])):
    if "Description" not in results["text"]:
        print("Didn't find description on screen. Please check that the SAP 'find document' page is open on the screen. ")
        input('Press ENTER to exit now. ')
        exit()
    if "Description" in results["text"]:
        print("Found 'Description' on screen! ")
        # Gating by confidence
        conf = int(results["conf"][i])
        if conf < 0.2:
            print("Confidence is less than 0.2. Moving on. ")
            continue
        elif conf >= 0.2:
            # Getting the coordinates of the result
            Desc_x = results["left"][i]
            Desc_y = results["top"][i]
            Desc_w = results["width"][i]
            Desc_h = results["height"][i]
            # Printing everything
            print("The coordinates are: ")
            print(Desc_x, Desc_y, Desc_w, Desc_h)
            print(f"Confidence = {conf}")
Instead, my code prints only this for the "results" dictionary:
{'level': [1, 2, 3, 4, 5, 5], 'page_num': [1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2], 'left': [0, 0, 0, 0, 0, 1451], 'top': [0, 4, 4, 4, 4, 145], 'width': [1920, 1727, 1912, 1727, 891, 276], 'height': [1080, 1061, 1070, 1061, 1061, 8], 'conf': ['-1', '-1', '-1', '-1', 11, 0], 'text': ['', '', '', '', 'fe', '~']}
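Does anyone know why this happens? I know I'm not using argparse like the author did, but it should give the same result, shouldn't it? I also checked to make sure it's looking at the correct screenshot.
Relevant info: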
- Tesseract v4.1.0.20190314
- Python 3.9.2
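For anyone reading the printed dictionary above: each index is one row of Tesseract's output, and the "level" column marks what the row describes (1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word). Only word rows carry text and a real confidence; the others report -1. Below is a minimal sketch that pulls out just the word rows from that exact output. The helper name word_entries is hypothetical, not part of pytesseract.

```python
# The dict printed in the question, copied verbatim.
results = {
    'level': [1, 2, 3, 4, 5, 5],
    'page_num': [1, 1, 1, 1, 1, 1],
    'block_num': [0, 1, 1, 1, 1, 1],
    'par_num': [0, 0, 1, 1, 1, 1],
    'line_num': [0, 0, 0, 1, 1, 1],
    'word_num': [0, 0, 0, 0, 1, 2],
    'left': [0, 0, 0, 0, 0, 1451],
    'top': [0, 4, 4, 4, 4, 145],
    'width': [1920, 1727, 1912, 1727, 891, 276],
    'height': [1080, 1061, 1070, 1061, 1061, 8],
    'conf': ['-1', '-1', '-1', '-1', 11, 0],
    'text': ['', '', '', '', 'fe', '~'],
}


def word_entries(results):
    """Return (text, conf, (left, top, width, height)) for each word-level row."""
    words = []
    for i, text in enumerate(results["text"]):
        # Skip page/block/paragraph/line rows and empty strings.
        if results["level"][i] != 5 or not text.strip():
            continue
        # conf may arrive as an int, a float, or a string, so normalize it.
        conf = int(float(results["conf"][i]))
        box = (results["left"][i], results["top"][i],
               results["width"][i], results["height"][i])
        words.append((text, conf, box))
    return words


words = word_entries(results)
for text, conf, box in words:
    print(f"{text!r}: conf={conf}, box={box}")
```

Read this way, the output contains only two "words" ('fe' and '~'), both with very low confidence, so Tesseract did not recognize any real text in the screenshot.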
I hadn't realized that the tutorial code used a grayscale image before running OCR with PyTesseract. After I added grayscale conversion, the text was found.
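A rough sketch of what that fix could look like, assuming OpenCV's cv2.COLOR_BGR2GRAY conversion is used for the grayscale step (the original post does not show the final code). The function names ocr_grayscale and find_word are hypothetical, and the sample dictionary is hand-made to mimic the shape of image_to_data output; it is not real OCR output.

```python
def ocr_grayscale(path, config=""):
    """Run pytesseract.image_to_data on a grayscale copy of the image at `path`."""
    # Imported here so find_word below stays usable without OpenCV/Tesseract installed.
    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")
    # cv2.imread returns BGR, so convert BGR -> single-channel grayscale.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_data(gray, config=config, output_type=Output.DICT)


def find_word(results, target, min_conf=0):
    """Return (left, top, width, height) of the first row whose text equals
    `target` with confidence >= min_conf, or None if there is no such row."""
    for i, text in enumerate(results["text"]):
        # conf is on a 0-100 scale; -1 marks rows that are not words.
        conf = int(float(results["conf"][i]))
        if text.strip() == target and conf >= min_conf:
            return (results["left"][i], results["top"][i],
                    results["width"][i], results["height"][i])
    return None


# Hand-made sample in the same shape as image_to_data output (not real OCR output).
sample = {
    "left": [0, 120],
    "top": [0, 48],
    "width": [1920, 96],
    "height": [1080, 14],
    "conf": ["-1", 91],
    "text": ["", "Description"],
}
box = find_word(sample, "Description", min_conf=60)
print(box)  # (120, 48, 96, 14)
```

On a real run, find_word(ocr_grayscale('screenshotOfDesktop.png'), "Description") would replace the sample. Looking up the matching row's own index avoids reading the confidence and coordinates of an unrelated row. Note also that '--psm 7' asks Tesseract to treat the whole image as a single text line, so leaving config empty (the default page segmentation) may suit a full desktop screenshot better; that default is an assumption here, not something the original post confirms.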