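OpenCV OCR: improve data extraction from a color image with a background
I am trying to extract some information from a mobile phone screenshot. My code retrieves some of the information, but not all of it. I read the image, convert it to grayscale, remove the unwanted parts, and apply Gaussian adaptive thresholding. Still, not all of the text is read.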
import numpy as np
import cv2
from PIL import Image
import matplotlib.pyplot as plt
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Installs\Tools\Tesseract-OCR\tesseract.exe'

# Raw strings keep sequences such as \t and \r in the Windows paths literal
image = r"C:\Workspace\OCR\tesseract\rpstocks1 - Copy (2).png"
img = cv2.imread(image)
# Note: cv2.imread returns BGR data, so COLOR_BGR2GRAY would be the matching flag
img_grey = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
height, width, channels = img.shape
print(height, width, channels)

# Draw a rectangle on the grayscale image, then crop the region of interest
rec_img = cv2.rectangle(img_grey, (30, 100), (1040, 704), (0, 255, 0), 3).copy()
crop_img = rec_img[105:1945, 35:1035].copy()
# Note: the blurred result is not assigned, so this call has no effect
cv2.medianBlur(img, 5)
cv2.imwrite(r"C:\Workspace\OCR\tesseract\Cropped_GREY.jpg", crop_img)

# Gaussian adaptive threshold on the cropped grayscale image
img_gauss = cv2.adaptiveThreshold(crop_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY, 11, 12)
cv2.imwrite(r"C:\Workspace\OCR\tesseract\Cropped_Guass.jpg", img_gauss)

text = pytesseract.image_to_string(img_gauss, lang='eng')
text.encode('utf-8')
print(text)
Output
Image dimensions 704 1080 3
Investing
,712.99
ASRT _ 0
500.46 shares ......... .. /0
GNUS
25169 Shares """"" " ‘27.98%
[Images: rpstocks1 - Copy (2).png, Cropped_GREY.jpg, Cropped_Guass.jpg]
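Have a look at the page segmentation modes of pytesseract. For example, using config='-psm 12' will already give you all the wanted text (newer Tesseract releases spell this flag --psm). However, the graphs are also somehow interpreted as text.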
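As a minimal sketch of how a page segmentation mode can be passed to pytesseract: the helper names `psm_config` and `ocr_with_psm` are hypothetical, and the `legacy_flag` switch is an assumption that covers the `-psm` spelling used in this answer versus the `--psm` spelling expected by newer Tesseract releases. The 0 to 13 range reflects the modes listed by Tesseract 4.

```python
def psm_config(mode, legacy_flag=False):
    """Build the config string that selects a Tesseract page segmentation mode."""
    if not 0 <= mode <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    flag = "-psm" if legacy_flag else "--psm"
    return f"{flag} {mode}"


def ocr_with_psm(image, mode, legacy_flag=False):
    """Run pytesseract on an image (NumPy array or PIL image) with the given mode."""
    import pytesseract  # imported lazily so psm_config() works without Tesseract
    return pytesseract.image_to_string(image, config=psm_config(mode, legacy_flag))


print(psm_config(12))                   # --psm 12
print(psm_config(6, legacy_flag=True))  # -psm 6
```

Mode 12 treats the page as sparse text, while mode 6 assumes a single uniform block of text.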
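Because the graphs get picked up as text, I would preprocess the image to get single boxes (actual text, graphs, the information at the top, etc.), and filter them so that only the boxes with content of interest are stored. That can be done by using
- the y coordinate of the bounding rectangle (not in the upper 5 % of the image, that's the mobile phone status bar),
- the width w of the bounding rectangle (not wider than 50 % of the image's width, those are the horizontal lines),
- the x coordinate of the bounding rectangle (not in the middle third of the image, those are the graphs).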
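The three criteria above can be sketched as a small standalone filter. The function name `keep_text_boxes` and the sample boxes are made up for illustration; only the thresholds (5 %, 50 %, middle third) come from the list.

```python
def keep_text_boxes(rects, img_w, img_h):
    """Keep only the (x, y, w, h) boxes expected to hold the text of interest."""
    kept = []
    for x, y, w, h in rects:
        below_status_bar = y > 0.05 * img_h
        not_a_horizontal_line = w <= 0.5 * img_w
        outside_middle_third = x < img_w / 3 or x > 2 * img_w / 3
        if below_status_bar and not_a_horizontal_line and outside_middle_third:
            kept.append((x, y, w, h))
    return kept


# Made-up boxes for a hypothetical 1080 x 704 image
sample = [
    (40, 10, 200, 30),    # status bar: in the top 5 %
    (40, 120, 300, 60),   # text on the left: kept
    (0, 200, 1080, 2),    # horizontal line: too wide
    (400, 260, 250, 80),  # graph: starts in the middle third
    (800, 260, 200, 40),  # text on the right: kept
]
print(keep_text_boxes(sample, img_w=1080, img_h=704))
# [(40, 120, 300, 60), (800, 260, 200, 40)]
```

Note that the x test looks only at the left edge of each box, so the fractions may need tuning for other screen layouts.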
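What's left is to run pytesseract on each of the cropped images, with config='-psm 6' for example (assume a single uniform block of text), and to strip any line breaks from the resulting texts.
That'd be my code: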
import cv2
import pytesseract

# Read image
img = cv2.imread('cUcby.png')
hi, wi = img.shape[:2]

# Convert to grayscale for Tesseract
img_grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Mask single boxes by thresholding and morphological closing in x direction
mask = cv2.threshold(img_grey, 248, 255, cv2.THRESH_BINARY_INV)[1]
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE,
                        cv2.getStructuringElement(cv2.MORPH_RECT, (51, 1)))

# Find contours w.r.t. the OpenCV version (3.x returns three values, others two)
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Get bounding rectangles
rects = [cv2.boundingRect(cnt) for cnt in cnts]

# Filter bounding rectangles:
# - not in the upper 5 % of the image (mobile phone status bar)
# - not wider than 50 % of the image's width (horizontal lines)
# - not being in the middle third of the image (graphs)
rects = [(x, y, w, h) for x, y, w, h in rects if
         (y > 0.05 * hi) and
         (w <= 0.5 * wi) and
         ((x < 0.3333 * wi) or (x > 0.6666 * wi))]

# Sort bounding rectangles first by y coordinate, then by x coordinate
rects = sorted(rects, key=lambda r: (r[1], r[0]))

# Get texts from bounding rectangles from pytesseract. Each crop has a 1 px
# margin, clamped at 0 so a box touching the border does not give an empty slice.
# Note: newer Tesseract releases expect '--psm 6' instead of '-psm 6'.
texts = [pytesseract.image_to_string(
    img_grey[max(y-1, 0):y+h+1, max(x-1, 0):x+w+1], config='-psm 6')
    for x, y, w, h in rects]

# Remove line breaks
texts = [text.replace('\n', '') for text in texts]

# Output
print(texts)
That's the output:
['Investing', ',712.99', 'ASRT', '-27.64%', '500.46 shares', 'GNUS', '-27.98%', '251.69 shares']
Since you know the locations of the bounding rectangles, you could also re-arrange the whole text using that information.
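One possible way to do that re-arrangement, as a rough sketch: group boxes whose y coordinates lie close together into one row, then read each row from left to right. The function name `arrange_into_rows`, the `row_tolerance` default, and the sample coordinates are all made up for illustration; the texts are taken from the output above.

```python
def arrange_into_rows(rects, texts, row_tolerance=20):
    """Group OCR results into visual rows using their bounding rectangles.

    rects: list of (x, y, w, h); texts: recognized strings in the same order.
    row_tolerance (pixels) is a made-up default; tune it to the line height.
    """
    items = sorted(zip(rects, texts), key=lambda item: item[0][1])  # by y
    rows = []
    for (x, y, w, h), text in items:
        # Start a new row when this box sits clearly below the current row
        if not rows or y - rows[-1]["y"] > row_tolerance:
            rows.append({"y": y, "cells": []})
        rows[-1]["cells"].append((x, text))
    # Within each row, read left to right
    return [" ".join(t for _, t in sorted(row["cells"])) for row in rows]


# Hypothetical coordinates paired with texts from the output above
rects = [(40, 300, 120, 40), (850, 305, 150, 40), (40, 350, 200, 30),
         (40, 120, 250, 50), (40, 180, 220, 60)]
texts = ['ASRT', '-27.64%', '500.46 shares', 'Investing', ',712.99']
print(arrange_into_rows(rects, texts))
# ['Investing', ',712.99', 'ASRT -27.64%', '500.46 shares']
```

Each row is anchored to the y coordinate of its first box, so a fixed pixel tolerance works only if the lines are clearly separated.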
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1.1
OpenCV: 4.5.1
pytesseract: 4.00.00alpha
----------------------------------------